Posted on September 19, 2007 in Uncategorized by Rob Di Marco

Much has been made of Paul Graham’s famous posting about how Lisp gave his startup Viaweb an advantage over the competition.  Graham’s thesis is that features of the Lisp language could be leveraged to make his programming team more productive and better able to respond to customer needs.

The idea that a programming language will make your team more productive is the Holy Grail of software development.  Many languages have been promoted as a cure-all for productivity (C++ in the 80s, Java in the 90s, Ruby now, Lisp perpetually), and each turns out to have its benefits and weaknesses.  But there is no doubt that certain language features can lead to leaps in productivity.  For example, built-in garbage collection is a sine qua non for modern development languages.  No one can argue (at least no one can argue well) that garbage collection does not improve developer productivity for 99.9% of development efforts.

What is a Mixin?

Since many readers may not be aware of what a mixin is, I will try to describe it briefly.  Ruby is an object-oriented language.  With all object-oriented languages, the designers had to decide whether or not to support multiple inheritance.  Perl and C++ allow multiple inheritance.  Java does not, but it does allow for polymorphism through the use of interfaces.  Ruby’s designers opted not to allow multiple inheritance, but to allow code from a module to be included in a class.

So consider the following code:

module BarModule
  def hello_world
    puts "Hello World"
  end
end

class BaseClass
  def class_method
    puts "In class method"
  end
end

class Foo < BaseClass
  include BarModule
end

f = Foo.new
f.class_method
f.hello_world

In this example, we are creating a module named BarModule with a hello_world method, a class named BaseClass, and then another class Foo that extends BaseClass and includes BarModule. The class Foo will then have the methods from both BarModule and BaseClass, but its only superclass is BaseClass. (Ruby does add BarModule to Foo's ancestor chain, so f.is_a?(BarModule) also returns true.) Somewhat different from what a Java developer would do, but it makes sense. Running this file will result in the following output:

$ ruby foo.rb
In class method
Hello World
$

So that is a basic example of what a mixin is.
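One nuance is worth seeing in code: although BaseClass is the only superclass of Foo, Ruby also records the module in the method lookup chain. The sketch below repeats the definitions from the example so it can run on its own, and uses only standard introspection methods (superclass, ancestors, is_a?).

```ruby
module BarModule
  def hello_world
    puts "Hello World"
  end
end

class BaseClass
  def class_method
    puts "In class method"
  end
end

class Foo < BaseClass
  include BarModule
end

f = Foo.new

# Foo has exactly one superclass...
puts Foo.superclass                   # BaseClass

# ...but the module sits between Foo and BaseClass in the lookup chain.
puts Foo.ancestors.first(3).inspect   # [Foo, BarModule, BaseClass]

# Type checks see the module as well as the superclass.
puts f.is_a?(BaseClass)               # true
puts f.is_a?(BarModule)               # true
```

This ordering is why a method defined in Foo itself would win over one of the same name in BarModule, which in turn would win over BaseClass.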

Adding send() into the equation

Another interesting and powerful thing about Ruby is that all method calls are actually message passing calls.  So for example, we could rewrite:

f = Foo.new
f.class_method
f.hello_world

to

f = Foo.send(:new)
f.send(:class_method)
f.send(:hello_world)

and the results would be exactly the same! Again, a little weird if you are new to Ruby, but this language feature can lead to some very interesting and powerful results.
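As a small illustration of why this matters, send also accepts arguments, and the method name can be a value computed at runtime. The Greeter class below is hypothetical and exists only for this sketch.

```ruby
# Hypothetical class, used only to illustrate send.
class Greeter
  def greet(name)
    "Hello, #{name}"
  end

  def shout(name)
    "HELLO, #{name.upcase}!"
  end
end

g = Greeter.new

# Arguments simply follow the method name.
puts g.send(:greet, "Rob")

# The method name can come from data (config, user input, a loop...).
%w[greet shout].each do |method_name|
  puts g.send(method_name, "Rob") if g.respond_to?(method_name)
end
```

Note that send will also call private methods. When that is not wanted, public_send is the stricter alternative.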

Combining include and send

Now let’s combine these two methods.  What if I wanted to take my BarModule and apply it to a class that is not mine, say the Ruby core class String?

String.send(:include, BarModule)
s = "Arbitrary String"
s.hello_world

Running the above code would produce:

$ ruby include-bar-module-on-string.rb
Hello World
$

That’s right, at runtime, I was able to add an arbitrary method onto the base String class. That method is now available to any Strings that I instantiate within my application.
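A short sketch makes the runtime aspect concrete: strings that already existed before the include pick up the new method too, because method lookup happens at call time. The Shouting module is a hypothetical name chosen for this example.

```ruby
# Hypothetical module, for illustration only.
module Shouting
  def shout
    upcase + "!"
  end
end

before = "created before the include"
puts before.respond_to?(:shout)   # false, String knows nothing about it yet

# Mix the module into the core String class at runtime.
String.send(:include, Shouting)

after = "created after the include"
puts before.shout                 # the pre-existing object gained the method
puts after.shout
```

On Ruby 2.1 and later, include is a public method, so String.include(Shouting) works as well. The send form is what older Ruby versions required, since include was private there.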

The Implication of this Feature on Rails

Because of this mixin feature, a developer can add arbitrary methods and modify the behavior of core classes at runtime.  This is amazingly powerful if you are trying to write plugins and extensions to the framework.  Because you can add functionality to existing objects, users can install your plugin and start taking advantage of new functionality without having to make changes to the objects that they are instantiating in their application.

In frameworks written in other languages, such as Java, plugging in new functionality often means changing how your objects are instantiated.  This requires code changes and/or configuration changes (if you are using a dependency injection framework like Spring).  That is hard to develop, hard to maintain, and a pain for plugin developers to support.  But because of the mixin feature, Rails plugin developers can customize the base objects, and the users of the plugins do not have to change any of their code or configuration logic.

A good example of this can be seen in the Row Version Rails plugin.  This particular plugin puts a created_at, updated_at and a row_version on every row inserted into the database.  It requires ZERO code changes to make this happen.  You just install the plugin and go.  It works by adding hooks into ActiveRecord::Base (the base persistence class in Rails) so that when records are saved, the correct information is put into those fields.  A very easy and powerful plugin to install and use.
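To make the idea concrete without Rails, here is a minimal sketch of the pattern in plain Ruby. It is not the Row Version plugin's actual code: TinyRecord is a hypothetical stand-in for ActiveRecord::Base, and RowStamps and before_save_hooks are names invented for this example.

```ruby
# Hypothetical stand-in for ActiveRecord::Base: just enough to show the idea.
class TinyRecord
  attr_reader :attributes

  def initialize
    @attributes = {}
  end

  # Names of methods to run before every save (shared by all subclasses).
  def self.before_save_hooks
    @before_save_hooks ||= []
  end

  def save
    TinyRecord.before_save_hooks.each { |hook| send(hook) }
    true # a real base class would write to the database here
  end
end

# Hypothetical plugin module, loosely modelled on the behavior described above.
module RowStamps
  # Ruby calls this hook whenever the module is included in a class.
  def self.included(base)
    base.before_save_hooks << :stamp_row
  end

  def stamp_row
    now = Time.now
    attributes[:created_at] ||= now
    attributes[:updated_at] = now
    attributes[:row_version] = attributes.fetch(:row_version, 0) + 1
  end
end

# "Installing the plugin" is this one line; application code stays untouched.
TinyRecord.send(:include, RowStamps)

class Person < TinyRecord; end

person = Person.new
person.save
person.save
puts person.attributes[:row_version]   # 2
```

The Person class never mentions RowStamps, yet every save is stamped. That is the property that lets plugin users skip code and configuration changes.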

Conclusions

The point to take away from this is not that Ruby on Rails rocks and Java sucks.  Far from it.  But choosing a framework with lots of extensions that take care of many of the mundane tasks allows your developers to spend more of their time focused on the problems of the user and not on common problems.  The mixin feature of Ruby allows for the development of easy-to-use but powerful plugins that will be hard for any non-Ruby based framework to compete with.


Posted on September 18, 2007 in Software Development by Rob Di Marco

Gave a presentation tonight to the Philly JBoss Users Group about using Embedded JBoss.

For the unaware, Embedded JBoss allows a developer to embed much of the functionality of the JBoss application server (EJB, Messaging, JNDI, Security, Transactions) in another application.  A great use for the technology is in the creation of unit tests that can actually test Session, Entity, and Message Driven Beans.  At Health Market Science, we heavily used the Embedded EJB container (a predecessor to Embedded JBoss) connected to an in-memory Hypersonic DB to improve our unit tests.  It made developing and debugging significantly easier and all of the developers loved doing it.  I highly encourage any developer who is using EJB3, and especially anyone using JBoss, to use Embedded JBoss.

Full Presentation can be found at http://www.innovationontherun.com/presentation-files/Intro%20to%20Embedded%20JBoss.pdf


Posted on September 16, 2007 in Software Development by Rob Di Marco

This Tuesday, I am presenting at the Philadelphia JBoss Users Group on using Embedded JBoss, a very cool technology that allows you to do things like test EJBs using JUnit without having to install an application server.  The slides are ready but I wanted to make sure my demos are running.  So I went to run the tests and my first error was:

Class not found: [Ljava.lang.String;

What the hell?  Of course I initially assume it is something I did wrong, then that it is something in the JBoss configuration.  After much anguish and googling, it turns out there is an awesome bug/feature in Java 1.6 where the following code (which worked in Java 1.5) throws a ClassNotFoundException in Java 1.6.

  public class test {

      public static void main(String[] args) throws Exception {
          String[] s = new String[] { "123" };
          String clName = s.getClass().getName();
          test.class.getClassLoader().loadClass(clName);  // throws exception in JDK 1.6!
      }
  }

But the lameness did not stop there.  The next issue that popped up was a NullPointerException when trying to read a file.  Turns out there is a bug on Windows XP with directory names involving spaces.  I was running it off of my desktop, c:\Documents and Settings\rdimarco\Desktop.  Moving the application to the c:\jboss directory solved my problems.  I’m not sure if it is a JDK or JBoss bug, but at this point I’m just frustrated.


Posted on September 5, 2007 in Uncategorized by Rob Di Marco

Scraping static web sites to verify functionality or to access data has been around as long as there has been a web (example of scraping of a static web page with Ruby).  But with the advent of AJAX and other techniques that use JavaScript to dynamically insert HTML into a web page, scraping has gotten more challenging.  Most scraping technology does fine when downloading a single HTML page, but cannot easily handle the dynamic content.

With the 1.12 release of HtmlUnit, this headless web browser can now support parsing and executing JavaScript.  This allows a scraper to access this dynamic content as simply as the scraper accesses static content, and without having to fire up a heavy execution engine like Gecko.

JRuby is a great technology for easily constructing a script that calls into the HtmlUnit functionality without having to deal with all the boilerplate that Java requires.

Step 0: About The Example Code

The tarball, scraper.tgz, contains:

  • scraper.rb – the JRuby script we will be executing.  All code discussed in this example comes from there.
  • lib/*.jar – all of the JAR files needed to run the example
  • run.sh – a simple shell script that points JRuby at the lib directory and silences some warning messages

Step 1: Enabling JRuby to Use the Java JAR files

# Require Java so we can use the Java libraries
require 'java'

# Get HtmlUnit and all of its required libraries
require 'htmlunit-1.13.jar'
require 'commons-httpclient-3.1.jar'
require 'commons-io-1.3.1.jar'
require 'commons-logging-1.1.jar'
require 'commons-lang-2.3.jar'
require 'commons-codec-1.3.jar'
require 'xercesImpl-2.6.2.jar'
require 'xmlParserAPIs-2.6.2.jar'
require 'jaxen-1.1.1.jar'
require 'commons-collections-3.2.jar'
require 'js-1.6R7.jar'
require 'nekohtml-0.9.5.jar'

# Include the WebClient class
# (newer JRuby releases use java_import instead of include_class)
include_class 'com.gargoylesoftware.htmlunit.WebClient'

In this block, we are telling JRuby to load the JAR files required by HtmlUnit.  Some notes:

  • You have to specify every JAR file that HtmlUnit depends upon, even if you never call its classes directly.
  • All JAR files must be in the $LOAD_PATH for JRuby.  This is done with -I<DIR_NAME> arguments passed to JRuby on the command line.
  • The include_class call is similar to an import statement in Java and puts the WebClient class in scope.

At this point, we can now instantiate and use the WebClient class.
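Listing every JAR by hand gets tedious as dependencies change. As a rough alternative, and not what scraper.rb does, the sketch below requires every JAR it finds in a directory. It assumes the JARs sit in ./lib relative to where the script is started, and that requiring a JAR by its full path works in your JRuby version. Under plain Ruby it only reports what it found.

```ruby
# Sketch: require every JAR in a directory instead of listing them one by one.
# Assumption: the JARs live in ./lib relative to the current directory.
jar_dir = File.expand_path('lib', Dir.pwd)
jars = Dir.glob(File.join(jar_dir, '*.jar')).sort

if defined?(JRUBY_VERSION)
  require 'java'
  jars.each { |jar| require jar }   # JRuby adds each JAR to the classpath
else
  puts "Not running under JRuby; skipping JAR loading"
end

puts "Found #{jars.size} JAR file(s) in #{jar_dir}"
```

The trade-off is explicitness: the hand-written list documents exactly which versions the script was tested with, while the glob picks up whatever happens to be in the directory.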

Step 2: Parsing a Basic HTML Page

Before we get into parsing a dynamic page, let’s take a look at how to parse a simple page.  In this example, I am going to parse out information from the Maven 2 Archive for HtmlUnit found at http://repo1.maven.org/maven2/htmlunit/htmlunit.

# Function for getting a list of all directories
def get_htmlunit_maven_pages
  wc = WebClient.new

  page = wc.getPage("http://repo1.maven.org/maven2/htmlunit/htmlunit")

  # List the directories...
  page.getByXPath('//img[@alt="[DIR]"]').each do |img|
    a = img.getNextSibling.getNextSibling
    puts 'DIR: ' + a.getHrefAttribute
  end

  # List the files...
  page.getByXPath('//img[@alt="[TXT]"]').each do |img|
    a = img.getNextSibling.getNextSibling
    puts 'FILE: ' + a.getHrefAttribute
  end
end

The first step in the method is to create a new instance by calling WebClient.new and then download the page using wc.getPage.

When requesting a page with a content type of text/html, the getPage call will return an instance of HtmlPage, and we can now use XPath expressions and DOM calls to get the URLs for the directories and for files.  Very simple to get at the appropriate data.  HtmlUnit has a bunch of other methods that you can use to navigate the page; check out the source documentation for the HtmlPage object.
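HtmlUnit needs JRuby and the JARs above, so to try the XPath-plus-sibling idea in isolation, this sketch swaps in REXML, the XML library that ships with standard Ruby. The listing fragment is hand-written and only an assumption about what an Apache-style index page looks like, not a copy of the real Maven page. REXML's next_element plays the role of the two getNextSibling calls by skipping the text node between the icon and the link.

```ruby
require 'rexml/document'

# Hand-written, well-formed stand-in for a directory listing page.
LISTING = <<~HTML
  <html><body><pre>
  <img src="/icons/folder.gif" alt="[DIR]"/> <a href="1.13/">1.13/</a>
  <img src="/icons/folder.gif" alt="[DIR]"/> <a href="1.14/">1.14/</a>
  <img src="/icons/text.gif" alt="[TXT]"/> <a href="maven-metadata.xml">maven-metadata.xml</a>
  </pre></body></html>
HTML

# Find every icon with the given alt text and return the href of the
# link element that follows it.
def links_after_icons(doc, alt_text)
  REXML::XPath.match(doc, "//img[@alt='#{alt_text}']").map do |img|
    img.next_element.attributes['href']
  end
end

doc = REXML::Document.new(LISTING)

links_after_icons(doc, '[DIR]').each { |href| puts 'DIR: ' + href }
links_after_icons(doc, '[TXT]').each { |href| puts 'FILE: ' + href }
```

Real-world HTML is rarely well-formed XML, which is one reason to prefer a browser-like tool such as HtmlUnit for actual scraping.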

Step 3: Parsing Data Written By JavaScript Functions

The code behind parsing an HTML page that uses JavaScript to dynamically create content is actually no harder than the previous example.  HtmlUnit will detect the script tags in the page you are downloading and execute the appropriate script in line.  For an example, I will use my blog home page and its inclusion of a JavaScript widget from MyBlogLog.  This script makes a call to MyBlogLog and finds out who the most recent registered users to visit my site have been.  In our example, we will parse out these users’ names and URLs.

# Function for seeing who the most recent MyBlogLog visitors were
def search_iotr
  wc = WebClient.new

  page = wc.getPage("http://www.innovationontherun.com")
  my_blog_log_info = page.getHtmlElementById("MBL_COMM")
  # The leading "." keeps each XPath relative to the current element;
  # a bare "//" would search the whole document again.
  my_blog_log_info.getByXPath('.//td[@class="mbl_mem"]').each do |td|
    td.getByXPath('.//a').each do |a|
      puts a.asText + ":" + a.getHrefAttribute
    end
  end
end

If you look at the source for this page, you will see a script tag that downloads a JavaScript file from MyBlogLog.com.  The downloaded JavaScript will make calls to document.write that insert an HTML table into the page.  The id of the table is MBL_COMM, so our first step is to find that HTML element.  Once we have the element, it takes a couple of simple XPath expressions to find the anchor tags that contain each recent visitor’s name and URL.  All of the work of downloading the data and putting it into the HTML page is hidden from us by HtmlUnit, so we can easily use DOM calls to get at the information we are interested in.

Other Situations Where HtmlUnit Rocks!

Anytime JavaScript is being used to either enable navigation or modify the HTML document, HtmlUnit can be a great asset in your parsing.  This includes:

  • Content from AJAX requests
  • Situations where JavaScript events are being used to impact behavior.  An example would be a page using an onChange handler on a select list to modify form values and/or submit the form.  HtmlUnit is very handy for simplifying this interaction.

A word of caution: the JavaScript implementation in HtmlUnit is not fully featured, so some sites still may not work.  However, the HtmlUnit team is validating the browser against a fair number of popular libraries, so hopefully this will be less of an issue in future HtmlUnit releases.

Appendix

Prerequisite Information To Run the Example

Make sure that you have Java installed.  I am using Java 1.6, but HtmlUnit and JRuby should support older versions.

Download JRuby from http://dist.codehaus.org/jruby/ and put the jruby executable (found in the bin directory of the downloaded file) in your path.

To verify that Java and jruby are set up correctly, just run jruby from the command line and ask for the version:

  > jruby -version
ruby 1.8.5 (2007-08-23 rev 4201) [x86-jruby1.0.1]

My Environment Details

  • JRuby version 1.0.1
  • HtmlUnit 1.13
  • Java version 1.6.0_02-b05
  • Ubuntu 7.04


Posted on September 5, 2007 in Uncategorized by Rob Di Marco

In the last few months, Adobe, Sun, and Microsoft have all made major announcements around their platforms for developing rich Internet applications (RIA).  Adobe, the current market leader in the RIA space with its combination of Flash and Flex, has recently been promoting the AIR product that will allow Flex applications to hook into the desktop.  Microsoft has announced its Silverlight technology with a focus on integration with scripting and .NET languages and a strong focus on a high-quality video and audio experience.  Right after the Silverlight announcement, in what seemed like a knee-jerk press release, Sun announced the JavaFX technology to ostensibly compete in the space.

There are two motivators that are really driving interest in these technologies: better media (e.g. audio and video) experiences and better online/offline experiences.  Adobe and Microsoft get this and these drivers have been at the forefront of their offerings.  Sun hasn’t gotten the message.

Flash has been around for a long time and is installed and active on almost every browser.  Most of the major video sites (e.g. YouTube) currently use Flash as their RIA platform, and the Flash 9 player adds new video/audio codec support as well as ActionScript 3, which follows the ECMAScript specification (basically JavaScript).  Flash/Flex are mature, have solid tools, great documentation, a vibrant community and a huge installation base.  Flash/Flex has first mover advantage in the space.

Silverlight has just reached 1.0 and has gotten a ton of press.  From the development side, the combination of JavaScript and XML to create applications feels a lot like building applications with Flex.  Silverlight has also been designed to hook into .NET languages including IronRuby and IronPython and is really focusing on a high-quality audio and video experience.  For some cool examples of the technology, check out the visual search engine tafiti.com or the live streaming television broadcasting application LiveStation.  There is cross-browser and cross-platform support for the toolkit.  There is not a huge community or install base just yet, but the technology is easy to install (about a 4 MB download), easy to develop with, and produces a fast and clean user experience.  The easy hook into .NET languages plus its inevitable preloading into base Windows installs means that these communities will build up fast.

JavaFX is basically a nicer way to write applets.  There is no focus on offline applications, no focus on media, and a small effort on reducing download sizes.  Really nothing very interesting or new from what Java developers have had before.

It seems no one at Sun has ever read The 22 Immutable Laws of Marketing[1].  Let’s look at how many of the marketing laws Sun is violating.

  • Law of Leadership – Being the first to market.  Sun actually had an opportunity 10 years ago when applets were starting to be built.  But while they may have been first to market for embedded applications in a browser, they let Adobe Flash be the first easy-to-install browser plugin for media applications.
  • Law of Mind and Law of Perception – Java GUI applications are thought of as being big, slow, and ugly.  By including the term Java in the JavaFX name, people will clearly associate JavaFX applications with those terms as well.  Sun should rebrand.
  • Law of the Ladder/Law of Duality – It is very hard to change market position.  And if you are not one of the top two players in a market, you need to find a new market.  JavaFX will not be in the top two, so it needs a new market.
  • Law of Line Extension – Another problem with Java: the brand is so generic and means so many things that you don’t know what it is.  Java ME, Java SE, Java EE, JavaFX.  They all leverage the Java brand, but are all wildly different products.  (Side note: Sun changing its NASDAQ stock ticker symbol to JAVA continues these violations.)
  • Law of Resources – Adobe is going around with their Adobe AIR tour.  Microsoft is partnering with a bunch of big media players (e.g. check out http://silverlight.net/Showcase/ for samples for WWE, CBS, MLB) to work with Silverlight.  Sun spent some money on a press release.

For JavaFX to be a success, for there to be a community, I would recommend the following:

  • Figure out what market segment you are solving problems for.  Do you want to be in the consumer RIA space or should the focus be on RIA applications developed within corporate intranets?  Where Adobe and Microsoft are focusing on the consumer space (explaining the drivers of media and online/offline play), perhaps Sun could focus on a different problem space with different needs.
  • Stop starting every name with the Java brand!  Differentiate the product names so people know what they are talking about.
  • Come up with a real roadmap for the technology and let the development community know it is a core part of Sun’s development path, not a one off project.

[1] EVERYONE should own this book.  It’s $10, will take an hour to read, and will change how you look at your company and business.
