Posted on October 24, 2007 in Software Development by Rob Di Marco

Last night I presented an Introduction to JRuby at the Philadelphia Java Users Group.  My goal was to introduce Java programmers to Ruby and get them to see how features of the language allow for functionality and code structure that are not possible in Java alone.  While I did talk about Rails at the end, my point was to get the developers to understand that the meta-programming functionality in Ruby allows both plugin developers and users of Rails to extend and improve the base platform in many interesting ways.

Probably my favorite example was an expansion on a previous blog post about using JRuby and HtmlUnit.  In the presentation, I created a simple Domain Specific Language on top of the great HtmlUnit library to allow for simple parsing of pages with content in JavaScript.  This was a great example of how JRuby can be used to take advantage of both Java and Ruby.

The presentation is available at http://www.innovationontherun.com/presentation-files/Introduction%20to%20JRuby-10-23-2007.pdf and the associated source code can be found at http://www.innovationontherun.com/presentation-files/JRuby-JUG.zip.  If you are looking at the source code, start with the file ABOUT.txt.  It describes the prerequisites needed to get the code to run, and the files are listed in the order in which they were used in the presentation.

If you have questions or comments, feel free to send me an email.

Posted on September 19, 2007 in Uncategorized by Rob Di Marco

Much has been made over Paul Graham’s famous posting about how Lisp gave his startup Viaweb an advantage over the competition.  Graham’s thesis is that there are features in the Lisp language that could be leveraged to make his programming team more productive and better able to respond to customer needs.

The idea that a programming language will make your team more productive is the Holy Grail of software development.  Many languages have been promoted as a cure-all for productivity (C++ in the 80s, Java in the 90s, Ruby now, Lisp perpetually), and each turns out to have its benefits and weaknesses.  But there is no doubt that certain language features can lead to leaps in productivity.  For example, built-in garbage collection is a sine qua non for modern development languages.  No one can argue (at least no one can argue well) that garbage collection does not improve developer productivity for 99.9% of development efforts.

What is a Mixin?

Since many readers may not be aware of what a mixin is, I will try to describe it briefly.  Ruby is an object-oriented language.  With all object-oriented languages, the designers had to decide whether or not to support multiple inheritance.  Perl and C++ allow multiple inheritance.  Java does not, but it does allow for polymorphism through the use of interfaces.  Ruby’s designers opted not to allow multiple inheritance, but to allow code from a module to be included in a class.

So consider the following code:

module BarModule
  def hello_world
    puts "Hello World"
  end
end

class BaseClass
  def class_method
    puts "In class method"
  end
end

class Foo < BaseClass
  include BarModule
end

f = Foo.new
f.class_method
f.hello_world

In this example, we are creating a module named BarModule with a hello_world method, a class named BaseClass, and then another class Foo that extends BaseClass and includes BarModule. The class Foo will then have the methods from both BarModule and BaseClass, but its only superclass is BaseClass (Ruby does add the module to Foo's ancestor chain, so f.is_a?(BarModule) is also true). Note that, despite its name, class_method is an ordinary instance method. Somewhat different from what a Java developer would do, but it makes sense. Running this file will result in the following output:

$ ruby foo.rb
In class method
Hello World
$

So that is a basic example of what a mixin is.
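To see where the mixed-in method actually lives, it can help to ask Ruby about the class's ancestry. The following is a small sketch that repeats the classes from the example (the methods return strings instead of printing, purely to keep the output short) and then inspects them with standard reflection calls:

```ruby
module BarModule
  def hello_world
    "Hello World"
  end
end

class BaseClass
  def class_method
    "In class method"
  end
end

class Foo < BaseClass
  include BarModule
end

f = Foo.new

# The only superclass is BaseClass...
puts Foo.superclass                          # BaseClass
# ...but the module is inserted into the method lookup chain right after Foo.
puts Foo.ancestors.first(3).inspect          # [Foo, BarModule, BaseClass]
# Because of that, is_a? is true for both the superclass and the module.
puts f.is_a?(BaseClass)                      # true
puts f.is_a?(BarModule)                      # true
# The method is still owned by the module, not copied into Foo.
puts Foo.instance_method(:hello_world).owner # BarModule
```

In other words, include does not copy code into Foo; it places the module in the lookup path between Foo and BaseClass.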

Adding send() into the equation

Another interesting and powerful thing about Ruby is that all method calls are actually message passing calls.  So for example, we could rewrite:

f = Foo.new
f.class_method
f.hello_world

to

f = Foo.send(:new)
f.send(:class_method)
f.send(:hello_world)

and the results would be exactly the same! Again, a little weird if you are new to Ruby, but this language feature can lead to some very interesting and powerful results.
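One reason this matters is that the method name becomes ordinary data. The sketch below uses a hypothetical Greeter class (not part of the presentation code) to show names being chosen at runtime, and also shows that send bypasses method visibility. Note that public_send was added in Ruby 1.9, so it is not available on the Ruby 1.8 interpreters of the time.

```ruby
# Hypothetical class, used only for illustration.
class Greeter
  def hello
    "hello"
  end

  def goodbye
    "goodbye"
  end

  private

  def secret
    "private detail"
  end
end

g = Greeter.new

# The method name is just a symbol, so it can come from a list, a config file, etc.
results = [:hello, :goodbye].map { |name| g.send(name) }
puts results.inspect            # ["hello", "goodbye"]

# respond_to? lets the caller check before sending; private methods are not reported.
puts g.respond_to?(:hello)      # true
puts g.respond_to?(:secret)     # false

# send ignores visibility, while public_send (Ruby 1.9+) respects it.
puts g.send(:secret)            # private detail
begin
  g.public_send(:secret)
rescue NoMethodError => e
  puts "public_send refused: #{e.class}"
end
```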

Combining include and send

Now let’s combine these two methods.  What if I wanted to take my BarModule and apply it to a class that is not mine, say the Ruby core class String?

String.send(:include, BarModule)
s = "Arbitrary String"
s.hello_world

Running the above code would produce:

$ ruby include-bar-module-on-string.rb
Hello World
$

That’s right, at runtime, I was able to add an arbitrary method onto the base String class. That method is now available to any Strings that I instantiate within my application.
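Two details are worth spelling out. First, send is needed here because Module#include was a private method in the Ruby of the time; since Ruby 2.1 it is public, so String.include(BarModule) also works on current interpreters. Second, the new method shows up on strings that already existed before the include, not only on new ones. A minimal sketch, using a hypothetical Shouting module:

```ruby
# Hypothetical module, used only for illustration.
module Shouting
  def shout
    upcase + "!"
  end
end

# This string exists before String knows anything about Shouting.
existing = "created before the include"

# Ruby 1.8 style, as in the post: include is reached through send.
String.send(:include, Shouting)

puts existing.shout             # CREATED BEFORE THE INCLUDE!
puts "created after".shout      # CREATED AFTER!
puts String.include?(Shouting)  # true
```

This works because method lookup happens at call time against the class's current ancestor chain, so every String instance sees the change immediately.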

The Implication of this Feature on Rails

Because of this mixin feature, a developer can add arbitrary methods to core classes and modify their behavior at runtime.  This is amazingly powerful if you are trying to write plugins and extensions to the framework.  Because you can add functionality to existing objects, users can install your plugin and start taking advantage of new functionality without having to make changes to the objects that they are instantiating in their application.

In frameworks written in other languages, such as Java, plugging in new functionality means that you need to change how your objects are instantiated.  This requires code changes and/or configuration changes (if you are using a dependency injection framework like Spring).  That is hard to develop, hard to maintain, and a pain for plugin developers to support.  But because of the mixin feature, Rails plugin developers can customize the base objects, and the users of the plugins do not have to change any of their code or configuration logic.

A good example of this can be seen in the Row Version Rails plugin.  This particular plugin puts a created_at, an updated_at, and a row_version on every row inserted into the database.  It requires ZERO code changes to make this happen.  You just install the plugin and go.  It works by adding hooks into ActiveRecord::Base (the base persistence class in Rails) so that when records are saved, the correct information is put into those fields.  A very easy and powerful plugin to install and use.
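To make the idea concrete without pulling in Rails, here is a rough sketch of how such a hook could be wired up in plain Ruby. Everything in it is an assumption made for illustration: Record stands in for ActiveRecord::Base, Stamping stands in for a plugin, and the alias-and-wrap approach is one common technique, not necessarily how the Row Version plugin is actually implemented.

```ruby
# Hypothetical stand-in for a framework base class such as ActiveRecord::Base.
class Record
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  def save
    # A real framework would write to the database here.
    true
  end
end

# Hypothetical plugin: wraps save so every record is stamped before it is saved.
module Stamping
  def self.included(base)
    base.class_eval do
      # Keep the original save reachable under a new name...
      alias_method :save_without_stamps, :save

      # ...then replace save with a version that stamps first.
      def save
        now = Time.now
        attributes[:created_at] ||= now
        attributes[:updated_at] = now
        attributes[:row_version] = attributes.fetch(:row_version, 0) + 1
        save_without_stamps
      end
    end
  end
end

# "Installing the plugin" is one line; the application code below is unchanged.
Record.send(:include, Stamping)

class Customer < Record; end

c = Customer.new(:name => "Acme")
c.save
c.save
puts c.attributes[:row_version]       # 2
puts c.attributes.key?(:created_at)   # true
```

The Customer class never mentions the plugin, which is the point being made above: the behavior changes at the base class, and every subclass picks it up.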

Conclusions

The point to take away from this is not that Ruby on Rails rocks and Java sucks.  Far from it.  But choosing a framework with lots of extensions that can take care of many of the mundane tasks allows your developers to spend more of their time focused on the problems of the user and not on common problems.  The mixin feature of Ruby allows for the development of easy-to-use but powerful plugins that will be hard for any non-Ruby-based framework to compete with.

Posted on September 5, 2007 in Uncategorized by Rob Di Marco

Scraping static web sites to verify functionality or to access data has been around as long as there has been a web (example of scraping of a static web page with Ruby).  But with the advent of AJAX and other techniques that use JavaScript to dynamically insert HTML into a web page, scraping has gotten more challenging.  Most scraping technology does fine when downloading a single HTML page, but cannot easily handle the dynamic content.

With the 1.12 release of HtmlUnit, this headless web browser can now support parsing and executing JavaScript.  This allows a scraper to access this dynamic content as simply as the scraper accesses static content, and without having to fire up a heavy execution engine like Gecko.

JRuby is a great technology for easily constructing a script that calls into the HtmlUnit functionality without having to deal with all the boilerplate that Java requires.

Step 0: About The Example Code

The tarball, scraper.tgz, contains:

  • scraper.rb - the JRuby script we will be executing.  All code discussed in this example comes from there
  • lib/*.jar - all of the JAR files needed to run the example
  • run.sh - a simple shell script that points JRuby at the lib directory and silences some warning messages

Step 1: Enabling JRuby to Use the Java JAR files

# Require Java so we can use the Java libraries
require 'java'

# Get HtmlUnit and all of its required libraries
require 'htmlunit-1.13.jar'
require 'commons-httpclient-3.1.jar'
require 'commons-io-1.3.1.jar'
require 'commons-logging-1.1.jar'
require 'commons-lang-2.3.jar'
require 'commons-codec-1.3.jar'
require 'xercesImpl-2.6.2.jar'
require 'xmlParserAPIs-2.6.2.jar'
require 'jaxen-1.1.1.jar'
require 'commons-collections-3.2.jar'
require 'js-1.6R7.jar'
require 'nekohtml-0.9.5.jar'

# Include the WebClient class
include_class 'com.gargoylesoftware.htmlunit.WebClient'

In this block, we first load JRuby's Java support and then tell JRuby to load the JAR files required by HtmlUnit.  Some notes:

  • You have to specify every JAR file that HtmlUnit depends upon, even if your script never calls its classes directly.
  • All JAR files must be in the $LOAD_PATH for JRuby.  This is done by -I<DIR_NAME> arguments passed in to JRuby from the command line.
  • The include_class call is similar to an import statement in Java and puts the WebClient class in scope.

At this point, we can now instantiate and use the WebClient class.
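As an alternative to listing each JAR and passing -I on the command line, the load path can also be adjusted from inside the script. The following is only a sketch under a few assumptions: that the JARs live in a lib directory next to the script (as in the tarball layout described above), that the order in which JARs are required does not matter, and that jar_names is a helper name made up for this example. The JRuby-only part is guarded so the file can still be run under plain Ruby to see what would be loaded.

```ruby
# Hypothetical helper: return the names of all JAR files in a directory, sorted.
def jar_names(dir)
  Dir.glob(File.join(dir, "*.jar")).map { |path| File.basename(path) }.sort
end

# Assumes a lib directory next to this script.
lib_dir = File.join(File.dirname(File.expand_path(__FILE__)), "lib")

# Same effect as passing -I<DIR_NAME> on the command line.
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

if RUBY_PLATFORM =~ /java/
  # Under JRuby, requiring a JAR makes its classes available to the script.
  require 'java'
  jar_names(lib_dir).each { |jar| require jar }
else
  puts "Not running under JRuby; would have required: #{jar_names(lib_dir).inspect}"
end
```

The trade-off is that an explicit list documents exactly which libraries the script expects, while the glob version keeps working when a JAR is upgraded to a new version number.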

Step 2: Parsing a Basic HTML Page

Before we get into parsing a dynamic page, let’s take a look at how to parse a simple page.  In this example, I am going to parse out information from the Maven 2 Archive for HtmlUnit found at http://repo1.maven.org/maven2/htmlunit/htmlunit.

# Function for getting a list of all directories
def get_htmlunit_maven_pages
  wc = WebClient.new

  page = wc.getPage("http://repo1.maven.org/maven2/htmlunit/htmlunit")

  # List the directories...
  page.getByXPath('//img[@alt="[DIR]"]').each do |img|
    a = img.getNextSibling.getNextSibling
    puts 'DIR: ' + a.getHrefAttribute
  end

  # List the files...
  page.getByXPath('//img[@alt="[TXT]"]').each do |img|
    a = img.getNextSibling.getNextSibling
    puts 'FILE: ' + a.getHrefAttribute
  end
end

The first step in the method is to create a new instance by calling WebClient.new and then download the page using wc.getPage.

When requesting a page with a content type of text/html, the getPage call will return an instance of HtmlPage, and we can now use XPath expressions and DOM calls to get the URLs for the directories and for files.  Very simple to get at the appropriate data.  HtmlUnit has a bunch of other methods that you can use to navigate the page; check out the source documentation for the HtmlPage object.

Step 3: Parsing Data Written By JavaScript Functions

The code behind parsing an HTML page that uses JavaScript to dynamically create content is actually no harder than the previous example.  HtmlUnit will detect the script tags in the page you are downloading and execute the appropriate script in line.  For an example, I will use my blog home page and its inclusion of a JavaScript widget from MyBlogLog.  This script makes a call to MyBlogLog and finds out who the most recent registered users to visit my site have been.  In our example, we will parse out these users' names and URLs.

# Function for seeing who the most recent MyBlogLog users were
def search_iotr
  wc = WebClient.new

  page = wc.getPage("http://www.innovationontherun.com")
  my_blog_log_info = page.getHtmlElementById("MBL_COMM")
  # A leading "." keeps each XPath relative to the current element;
  # a bare "//" would search the whole document again.
  my_blog_log_info.getByXPath('.//td[@class="mbl_mem"]').each do |td|
    td.getByXPath('.//a').each do |a|
      puts a.asText + ":" + a.getHrefAttribute
    end
  end
end

If you look at the source for this page, you will see a script tag that downloads a JavaScript file from MyBlogLog.com.  The downloaded JavaScript will make calls to document.write that will insert an HTML table into the page.  The id of the table is MBL_COMM, so our first step is to find that HTML element.  Once we have the element, it is a couple of simple XPath expressions to find the anchor tag that contains the recent visitor's name and URL.  All of the implementation of downloading the data and putting it into the HTML page is hidden from us by HtmlUnit, so we can easily use DOM to get at the information we are interested in.

Other Situations Where HtmlUnit Rocks!

Anytime JavaScript is being used to either enable navigation or modify the HTML document, HtmlUnit can be a great asset in your parsing.  This includes:

  • Content from AJAX requests
  • Situations where JavaScript events are being used to impact behavior.  An example would be a page using an onChange handler on a select list to modify form values and/or submit the form.  HtmlUnit is very handy for simplifying this interaction.

A word of caution: the JavaScript implementation is not fully featured in HtmlUnit, so some sites still may not work.  However, the HtmlUnit team is validating the browser against a fair number of popular libraries, so hopefully this will be less of an issue in future HtmlUnit releases.

Appendix

Prerequisite Information To Run the Example

Make sure that you have Java installed.  I am using Java 1.6, but HtmlUnit and JRuby should support older versions.

Download JRuby from http://dist.codehaus.org/jruby/ and put the jruby executable (found in the bin directory of the downloaded file) in your path.

To verify that Java and jruby are set up correctly, just run jruby from the command line and ask for the version:

> jruby --version
ruby 1.8.5 (2007-08-23 rev 4201) [x86-jruby1.0.1]

My Environment Details

  • JRuby version 1.0.1
  • HtmlUnit 1.13
  • Java version 1.6.0_02-b05
  • Ubuntu 7.04

Posted on July 24, 2007 in Uncategorized by Rob Di Marco

Since I have been talking about Java vs. Ruby, I figured I would give a recent example of where Ruby really solved my problems simply and easily.  My company uses SalesForce.com for its CRM solution.  Our sales, finance, and fulfillment teams have logins that are maintained and managed through the website.  This is a pain for both the users and our IT team, as people now have to remember their SalesForce username and password as well as their internal username and password.

To combat this problem, SalesForce has created a method that allows authentication of users via a SOAP request.  Very cool, solves the problem, so I signed up to implement it.  SalesForce supplies a WSDL documenting the services that need to be supported and you supply the URL that implements the defined services.  Should be simple to connect to our ActiveDirectory server using LDAP to perform the validation of users.

The Initial Attempt: Using Java Web Services…

Our standard development environment is Java 5, Maven 2 for builds, Apache 2 for HTTP(S), and JBoss 4.0.5 as an application server.  I thought it would be a simple exercise to use JAX-WS combined with the JAX-WS Maven Plugin for auto-generation of my stubs and JBoss WS to simply deploy.  The first problem was getting the Maven plugin working correctly.  After some trial and error, I got my POM set up correctly and was deploying the EAR.  Unfortunately, I realized that JBoss was assuming that Tomcat was running on port 8080, causing a problem in the WSDL that was being referenced.  So I had to find the JBoss property to tweak to fix that.  The next problem wound up being that the default org.jboss.ws.soap.SOAPMessageImpl did not implement the setProperty method, causing an UnsupportedOperationException.  Awesome.  To try to fix that issue, I wound up trying the following (in different combinations):

  • Upgrading JBoss to 4.2.1.
  • Upgrading JBoss WS to 2.0 with JBoss 4.2.1
  • Trying Java 6 as JAX-WS is in the standard JDK with both JBoss 4.0.5 and JBoss 4.2.1

With all of these combinations, I kept running into class loading issues and little help on the web.  After much frustration, I said to hell with this, how can I do it in ruby….

SOAP4R to the rescue

SOAP4R is the best known SOAP library for Ruby and the gem is easily installed:

> gem install soap4r --source http://dev.ctor.org/download/

Now I could auto-generate my stubs from the WSDL and put in my business logic.

Generating Stubs from the WSDL

In general, the documentation for soap4r is non-helpful, but by looking at the source code and googling around a bit, I was able to get the gist.  From the supplied SalesForce WSDL, I was able to quickly generate my stubs:

require 'rubygems'
gem 'soap4r'
require 'wsdl/soap/wsdl2ruby'
DIR = File.dirname(".")
gen = WSDL::SOAP::WSDL2Ruby.new
gen.basedir=File.dirname(DIR)
gen.location=File.join(DIR,"AuthenticationService.wsdl")
gen.logger.level=Logger::DEBUG
gen.opt['classdef'] = "SforceAuth"
gen.opt['client_skelton'] = nil
gen.opt['servant_skelton'] = nil
gen.opt['cgi_stub'] = nil
gen.opt['standalone_server_stub'] = nil
gen.opt['mapping_registry'] = nil
gen.opt['driver'] = nil
gen.opt['force'] = true
gen.run

Running this produces seven files:

  • SforceAuthServant.rb - stub class that is used for the implementation.  The server-side implementation code will need to go in here.
  • SforceAuthenticationService.rb - if you would like to run a standalone server, this will be the file that you run.
  • SforceAuthenticationService.cgi - if you would like to use a web server (e.g. Apache) and CGI, this will be the file that you use.
  • SforceAuthenticationServiceClient.rb - if you would like to have a client to test your code, this will be the file that you run.
  • SforceAuth.rb - class definitions for the request object and the response object.
  • SforceAuthMappingRegistry.rb - class used to map SOAP requests and responses to Ruby objects.
  • SforceAuthDriver.rb - driver class used by the client to call into the server.

Customizing the Generated Ruby Files to Get A Working System

Of the seven files created, three (the client, the standalone server, and the CGI script) needed to be executable.  To get them to run, I needed to set the executable flag:

> chmod +x SforceAuthenticationService.rb SforceAuthenticationService.cgi SforceAuthenticationServiceClient.rb

In addition, because I am using Ruby Gems to manage my dependencies, I needed to add the following two lines to the top of each of these three files:

require 'rubygems'
gem 'soap4r'

To test the implementation, I started up the standalone authentication server (which listens on port 10080 by default) and called it from the client.

> SforceAuthenticationService.rb &

> SforceAuthenticationServiceClient.rb http://localhost:10080

If all is working, you will see a NoMethodError exception thrown.  To remedy this, edit the SforceAuthServant.rb file and, on line 15, change raise NotImplementedError.new to {:authenticated => true}.  Restarting the standalone server and rerunning the client will result in success.

Now, I just needed to implement the code to call the LDAP server.  The first step was obtaining and installing the gem for the Ruby net-ldap project.  Validating the username and password was as simple as putting this code into the authenticate method of the SforceAuthServant class:

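def authenticate(parameters)
  # Requires the net-ldap gem: add require 'net/ldap' at the top of the file
  username = parameters.username[0]
  password = parameters.password[0]

  ldap = Net::LDAP.new
  ldap.host = "ldap.hmsonline.com"
  # Bind with the caller's own credentials so the search below can run
  ldap.auth username, password
  # bind_as looks the user up with the filter, then re-binds as that entry
  # with the supplied password (an empty password here could be accepted
  # by the server as an unauthenticated bind)
  if ldap.bind_as(
      :base => "dc=com",
      :filter => "(sAMAccountName=" + username + ")",
      :password => password
  )
    return {:authenticated => true}
  else
    return {:authenticated => false}
  end
end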

Again, restarting the standalone server and using the client (with some tweaks to send the right parameters over) validated that the service was working successfully.
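One thing to watch in the method above is that the username is pasted straight into the LDAP filter, so characters such as * or ) in the input would change the meaning of the search. A minimal sketch of one way to guard against that follows, escaping the special characters as described in RFC 4515. The helper names escape_ldap_filter_value and account_filter are made up for this example and are not part of net-ldap or the generated code.

```ruby
# Characters that RFC 4515 requires to be escaped inside a filter value,
# mapped to their backslash-hex form.
LDAP_FILTER_ESCAPES = {
  "\\" => '\5c',
  "*"  => '\2a',
  "("  => '\28',
  ")"  => '\29',
  "\0" => '\00'
}.freeze

# Hypothetical helper: escape a value so it is treated literally in a filter.
def escape_ldap_filter_value(value)
  value.to_s.gsub(/[\\*()\x00]/) { |ch| LDAP_FILTER_ESCAPES[ch] }
end

# Hypothetical helper: build the sAMAccountName filter used in authenticate.
def account_filter(username)
  "(sAMAccountName=#{escape_ldap_filter_value(username)})"
end

puts account_filter("jdoe")     # (sAMAccountName=jdoe)
puts account_filter("a*)(b")    # (sAMAccountName=a\2a\29\28b)
```

With helpers like these, the :filter argument in authenticate could be written as account_filter(username) instead of concatenating the raw value.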

Configuring Apache to use the CGI

Obviously, I want SalesForce to use HTTPS when sending usernames and passwords.  We use Apache as our web server for SSL.  To get the CGI set up, I made sure that the directory with the files was accessible from the Apache configuration and then added a .htaccess file:

Options +ExecCGI
AddHandler cgi-script .cgi

Restarted Apache and I was done.  The whole exercise took about an hour, and that was with me figuring out what was going on.  The soap4r team has done a great job making setting up and deploying a SOAP service quick, powerful, and easy to manage.
