Parsing XML using IronRuby

Today I have been looking at how you can use IronRuby to communicate with the various REST APIs floating around the web. Most of the services, such as Flickr or Twitter, allow you to select the format of the response. While JSON (JavaScript Object Notation) seems to be the most common format, I decided to start with XML, simply because I already know how to parse it.

Initially, I wanted to use a Ruby library to parse the XML. However, I found that REXML, which ships with Ruby, is not yet supported by IronRuby, so I had to take a different approach. I decided to use the System.Xml namespace as my base, then create a wrapper and monkey patch the CLR objects to produce a cleaner, more flexible API.

When creating the wrapper, the first task is to define all of the required references. Generally with IronRuby, if you want to do .NET interop you will need to reference mscorlib and System. In this case, I’ve also referenced the System.Xml assembly. The include statement is similar to a using directive in C#. Within Ruby, include is what enables ‘mixins’: it makes the functionality of another module accessible from the current one, as if the two were combined, which is a very powerful technique. In this case, it allows us to access CLR objects without needing to specify the namespace.

require 'mscorlib'
require 'System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'
include System
require 'System.Xml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'
include System::Xml
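
To see what include does without involving the CLR at all, here is a minimal sketch in plain Ruby. The names Geometry, Point and Report are made up for illustration. The only point is that once a module is included, its constants resolve without the module prefix, which is the effect include System::Xml has on XmlDocument.

```ruby
# Hypothetical module standing in for a namespace such as System::Xml.
module Geometry
  class Point
    attr_reader :x, :y

    def initialize(x, y)
      @x = x
      @y = y
    end
  end
end

module Report
  # Mix Geometry in; its constants now resolve inside Report.
  include Geometry

  def self.origin
    Point.new(0, 0) # no Geometry:: prefix needed here
  end
end

fully_qualified = Geometry::Point.new(1, 2)
via_include = Report.origin

# Both objects are instances of the same class.
puts fully_qualified.class == via_include.class
```

Including into a module of your own, rather than at the top level as the wrapper does, keeps the extra names out of every other object. For a small script either works.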

The next task is to create a new Ruby class. This class has an initialize method which takes the raw XML as a string. Under the covers, it creates an instance of System.Xml.XmlDocument and loads the XML.

class Document
  def initialize(xml)
    @document = XmlDocument.new
    @document.load_xml xml
  end
end

Once we have created the document, we need a way to access the XML elements. Within C#, you would use the SelectNodes method, which returns an XmlNodeList; you would then iterate over this collection to access each XmlNode and, through it, your data. Life in IronRuby is a little different. I found that when iterating over the XmlNodeList, I was getting an XmlElement object for each of the nodes. I also wanted to provide a more ‘Ruby-like’ way to access the elements.

The method I created takes two arguments: the first is the XPath query, the second is a block, a piece of code which I want to be executed for each element. Within the method, I iterate over all the matching elements, passing control back to the block with the element as a parameter for processing.

def elements(xpath, &b)
  ele = @document.select_nodes xpath
  ele.each do |e|
    b.call e
  end
end
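
The block-passing idea is plain Ruby and can be tried without any XML. The sketch below uses a made-up NameList class holding an array in place of the XmlNodeList. It shows the explicit &block form used above next to the equivalent yield form, which can also hand back an Enumerator when no block is given.

```ruby
# Hypothetical stand-in for the wrapper: wraps an Array, not an XmlNodeList.
class NameList
  def initialize(names)
    @names = names
  end

  # Explicit block parameter, the same style as the elements method.
  def each_with_call(&block)
    @names.each { |n| block.call(n) }
  end

  # Equivalent form using yield; returns an Enumerator if no block is given.
  def each_with_yield
    return enum_for(:each_with_yield) unless block_given?

    @names.each { |n| yield n }
  end
end

list = NameList.new(%w[Jim Bob])

seen = []
list.each_with_call { |n| seen << n }
list.each_with_yield { |n| seen << n.upcase }
p seen # ["Jim", "Bob", "JIM", "BOB"]

# Without a block, the yield version can be chained like any Enumerator.
lengths = list.each_with_yield.map(&:length)
p lengths # [3, 3]
```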

Within the block, I can place the code required to process that section of the XML. However, I still need a way to access the data of the elements. Because the above code yields XmlElement objects, I wanted to monkey patch the class to include a few custom methods. This is amazingly simple within IronRuby: you define a class with the same name and add your methods to it.

class XmlElement
  def get(name)
    select_single_node name
  end
  def to_s
    inner_text.to_s
  end
end
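
The same reopening trick works on any Ruby class. Here is a small sketch with a made-up Temperature class standing in for a type we do not own. It also shows why overriding to_s matters: puts calls to_s on whatever it is given, which is what lets the wrapper print an element’s text directly.

```ruby
# Hypothetical class standing in for a library type we did not write.
class Temperature
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end
end

# Reopen the class: existing methods stay, the new ones are added.
class Temperature
  def fahrenheit
    degrees * 9.0 / 5 + 32
  end

  # puts and string interpolation call to_s, so this controls the output.
  def to_s
    "#{degrees} C"
  end
end

t = Temperature.new(100)
puts t            # 100 C
puts t.fahrenheit # 212.0
```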

I also included an additional method on XmlElement called node(), which works the same way as elements above but lets me access sub-elements of an XmlElement object.
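
The listing for node() is not reproduced here, so the following is only a rough sketch of the idea, written in plain Ruby against a made-up FakeElement class and not the real XmlElement. It assumes node yields each matching child to the block and yields nothing when there is no match.

```ruby
# Hypothetical in-memory element; this is NOT System.Xml.XmlElement.
class FakeElement
  attr_reader :name, :text, :children

  def initialize(name, text = nil, children = [])
    @name = name
    @text = text
    @children = children
  end

  # Return the first child with the given name, or nil.
  def get(child_name)
    children.find { |c| c.name == child_name }
  end

  # Yield every child with the given name; yields nothing if none match.
  def node(child_name)
    children.select { |c| c.name == child_name }.each { |c| yield c }
  end

  def to_s
    text.to_s
  end
end

year = FakeElement.new('year', '1933')
dob = FakeElement.new('dob', nil, [year])
jim = FakeElement.new('name', nil, [FakeElement.new('first', 'Jim'), dob])
bob = FakeElement.new('name', nil, [FakeElement.new('first', 'Bob')])

years = []
[jim, bob].each do |e|
  e.node('dob') { |d| years << d.get('year').to_s }
end
p years # ["1933"]
```

Because bob has no dob child, the block is simply never called for him, so no nil check is needed at the call site.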

Finally, I saved this in a file called xml.rb. The filename is used by consumers within the require statement.

With this in place, I can use the wrapper to process XML.

# Include the wrapper
require 'xml'

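# Create the document. The markup of this sample was lost when the post was
# published; the structure follows from the queries below, and the element
# names month and last are assumed.
@document = Document.new('<root><name><first>Jim</first><dob><year>1933</year><month>12</month></dob></name><name><first>Bob</first><last>Smith</last></name></root>')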

# Access root/name elements
@document.elements('root/name') do |e|
  # Output the contents of the element named first
  puts e.get('first')
  # Access the element named dob, then output the value of year.
  e.node('dob') {|y| puts y.get('year')}
end

When I execute this code, Jim, 1933 and Bob are printed to the console.

>ir processXml.rb
Jim 
1933
Bob

While the wrapper isn’t very advanced, it’s a very quick and easy way to start working with XML from IronRuby.

A big thank you to Ivan Porto Carrero, who pointed me in the right direction on how to accept blocks within methods. Before that, I had to do this:

@document.elements('x').collect {|e| puts e.get('first')}

Not much difference, but enough to make an impact.

Feel free to download the wrapper and the sample.  For future reference, I’ve uploaded the code to the MSDN Code Gallery, which I will update if I release a new version.
