Exploring E4X with Ruby
XML Writing Made Easy
I really liked E4X's model of putting XML inline into the code. I couldn't push Ruby that far, because the language already defines the less-than and greater-than symbols. I settled on:
doc += xml <<XMLEND
<account id="bar">
<transaction amount="100" />
<transaction amount="200" />
</account>
XMLEND
This adds a chunk of XML to the document using the += operator, which Ruby
expands into a call to the plus operator. The new xml keyword is really just a function:
def xml( xmldata ) NodeWrapper.new( REXML::Document.new( xmldata ).root ); end
It creates a new REXML document, and then wraps its root node in a
NodeWrapper to make it easy to access. To make the plus happen, I
had to add some methods to NodeWrapper:
class NodeWrapper
  # Names starting with "_" read or write an attribute;
  # any other name is treated as an XPath expression.
  def method_missing( name, *args )
    name = name.to_s
    if ( name =~ /^_/ )
      name = name.sub( /^_/, "" )
      if ( name =~ /=$/ )
        name = name.sub( /=$/, "" )
        _write_attribute( name, args[0] )
      else
        _read_attribute( name )
      end
    else
      xpath( name )
    end
  end

  def initialize( node )
    @node = node
  end

  def to_s() @node.to_s; end
  def to_i() @node.to_s.to_i; end

  # Append another wrapper's node to this one; returning self lets
  # "doc += other" leave doc pointing at the same wrapper.
  def _add( nodes )
    @node << nodes._get_node
    self
  end
  alias :<< :_add
  alias :+ :_add

  def _get_node() @node; end

  def xpath( name )
    children = NodeListWrapper.new()
    REXML::XPath.each( @node, name ) { |elem|
      children.push( NodeWrapper.new( elem ) )
    }
    children
  end

  private

  def _read_attribute( name )
    @node.attributes[ name ].to_s
  end

  def _write_attribute( name, value )
    @node.attributes[ name ] = value
  end
end
I split method_missing into smaller helper methods to make its
behavior clearer. I also aliased << and + to the
_add method. This method in turn uses REXML's
<< method on a node to add a set of nodes from one tree into
another.
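One detail worth spelling out: Ruby expands `doc += other` into `doc = doc + other`, so the approach only works because `_add` returns `self`. The following is a minimal sketch of that behavior, not the article's actual class. `MiniWrapper` and `wrap` are hypothetical, cut-down stand-ins for `NodeWrapper` and `xml`, and the sample XML is made up for illustration.

```ruby
require 'rexml/document'

# MiniWrapper is a hypothetical, cut-down stand-in for the article's NodeWrapper.
class MiniWrapper
  attr_reader :node

  def initialize(node)
    @node = node
  end

  # Append the other wrapper's node to this one and return self,
  # so that `a += b` leaves `a` pointing at the same wrapper object.
  def _add(other)
    @node << other.node
    self
  end
  alias_method :<<, :_add
  alias_method :+, :_add
end

# Hypothetical helper playing the role of the article's xml function.
def wrap(xmldata)
  MiniWrapper.new(REXML::Document.new(xmldata).root)
end

doc = wrap('<accounts><account id="foo"/></accounts>')
original = doc

# Expands to: doc = doc + wrap(...)
doc += wrap('<account id="bar"><transaction amount="100"/></account>')

ids = REXML::XPath.match(doc.node, 'account').map { |a| a.attributes['id'] }
puts ids.inspect
puts doc.equal?(original)
```

Because the operator modifies its receiver rather than building a new tree, `+` here behaves like `<<`. That is convenient for this API, but it differs from how `+` works on Ruby's built-in strings and arrays.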
Now, to test the upgraded NodeWrapper class, I will add a new
account into the tree after reading the file:
out = {}
doc = readxml( 'test_data.xml' )
doc += xml <<XMLEND
<account id="bar">
<transaction amount="100" />
<transaction amount="200" />
</account>
XMLEND
doc.account.each { |account|
  amount = 0
  account.transaction.each { |item| amount += item._amount.to_i }
  out[ account._id ] = amount
}
p out
The xml method creates the new tree, and the plus operator
handles adding it into the document. I also added a new readxml
function, which takes a path name and returns a wrapper to the root node of the
XML object.
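The body of readxml is not shown in this part of the article, so the following is only a guess at its shape. `readxml_sketch` is a hypothetical name, it returns the bare REXML root element rather than a NodeWrapper so that the example stays self-contained, and the sample file contents are made up.

```ruby
require 'rexml/document'
require 'tempfile'

# Hypothetical sketch: parse the file at `path` and return its root element.
# The article's version would presumably wrap this in NodeWrapper.new( ... ).
def readxml_sketch(path)
  File.open(path) { |file| REXML::Document.new(file).root }
end

# Write a small made-up document to a temporary file and read it back.
root = Tempfile.create(['test_data', '.xml']) do |f|
  f.write('<accounts><account id="a"><transaction amount="10"/></account></accounts>')
  f.flush
  readxml_sketch(f.path)
end

puts root.name
puts root.elements['account'].attributes['id']
```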
One last step is to integrate XPath to make things even easier.
Adding XPath Support
Our new NodeWrapper provides an xpath method that returns a list of
node wrappers:
def xpath( name )
  children = NodeListWrapper.new()
  REXML::XPath.each( @node, name ) { |elem|
    children.push( NodeWrapper.new( elem ) )
  }
  children
end
XPath allows you to specify a set of nodes in an XML document in a way similar to specifying files in an operating system. As paths are to a file system, an XPath is a path within an XML document. Every node and attribute in any tree has a unique XPath.
XPath also supports wildcards and will return a set of nodes that match. For example, this code:
total = 0
doc.xpath( "account[@id='a']//@amount" ).each { |amount| total += amount.to_i }
print "#{total}\n"
prints the total for just the account whose id is a.
This code:
total = 0
doc.xpath( "//@amount" ).each { |amount| total += amount.to_i }
print "#{total}\n"
prints the sum of the amounts for the entire document, regardless of account.
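The wildcard support mentioned earlier can be tried directly against REXML, without the wrapper classes. This is a small sketch under stated assumptions: the sample document is made up, and its accounts/account/transaction layout is only inferred from the queries in this article.

```ruby
require 'rexml/document'

# Made-up sample data; the real test_data.xml may differ.
xml = <<~XML
  <accounts>
    <account id="a">
      <transaction amount="10"/>
      <transaction amount="20"/>
    </account>
    <account id="b">
      <transaction amount="5"/>
    </account>
  </accounts>
XML
root = REXML::Document.new(xml).root

# "*" matches any child element, whatever its name.
child_names = REXML::XPath.match(root, '*').map { |e| e.name }

# A wildcard step followed by a named step: every transaction under any child.
total_all = REXML::XPath.match(root, '*/transaction')
                        .sum { |t| t.attributes['amount'].to_i }

# A predicate narrows the match to a single account.
total_a = REXML::XPath.match(root, "account[@id='a']/transaction")
                      .sum { |t| t.attributes['amount'].to_i }

puts child_names.inspect
puts total_all
puts total_a
```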
This just barely scratches the surface of the power of XPath. It's important to have easy access to XPath features in any XML API.
Caveats
This article was an experiment in creating an E4X-style API by using the power of the Ruby language. It doesn't cover the entire standard, but it does provide some perspective both on the value of E4X and on the flexibility of scripting languages. Perl and Python both provide an equivalent of the method_missing hook shown here (AUTOLOAD in Perl, __getattr__ in Python), so it's possible to do something similar in either of those languages.
With statically typed languages, such as C++ or Java, you will run into the problem that the nodes and attributes are not defined at compile time. One alternative is to use a code generator to build classes from an XML schema definition that will provide dot-notation syntax for read and write access. Unfortunately, the generated code will be specific to one particular XML schema. For run-time flexibility, you will need to use the DOM or SAX method of reading and writing.
Finally, there is very little published information about E4X so far. This article relies on what I could glean from the hour-long presentation I attended. If I have made some mistakes in the E4X syntax, I apologize. I'm pretty sure I nailed the highlights, even if the specifics may vary somewhat.
Conclusion
I've written plenty of articles recently using Java as the language. In comparison, writing the code for this article was a blast. One of the great things about Ruby is that it makes writing code really fun because it works the way we think.
Writing code and working with computers should be fun. I hope this article gives some reasons to try and simplify XML access to make it fun for everyone. If that helps Ruby out a little bit in the process, so much the better.
Jack Herrington is an engineer, author, and presenter who lives and works in the Bay Area. His mission is to expose his fellow engineers to new technologies. That covers a broad spectrum, from demonstrating programs that write other programs in the book Code Generation in Action, to providing techniques for building customer-centered web sites in PHP Hacks, all the way to writing a how-to on audio blogging called Podcasting Hacks.