Turning an XML document into a corresponding hierarchy of Java bean objects is a fairly common task. In a previous article, I described how to accomplish this using the standard SAX and DOM APIs.
Although powerful and flexible, both APIs are, in effect, too low-level for the specific task at hand. Furthermore, the unmarshalling procedure itself requires a fair amount of coding: a parse-stack must be maintained when using SAX, and the DOM-tree must be navigated when using DOM.
This is where the Apache Jakarta Commons Digester framework comes in.
The Jakarta Digester framework grew out of the Jakarta Struts Web toolkit. Originally developed to process the central struts-config.xml configuration file, the framework was soon recognized as more generally useful and was moved to the Jakarta Commons project, whose stated goal is to provide a "repository of reusable Java components." The most recent version, Digester 1.3, was released on August 13, 2002.
The Digester class lets the application programmer
specify a set of actions to be performed whenever the parser
encounters certain simple patterns in the XML document. The Digester
framework comes with 10 prepackaged "rules," which cover most
of the required tasks when unmarshalling XML (such as creating a
bean or setting a bean property), but each user is free to define
and implement his or her own rules, as necessary.
In this example, we will unmarshal the same XML document that we used in the previous article:
<?xml version="1.0"?>
<catalog library="somewhere">
  <book>
    <author>Author 1</author>
    <title>Title 1</title>
  </book>
  <book>
    <author>Author 2</author>
    <title>His One Book</title>
  </book>
  <magazine>
    <name>Mag Title 1</name>
    <article page="5">
      <headline>Some Headline</headline>
    </article>
    <article page="9">
      <headline>Another Headline</headline>
    </article>
  </magazine>
  <book>
    <author>Author 2</author>
    <title>His Other Book</title>
  </book>
  <magazine>
    <name>Mag Title 2</name>
    <article page="17">
      <headline>Second Headline</headline>
    </article>
  </magazine>
</catalog>
The bean classes are also the same, except for one important change:
In the previous article, I had declared these classes to have
package scope -- primarily so that I could define
all of them in the same source file! Using the Digester framework,
this is no longer possible; the classes need to be declared as
public (as is required for classes conforming to
the JavaBeans specification):
import java.util.Vector;
public class Catalog {
  private Vector books;
  private Vector magazines;
  public Catalog() {
    books = new Vector();
    magazines = new Vector();
  }
  public void addBook( Book rhs ) {
    books.addElement( rhs );
  }
  public void addMagazine( Magazine rhs ) {
    magazines.addElement( rhs );
  }
  public String toString() {
    String newline = System.getProperty( "line.separator" );
    StringBuffer buf = new StringBuffer();
    buf.append( "--- Books ---" ).append( newline );
    for( int i=0; i<books.size(); i++ ){
      buf.append( books.elementAt(i) ).append( newline );
    }
    buf.append( "--- Magazines ---" ).append( newline );
    for( int i=0; i<magazines.size(); i++ ){
      buf.append( magazines.elementAt(i) ).append( newline );
    }
    return buf.toString();
  }
}
public class Book {
  private String author;
  private String title;
  public Book() {}
  public void setAuthor( String rhs ) { author = rhs; }
  public void setTitle( String rhs ) { title = rhs; }
  public String toString() {
    return "Book: Author='" + author + "' Title='" + title + "'";
  }
}
import java.util.Vector;
public class Magazine {
  private String name;
  private Vector articles;
  public Magazine() {
    articles = new Vector();
  }
  public void setName( String rhs ) { name = rhs; }
  public void addArticle( Article a ) {
    articles.addElement( a );
  }
  public String toString() {
    StringBuffer buf = new StringBuffer( "Magazine: Name='" + name + "' ");
    for( int i=0; i<articles.size(); i++ ){
      buf.append( articles.elementAt(i).toString() );
    }
    return buf.toString();
  }
}
public class Article {
  private String headline;
  private String page;
  public Article() {}
  public void setHeadline( String rhs ) { headline = rhs; }
  public void setPage( String rhs ) { page = rhs; }
  public String toString() {
    return "Article: Headline='" + headline + "' on page='" + page + "' ";
  }
}
The Digester class processes the input XML document
based on patterns and rules. The patterns must match XML
elements, based on their name and location in the
document tree. The syntax used to describe the matching patterns
loosely resembles XPath match patterns:
the pattern catalog matches the top-level
<catalog> element, the pattern
catalog/book matches a <book>
element nested directly inside a <catalog>
element, but nowhere else in the document, etc.
All patterns are
absolute: the entire path from the root element on down has to
be specified. The only exceptions are patterns containing the
wildcard operator *: the pattern
*/name will match a <name>
element anywhere in the document. Also note that there is no
need for a special designation for the root element, since all
paths are absolute.
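The matching behavior described above can be sketched in a few lines of plain Java. This is only an illustration of the idea, not Digester's implementation: the class PatternMatchSketch and its matches() method are invented for this example, and only the leading */ wildcard form is handled.

```java
// Sketch of the matching idea only; not Digester's actual implementation.
// PatternMatchSketch and matches() are hypothetical names.
class PatternMatchSketch {

  // Returns true if the registered pattern applies to the path of the
  // current element (element names from the root down, joined by '/').
  static boolean matches(String pattern, String elementPath) {
    if (pattern.startsWith("*/")) {
      String tail = pattern.substring(2);
      // The tail has to cover whole trailing path segments.
      return elementPath.equals(tail) || elementPath.endsWith("/" + tail);
    }
    // Without a wildcard, the entire path from the root must agree.
    return pattern.equals(elementPath);
  }

  public static void main(String[] args) {
    System.out.println(matches("catalog/book", "catalog/book"));          // true
    System.out.println(matches("catalog/book", "catalog/magazine/book")); // false
    System.out.println(matches("*/name", "catalog/magazine/name"));       // true
    System.out.println(matches("*/name", "catalog/magazine/nickname"));   // false
  }
}
```

Note that the wildcard test compares complete path segments, which is why nickname is not mistaken for name.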
Whenever the Digester encounters one of the specified patterns, it
performs the actions that have been associated with it. In this, the
Digester framework is of course related to a SAX parser (and in
fact, the Digester class implements
org.xml.sax.ContentHandler and maintains
the parse stack). All rules to be used with the Digester must extend
org.apache.commons.digester.Rule -- which in itself
exposes methods similar to the SAX ContentHandler
callbacks: begin() and end() are called
when the opening and closing tags of the matched element are
encountered.
The body() method is called for the content
nested inside of the matched element, and finally, there is a
finish() method, which is called once parsing
is complete, to provide a hook for any
final clean-up chores.
Most application developers will not have to concern themselves
with these functions, however, since the standard rules that
ship with the framework are likely to provide all desired
functionality.
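To get a feel for the callback sequence, it can be imitated with nothing but the SAX classes that ship with the JDK. The sketch below is not the Digester Rule API: RuleDispatchSketch and its trace() helper are invented for this illustration. The handler tracks the path of the current element and records "begin", "body", and "end" events for a single absolute pattern.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: imitates the begin/body/end sequence for one pattern.
// All names here are hypothetical; this is not Digester code.
class RuleDispatchSketch extends DefaultHandler {
  private final String pattern;                       // absolute pattern to watch
  private final List<String> events = new ArrayList<>();
  private final Deque<StringBuilder> text = new ArrayDeque<>();
  private String path = "";                           // path of the current element

  RuleDispatchSketch(String pattern) { this.pattern = pattern; }

  @Override
  public void startElement(String uri, String local, String qName, Attributes atts) {
    path = path.isEmpty() ? qName : path + "/" + qName;
    text.push(new StringBuilder());                   // fresh buffer per element
    if (path.equals(pattern)) events.add("begin");
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    if (!text.isEmpty()) text.peek().append(ch, start, length);
  }

  @Override
  public void endElement(String uri, String local, String qName) {
    String body = text.pop().toString().trim();
    if (path.equals(pattern)) {
      events.add("body:" + body);                     // character data first
      events.add("end");                              // then the closing tag
    }
    int slash = path.lastIndexOf('/');
    path = slash < 0 ? "" : path.substring(0, slash);
  }

  // Parses the XML text and returns the events recorded for the pattern.
  static List<String> trace(String xml, String pattern) {
    RuleDispatchSketch handler = new RuleDispatchSketch(pattern);
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return handler.events;
  }

  public static void main(String[] args) {
    String xml = "<catalog><book><title>Title 1</title></book></catalog>";
    System.out.println(trace(xml, "catalog/book/title")); // [begin, body:Title 1, end]
  }
}
```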
To unmarshal a document, then, create an instance of the
org.apache.commons.digester.Digester class,
configure it if necessary, specify the required patterns and
rules, and finally, pass a reference to the XML file to the
parse() method. This is demonstrated in the
DigesterDriver class below. (The filename of the
input XML document must be specified on the command line.)
import org.apache.commons.digester.*;
import java.io.*;
import java.util.*;
public class DigesterDriver {
  public static void main( String[] args ) {
    try {
      Digester digester = new Digester();
      digester.setValidating( false );
      digester.addObjectCreate( "catalog", Catalog.class );
      digester.addObjectCreate( "catalog/book", Book.class );
      digester.addBeanPropertySetter( "catalog/book/author", "author" );
      digester.addBeanPropertySetter( "catalog/book/title", "title" );
      digester.addSetNext( "catalog/book", "addBook" );
      digester.addObjectCreate( "catalog/magazine", Magazine.class );
      digester.addBeanPropertySetter( "catalog/magazine/name", "name" );
      digester.addObjectCreate( "catalog/magazine/article", Article.class );
      digester.addSetProperties( "catalog/magazine/article", "page", "page" );
      digester.addBeanPropertySetter( "catalog/magazine/article/headline" );
      digester.addSetNext( "catalog/magazine/article", "addArticle" );
      digester.addSetNext( "catalog/magazine", "addMagazine" );
      File input = new File( args[0] );
      Catalog c = (Catalog)digester.parse( input );
      System.out.println( c.toString() );
    } catch( Exception exc ) {
      exc.printStackTrace();
    }
  }
}
After instantiating the Digester, we specify that
it should not validate the XML document against a DTD -- because
we did not define one for our simple Catalog document.
Then we specify the patterns and the associated rules: the
ObjectCreateRule creates an instance of the specified
class and pushes it onto the parse stack. The
SetPropertiesRule sets
a bean property to the value of an XML attribute of the current
element -- the first argument to the rule is the name of the
attribute, the second, the name of the property.
Whereas
SetPropertiesRule takes the value from an
attribute, BeanPropertySetterRule takes the value
from the raw character data nested inside of the current
element. It is not necessary to specify the name of the property
to set when using BeanPropertySetterRule: it
defaults to the name of the current XML element. In the example
above, this default is being used in the rule definition matching
the catalog/magazine/article/headline pattern.
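The defaulting of the property name can be pictured with a little reflection. What follows is a sketch under stated assumptions, not the code of either rule: PropertySetterSketch, DemoArticle, and setFromText() are hypothetical names, and only String properties with a conventional setXxx() method are handled.

```java
import java.lang.reflect.Method;

// Sketch only: resolves a setter from a property name, falling back to the
// element name when no property name is given. All names are hypothetical.
class PropertySetterSketch {

  // Stand-in for the Article bean of this article.
  static class DemoArticle {
    String headline;
    String page;
    public void setHeadline(String rhs) { headline = rhs; }
    public void setPage(String rhs) { page = rhs; }
  }

  // Calls bean.setXxx(value); propertyName may be null, in which case the
  // name of the current XML element is used as the property name.
  static void setFromText(Object bean, String elementName, String propertyName, String value) {
    String name = (propertyName != null) ? propertyName : elementName;
    String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
    try {
      Method m = bean.getClass().getMethod(setter, String.class);
      m.invoke(bean, value);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    DemoArticle a = new DemoArticle();
    setFromText(a, "headline", null, "Some Headline"); // name taken from the element
    setFromText(a, "article", "page", "5");            // name given explicitly
    System.out.println(a.headline + " / " + a.page);   // Some Headline / 5
  }
}
```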
Finally, the SetNextRule passes the object on top of the
parse stack to the named method on the object below
it -- it is commonly used to insert a finished bean into its parent.
(The bean itself is popped by the ObjectCreateRule that created it,
once the closing tag has been processed.)
Note that it is possible to register several rules for the same
pattern. If this occurs, the rules are executed in the order in
which they are added to the Digester -- for instance, to deal with
the <article> element, found at
catalog/magazine/article, we first create the
appropriate article bean, then set the
page property, and finally pop the completed
article bean and insert it into its
magazine parent.
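The stack bookkeeping behind this sequence is easy to replay by hand. The following is a minimal sketch, assuming a plain java.util.Deque as the parse stack; ParseStackSketch, DemoMagazine, and DemoArticle are invented stand-ins, and the sketch deliberately keeps the hand-over to the parent and the pop as two separate steps.

```java
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Sketch only: replays the create / set property / insert / pop sequence
// on a hand-made stack. All names are hypothetical, not Digester classes.
class ParseStackSketch {

  static class DemoArticle {
    String page;
    public void setPage(String rhs) { page = rhs; }
  }

  static class DemoMagazine {
    final List<DemoArticle> articles = new ArrayList<>();
    public void addArticle(DemoArticle a) { articles.add(a); }
  }

  private final Deque<Object> stack = new ArrayDeque<>();

  void push(Object bean) { stack.push(bean); }
  Object peek() { return stack.peek(); }
  Object pop() { return stack.pop(); }

  // Hands the top object to the named method of the object just below it.
  void setNext(String methodName) {
    Iterator<Object> it = stack.iterator();   // iterates from the top down
    Object child = it.next();
    Object parent = it.next();
    try {
      Method m = parent.getClass().getMethod(methodName, child.getClass());
      m.invoke(parent, child);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  static DemoMagazine assemble() {
    ParseStackSketch s = new ParseStackSketch();
    s.push(new DemoMagazine());               // <magazine> opens
    s.push(new DemoArticle());                // <article page="5"> opens
    ((DemoArticle) s.peek()).setPage("5");    // attribute value becomes a property
    s.setNext("addArticle");                  // </article>: insert into the parent
    s.pop();                                  // the article leaves the stack
    return (DemoMagazine) s.pop();            // </magazine>
  }

  public static void main(String[] args) {
    DemoMagazine m = assemble();
    System.out.println(m.articles.size() + " article, page " + m.articles.get(0).page);
  }
}
```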
It is not only possible to set bean properties, but
to invoke arbitrary methods on objects in the stack. This is
accomplished using the CallMethodRule to specify
the method name and, optionally, the number and type of arguments
passed to it. Subsequent specifications of the CallParamRule
define the parameter values to be passed to the invoked functions.
The values can be taken either from named attributes of the current
XML element, or from the raw character data contained by the current
element. For instance, rather than using the
BeanPropertySetterRule in the DigesterDriver
implementation above, we could have achieved the same effect by
calling the property setter explicitly, and passing the data as a
parameter:
digester.addCallMethod( "catalog/book/author", "setAuthor", 1 );
digester.addCallParam( "catalog/book/author", 0 );
The first line gives the name of the method to call
(setAuthor()), and the expected number of
parameters (1). The second line says to take
the value of the method parameter from the character data
contained in the <author> element and
pass it as the first element in the array of arguments (i.e., the
array element with index 0). Had we also specified an
attribute name (e.g., digester.addCallParam(
"catalog/book/author", 0, "author" );), the value would
have been taken from the respective attribute of the current
element instead.
One important caveat: confusingly,
digester.addCallMethod( "pattern", "methodName", 0 );
does not specify a call to a method taking no arguments --
instead, it specifies a call to a method taking one argument,
the value of which is taken from the character data of the
current XML element! We therefore have yet another
way to implement a replacement for BeanPropertySetterRule:
digester.addCallMethod( "catalog/book/author", "setAuthor", 0 );
To call a method that truly takes no parameters, use
digester.addCallMethod( "pattern", "methodName" );.
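The three variants are easier to keep apart when written out as ordinary reflective calls. The sketch below only mirrors the behavior described in this section; it is not how CallMethodRule is implemented. CallMethodSketch, DemoBook, and touch() are hypothetical, and every parameter is assumed to be a String.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Sketch only: mirrors the three call-method variants with plain reflection.
// All names are hypothetical; parameters are assumed to be Strings.
class CallMethodSketch {

  static class DemoBook {
    String author;
    boolean touched;
    public void setAuthor(String rhs) { author = rhs; }
    public void touch() { touched = true; }
  }

  // Invokes target.methodName(params...), looking the method up by name
  // and by as many String parameter types as there are values.
  static void call(Object target, String methodName, String... params) {
    Class<?>[] types = new Class<?>[params.length];
    Arrays.fill(types, String.class);
    try {
      Method m = target.getClass().getMethod(methodName, types);
      m.invoke(target, (Object[]) params);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  static DemoBook demo(String bodyText) {
    DemoBook book = new DemoBook();

    // Parameter count 1 plus a call-param for index 0: the argument array
    // is filled slot by slot, here from the element's character data.
    String[] params = new String[1];
    params[0] = bodyText;
    call(book, "setAuthor", params);

    // Parameter count 0: the character data is still passed as the
    // single argument, so the effective call is the same.
    call(book, "setAuthor", bodyText);

    // No parameter count at all: a genuine no-argument call.
    call(book, "touch");
    return book;
  }

  public static void main(String[] args) {
    DemoBook b = demo("Author 1");
    System.out.println(b.author + " " + b.touched); // Author 1 true
  }
}
```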
Below are brief descriptions of all of the standard rules.
ObjectCreateRule: Creates an object of the
specified class using its default constructor and pushes it
onto the stack; it is popped when the element completes. The
class to instantiate can be given through a class
object or the fully-qualified class name.
FactoryCreateRule: Creates an object using
a specified factory class and pushes it onto the stack.
This can be useful for classes that do not provide a
default constructor. The factory class must implement the
org.apache.commons.digester.ObjectCreationFactory
interface.
SetPropertiesRule: Sets one or several named
properties in the top-level bean using the values of named
XML element attributes. Attribute names and property names
are passed to this rule in String[] arrays.
(Typically used to handle XML constructs like
<article page="10">.)
BeanPropertySetterRule: Sets a named property
on the top-level bean to the character data enclosed by
the current XML element.
(Example: <page>10</page>.)
SetPropertyRule: Sets a property on the
top-level bean. Both the property name, as well as the
value to which this property will be set, are given as
attributes to the current XML element.
(Example: <article key="page" value="10" />.)
SetNextRule: Passes the object on top of the stack
to a named method on the object immediately below.
Typically used to insert a completed bean into its parent.
SetTopRule: Passes the second-to-top object
on the stack to the top-level object. This is useful if the
child object exposes a setParent method, rather than the
other way around.
SetRootRule: Calls a method on the object at
the bottom of the stack, passing the object on top of the
stack as argument.
CallMethodRule: Calls an arbitrary named method
on the top-level bean. The method may take an arbitrary set of
parameters. The values of the parameters are given by subsequent
applications of the CallParamRule.
CallParamRule: Represents the value of a
method parameter. The value of the parameter is either taken
from a named XML element attribute, or from the raw character
data enclosed by the current element. This rule requires
that its position on the parameter list is specified by an
integer index.
So far, we have specified the patterns and rules programmatically at compile time. While conceptually simple and straightforward, this feels a bit odd: the entire framework is about recognizing and handling structure and data at run time, but here we go fixing the behavior at compile time! Large numbers of fixed strings in source code typically indicate that something is being configured (rather than programmed), which could be (and probably should be) done at run time instead.
The org.apache.commons.digester.xmlrules package
addresses this issue. It provides the DigesterLoader
class, which reads the pattern/rule-pairs from an XML document
and returns a digester already configured accordingly. The XML
document configuring the Digester must comply with
the digester-rules.dtd, which is part of the
xmlrules package.
Below are the contents of the configuration file (named rules.xml) for the example application. I want to point out several things here.
Patterns can be specified in two
different ways: either as attributes to each XML element
representing a rule, or using the <pattern>
element. The pattern defined by the latter is valid for all
contained rule elements. Both ways can be mixed, and
<pattern> elements can be nested -- in either
case, the pattern defined by the child element is appended
to the pattern defined in the enclosing <pattern>
element.
The <alias> element is used with the
<set-properties-rule> to map an
XML attribute to a bean property.
Finally, using the current release of the Digester package, it
is not possible to specify the BeanPropertySetterRule
in the configuration file. Instead, we are using the
CallMethodRule to achieve the same effect,
as explained above.
<?xml version="1.0"?>
<digester-rules>
  <object-create-rule pattern="catalog" classname="Catalog" />
  <set-properties-rule pattern="catalog" >
    <alias attr-name="library" prop-name="library" />
  </set-properties-rule>
  <pattern value="catalog/book">
    <object-create-rule classname="Book" />
    <call-method-rule pattern="author" methodname="setAuthor"
                      paramcount="0" />
    <call-method-rule pattern="title" methodname="setTitle"
                      paramcount="0" />
    <set-next-rule methodname="addBook" />
  </pattern>
  <pattern value="catalog/magazine">
    <object-create-rule classname="Magazine" />
    <call-method-rule pattern="name" methodname="setName" paramcount="0" />
    <pattern value="article">
      <object-create-rule classname="Article" />
      <set-properties-rule>
        <alias attr-name="page" prop-name="page" />
      </set-properties-rule>
      <call-method-rule pattern="headline" methodname="setHeadline"
                        paramcount="0" />
      <set-next-rule methodname="addArticle" />
    </pattern>
    <set-next-rule methodname="addMagazine" />
  </pattern>
</digester-rules>
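To see which absolute pattern a nested rule ends up with, it helps to spell out the appending step. The helper below is a hypothetical illustration of that step (PatternNestingSketch and join() are not part of the xmlrules package); it resolves the pattern of the innermost call-method-rule in the file above.

```java
// Sketch only: shows how a child pattern is appended to the pattern of the
// enclosing <pattern> element. Names are hypothetical.
class PatternNestingSketch {

  // Joins the enclosing pattern and the child pattern with a '/'.
  // An empty or missing part simply yields the other part.
  static String join(String enclosing, String child) {
    if (enclosing == null || enclosing.isEmpty()) return child;
    if (child == null || child.isEmpty()) return enclosing;
    return enclosing + "/" + child;
  }

  public static void main(String[] args) {
    String magazine = join("", "catalog/magazine");  // outer <pattern> element
    String article = join(magazine, "article");      // nested <pattern> element
    String headline = join(article, "headline");     // pattern attribute of the rule
    System.out.println(headline); // catalog/magazine/article/headline
  }
}
```

A rule element without a pattern attribute, such as the set-next-rule for addArticle, applies to the enclosing pattern as it stands.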
Since all the actual work has now been delegated to the
Digester and DigesterLoader classes,
the driver class itself becomes trivially simple. To run
it, specify the catalog document as the first command line
argument, and the rules.xml file as the second.
(Confusingly, the DigesterLoader will not read the
rules.xml file from a File or an
org.xml.sax.InputSource,
but requires a URL -- the File reference in the code
below is therefore transformed into an equivalent URL.)
import org.apache.commons.digester.*;
import org.apache.commons.digester.xmlrules.*;
import java.io.*;
import java.util.*;
public class XmlRulesDriver {
  public static void main( String[] args ) {
    try {
      File input = new File( args[0] );
      File rules = new File( args[1] );
      Digester digester = DigesterLoader.createDigester( rules.toURL() );
      Catalog catalog = (Catalog)digester.parse( input );
      System.out.println( catalog.toString() );
    } catch( Exception exc ) {
      exc.printStackTrace();
    }
  }
}
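A note on the File-to-URL step for readers on a more recent JDK: File.toURL() has since been deprecated because it does not escape characters that are illegal in URLs, and the JDK documentation recommends converting to a URI first. A minimal sketch of that route follows; FileToUrlSketch and toUrl() are hypothetical helper names, and the file name is made up to show the escaping.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Sketch only: converts a File into a URL by way of a URI, so that
// characters such as spaces are escaped. Names are hypothetical.
class FileToUrlSketch {

  static URL toUrl(File file) {
    try {
      return file.toURI().toURL();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    // The space in the file name is written as %20 in the resulting URL.
    System.out.println(toUrl(new File("my rules.xml")));
  }
}
```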
This concludes our brief overview of the Jakarta Commons Digester package. Of course, there is more. One topic ignored in this introduction is XML namespaces: Digester allows you to specify rules that only act on elements defined within a certain namespace.
We mentioned briefly the possibility of developing custom rules,
by extending the Rule class. The Digester
class exposes the customary
push(), peek(), and pop()
methods, giving the individual developer freedom to manipulate
the parse stack directly.
Lastly, note that there is an additional package providing a Digester implementation which deals with RSS (Rich-Site-Summary)-formatted newsfeeds. The Javadoc tells the full story.
Philipp K. Janert is a software project consultant, server programmer, and architect.
Copyright © 2009 O'Reilly Media, Inc.