Hierarchical Data Sets are not new. They already exist in the form of CICS transactional data, files in directories, and plain Java objects, as well as the obvious example, XML. In the XML Journal in early 2001, I floated the idea that programmers can benefit from hierarchical data abstractions even though many of their data sources are predominantly relational (databases such as MySQL, Oracle, SQL Server, and DB2).
The .NET world has a similar idea taking root in the notion of "datasets." Although there are important differences between my proposed Hierarchical Data Sets and Microsoft's datasets, both enrich relational abstractions with additional structure and detail.
This article examines the structure of Hierarchical Data Sets and a Java API for them. Unlike the XML Journal article two years ago, this one gives you executable code you can use to start taking advantage of Hierarchical Data Sets. Although you can write Java code to access various data sources and construct the final Hierarchical Data Set yourself, the implementation presented here lets you construct Hierarchical Data Sets declaratively, simply by composing pre-built relational adapters. Relational adapters include file readers, SQL readers, stored procedure readers, and so on.
The question you're probably asking is, "What good are these Hierarchical Data Sets?" Although they can't rival the salutary effects of large, expensive pieces of carbon on your most certainly deserving companions, Hierarchical Data Sets are quite useful in the programming world. For starters, a single Hierarchical Data Set can supply the data for an entire HTML page. In an MVC architecture, a controller servlet can deliver a Hierarchical Data Set to a JSP page, which will paint it without further ado. For a warmup, the controller servlet can convert it to XML and return it directly to the caller. For broader appeal, the Hierarchical Data Set can be converted to Excel. For the stylish, it can be redirected to a reporting or charting engine that supports XML data.
Although the article focuses primarily on the Java programming API, non-Java programmers can also use Hierarchical Data Sets quite effectively to obtain XML, HTML, or Excel output directly from relational databases and other data sources, using a servlet container such as Tomcat. So let us investigate the structure of Hierarchical Data Sets and see how they can be obtained declaratively (while relaxing your programming muscles a bit).
A Hierarchical Data Set can be represented conceptually as a Java API, as XML, or in some other format. It is easiest to visualize as XML:
<AspireDataSet>
<!-- A set of key value pairs at the root level -->
<key1>val1</key1>
<key2>val2</key2>
<!-- A set of named loops -->
<loop name="loop">
</loop>
<loop name="loop2">
</loop>
</AspireDataSet>
At the root is a set of key/value pairs. A given set of key/value pairs can
yield n independent loops, and each loop is essentially a table of data.
The term "loop" is synonymous with "table." I haven't used "table" because
people might take it literally to mean only data from a relational table.
Since a loop is a collection of rows (a RowSet!), let us look more closely
at its structure:
<loop name="loopname">
<row>
<!-- a set of key value pairs -->
<key1>val1</key1>
<key2>val2</key2>
<!-- a set of named loops -->
<loop name="loopname1">
</loop>
<!-- a set of named loops -->
<loop name="loopname2">
</loop>
</row>
<row>
</row>
</loop>
The only odd thing here is the structure of a row. As expected, a row is a
collection of key/value pairs. Here, however, a row includes not only key/value
pairs but also another recursive set of n independent loops. This extension
can produce trees of any depth. (Or should I say, height!)
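To make the recursion concrete, here is a minimal in-memory model of the XML above as plain Java objects. It is a sketch, not part of Aspire: the class names HdsRow and HdsLoop are hypothetical. The root and every row share the same shape (key/value pairs plus named loops), which is exactly what lets the tree grow to any depth.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical row (or root): key/value pairs plus named child loops. */
class HdsRow {
    final Map<String, String> values = new LinkedHashMap<>();
    final Map<String, HdsLoop> loops = new LinkedHashMap<>();

    HdsRow put(String key, String value) {
        values.put(key, value);
        return this;
    }

    /** Returns the named child loop, creating an empty one on first use. */
    HdsLoop loop(String name) {
        return loops.computeIfAbsent(name, HdsLoop::new);
    }

    /** Levels of rows below this node; 0 when no child loop has rows. */
    int depth() {
        int max = 0;
        for (HdsLoop loop : loops.values()) {
            for (HdsRow row : loop.rows) {
                max = Math.max(max, 1 + row.depth());
            }
        }
        return max;
    }
}

/** Hypothetical loop: a named, ordered list of rows, i.e. a "table". */
class HdsLoop {
    final String name;
    final List<HdsRow> rows = new ArrayList<>();

    HdsLoop(String name) {
        this.name = name;
    }

    HdsRow addRow() {
        HdsRow row = new HdsRow();
        rows.add(row);
        return row;
    }
}
```

Building the root with a loop whose row holds its own child loop gives a tree of depth 2; an empty loop such as loop2 contributes no depth.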
The moment I show the hierarchical data as XML, people may take a Hierarchical Data Set to be literally XML and, hence, literally a DOM and, hence, a lot of memory inside the JVM. No need to panic. A Hierarchical Data Set can have its own Java API and need not be represented as a DOM. Most of the time it is a lazily loaded tree traversed with a forward-only cursor. Here is a working Java API for a Hierarchical Data Set:
package com.ai.htmlgen;
import com.ai.data.*;
/**
* Represents a Hierarchical Data Set.
* An hds is a collection of rows.
* You can step through the rows using ILoopForwardIterator.
* You can find out about the columns via IMetaData.
* An hds is also a collection of loops originating from the current row.
*/
public interface ihds extends ILoopForwardIterator
{
/**
* Returns the parent if available
* Returns null if there is no parent
*/
public ihds getParent() throws DataException;
/**
* For the current row, returns a set of
* child loop names. ILoopForwardIterator determines
* what the current row is.
*
* @see ILoopForwardIterator
*/
public IIterator getChildNames() throws DataException;
/**
* Given a child name return the child Java object
* represented by ihds again
*/
public ihds getChild(String childName) throws DataException;
/**
* Returns an aggregate value, such as SUM or AVG, computed
* over the set of rows that are children of this row.
*/
public String getAggregateValue(String keyname) throws DataException;
/**
* Returns the column names of this loop or table.
* @see IMetaData
*/
public IMetaData getMetaData() throws DataException;
/**
* Releases any resources that may be held by this loop of data
* or table.
*/
public void close() throws DataException;
}
The name ihds is short for "Interface to Hierarchical Data Set." This API
allows you to step through your loops recursively. An implementation can
choose to load loops only when they are requested, and it can support either
forward-only or random-access traversal. Before going further, let me present
the two additional interfaces that this API uses: ILoopForwardIterator and
IMetaData.
ILoopForwardIterator
package com.ai.htmlgen;
import com.ai.data.*;
public interface ILoopForwardIterator
{
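/**
* Returns the value for the given key (column name) in the current row.
*/
public String getValue(final String key);
/** Positions the cursor on the first row. */
public void moveToFirst() throws DataException;
/** Advances the cursor to the next row. */
public void moveToNext() throws DataException;
/** Returns true once the cursor has moved past the last row. */
public boolean isAtTheEnd() throws DataException;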
}
IMetaData: For Reading Column Names
package com.ai.data;
public interface IMetaData
{
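/** Iterates over the column names. */
public IIterator getIterator();
/** Returns the number of columns. */
public int getColumnCount();
/** Returns the index of the named column. */
public int getIndex(final String attributeName)
throws FieldNameNotFoundException;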
}
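The interfaces above leave the storage strategy open. As a rough illustration of the forward-only, lazily loaded behavior described earlier, here is a self-contained sketch backed by in-memory lists. It does not use Aspire: LazyLoop is a hypothetical class whose cursor methods mirror ILoopForwardIterator (with DataException omitted), and the per-row Supplier objects are an assumed mechanism for deferring child construction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical in-memory loop (not Aspire's GenericTableHandler6): a
 * forward-only cursor over rows, where each row's child loops are built
 * from suppliers only when getChild is called.
 */
class LazyLoop {
    private final List<String> columns;
    private final List<Map<String, String>> rows = new ArrayList<>();
    private final List<Map<String, Supplier<LazyLoop>>> childFactories = new ArrayList<>();
    private int cursor = 0;

    LazyLoop(String... columns) {
        this.columns = Arrays.asList(columns);
    }

    /** Appends a row; values are given in column order. */
    LazyLoop addRow(String... values) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            row.put(columns.get(i), i < values.length ? values[i] : null);
        }
        rows.add(row);
        childFactories.add(new LinkedHashMap<>());
        return this;
    }

    /** Registers a child loop for the most recently added row without building it. */
    LazyLoop addChild(String name, Supplier<LazyLoop> factory) {
        childFactories.get(childFactories.size() - 1).put(name, factory);
        return this;
    }

    // Cursor methods mirroring ILoopForwardIterator (DataException omitted).
    void moveToFirst() { cursor = 0; }
    void moveToNext() { cursor++; }
    boolean isAtTheEnd() { return cursor >= rows.size(); }
    String getValue(String key) { return rows.get(cursor).get(key); }

    /** Column names, standing in for IMetaData. */
    List<String> getColumnNames() { return Collections.unmodifiableList(columns); }

    /** Child loop names registered for the current row. */
    Set<String> getChildNames() { return childFactories.get(cursor).keySet(); }

    /** Builds the named child of the current row on each call; null if absent. */
    LazyLoop getChild(String name) {
        Supplier<LazyLoop> factory = childFactories.get(cursor).get(name);
        return factory == null ? null : factory.get();
    }
}
```

A consumer that never descends into a row's children never pays for building them. This sketch rebuilds a child on every getChild call; a real implementation might cache it or stream it from a data source instead.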
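Now that we know the structure of a Hierarchical Data Set, how do you get hold of one? As I stated earlier, this is easy under Aspire. The steps are as follows:

1. Define the Hierarchical Data Set in a properties file.
2. Include that file in Aspire's master aspire.properties file.
3. Access the ihds in your Java code.

Each of these steps is explained in some detail below.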
Aspire is a small JAR file that can complement your Java programming, particularly when used with an app server such as Tomcat. At the heart of Aspire is a set of configuration files, where you declare your data access mechanisms in terms of Java classes and arguments to those Java classes. Aspire will execute those Java classes and return the resulting objects. Hierarchical Data Sets are no exception.
An earlier O'Reilly article, "For Tomcat Developers, Aspire Comes in a JAR," introduced Aspire. It will familiarize you with defining databases, calling SQL and stored procedures, and configuring and initializing Aspire.
A sample definition for a Hierarchical Data Set is as follows:
###################################
# ihdsTest data definition: section1
###################################
request.ihdsTest.className=com.ai.htmlgen.DBHashTableFormHandler1
request.ihdsTest.loopNames=works
#section2
request.ihdsTest.works.class_request.className=com.ai.htmlgen.GenericTableHandler6
request.ihdsTest.works.loopNames=childloop1
request.ihdsTest.works.query_request.className=com.ai.data.RowFileReader
request.ihdsTest.works.query_request.filename=aspire:\\samples\\pop-table-tags\\properties\\pop-table.data
#section3
request.childloop1.class_request.classname=com.ai.htmlgen.GenericTableHandler6
request.childloop1.query_request.classname=com.ai.data.RowFileReader
request.childloop1.query_request.filename=aspire:\\samples\\pop-table-tags\\properties\\pop-table.data
This definition has three sections. The data set is named
ihdsTest. The first section tells Aspire that the Java class
com.ai.htmlgen.DBHashTableFormHandler1 is responsible for
returning an object implementing ihds. Unless you code your own
implementation of ihds, you will use this class in every data set
definition. It's the pre-fabricated class that knows how to compose relational
assets into hierarchical assets. Line 2 of section 1 tells
DBHashTableFormHandler1 that this main data set has one loop
called works.
Section2 defines the loop works. A loop structure in Aspire
uses two Java classes: a class request (GenericTableHandler6) and
a query request (RowFileReader).
RowFileReader reads a set of records from a flat file and makes
them look like a collection of rows and columns.
GenericTableHandler6 takes this collection, adds features such as
aggregate values and row numbers, and implements the ihds interface
at the loop level. As with DBHashTableFormHandler1,
GenericTableHandler6 is present in most definitions. RowFileReader
might change, depending on your data sources. For example, the
following parts exist in this category:

- RowFileReader
- DBRequestExecutor2 (for reading SQL)
- StoredProcedureExecutor2 (for reading from stored procedures)
- XMLReader (for reading XML files)
- IDataCollection

Section2 also indicates that it has a child called childloop1.
GenericTableHandler6 will take this cue and look for section3,
identified by childloop1.
Section3 defines childloop1. The definition is identical to
section2, except that childloop1 has no children. Both section2 and
section3 use RowFileReaders. In practice, they can use any
combination of data reader parts.
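To see how the sections link together through loopNames, here is a small sketch that loads a definition with java.util.Properties and lists the loop tree. It is not Aspire's lookup code, and LoopTreePrinter is a hypothetical name. It assumes three things visible in, or suggested by, the sample: a top-level loop lives under request.&lt;dataset&gt;.&lt;loop&gt;, a child loop lives under request.&lt;child&gt; (as in section3), and keys match case-insensitively (the sample spells both className and classname). A comma as the loopNames separator is also an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical helper (not part of Aspire) that follows loopNames entries
 * in a data definition and describes the resulting loop tree.
 */
class LoopTreePrinter {
    private final Map<String, String> config = new HashMap<>();

    LoopTreePrinter(String definition) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(definition));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: keys match case-insensitively (the sample mixes className and classname).
        for (String key : props.stringPropertyNames()) {
            config.put(key.toLowerCase(Locale.ROOT), props.getProperty(key));
        }
    }

    /** One line per loop, e.g. "works (RowFileReader)", indented by depth. */
    List<String> describe(String dataSetName) {
        List<String> out = new ArrayList<>();
        for (String loop : loopNames("request." + dataSetName)) {
            walk("request." + dataSetName + "." + loop, loop, "", out);
        }
        return out;
    }

    private void walk(String prefix, String loop, String indent, List<String> out) {
        String reader = get(prefix + ".query_request.className");
        String shortName = reader == null ? "no reader" : reader.substring(reader.lastIndexOf('.') + 1);
        out.add(indent + loop + " (" + shortName + ")");
        for (String child : loopNames(prefix)) {
            // Assumption taken from section3 of the sample: children live at request.<child>.
            walk("request." + child, child, indent + "  ", out);
        }
    }

    /** Splits a comma-separated loopNames value (the separator is an assumption). */
    private List<String> loopNames(String prefix) {
        List<String> names = new ArrayList<>();
        String value = get(prefix + ".loopNames");
        if (value != null) {
            for (String name : value.split(",")) {
                if (!name.trim().isEmpty()) {
                    names.add(name.trim());
                }
            }
        }
        return names;
    }

    private String get(String key) {
        return config.get(key.toLowerCase(Locale.ROOT));
    }
}
```

Run against the three sections above, it would report works (RowFileReader) with childloop1 (RowFileReader) nested beneath it. The sketch has no cycle detection, so a loop that lists itself in loopNames would recurse indefinitely.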
Let us call this file ihds-test.properties. Include it in Aspire's
master aspire.properties file as follows:
application.includeFiles=aspire:\\samples\\hello-world\\properties\\hello-world.properties,\
aspire:\\samples\\ihds-test\\ihds-test.properties,\
aspire:\\samples\\xml-reader\\xml-reader.properties
For context, I have included the entries before and after this one.
ihds
Now that we have the definition, how do we call it from Java? Reading the first article will help considerably, but here is the Java code:
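// Argument keys must be lowercase for the downstream adapters
Hashtable args = new Hashtable();
args.put("key1".toLowerCase(), "value1");
// Ask Aspire's factory for the data set defined as ihdsTest
IFactory factory = AppObjects.getFactory();
ihds hds = (ihds)factory.getObject("ihdsTest",args);
// use ihds, then call hds.close() to release its resources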
Aspire has a factory service, represented by the IFactory
interface. This factory lets you invoke the Java class registered under a
symbolic name (here, ihdsTest), passing any arguments in a hashtable. The
argument keys are expected to be lowercase strings for the downstream
relational adapters.
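Conceptually, the factory maps a symbolic name to a recipe for building an object. The following is a hypothetical, much-simplified stand-in (SimpleFactory is not Aspire's IFactory, and register is an invented method) that shows the two ideas in the paragraph above: lookup by symbolic name, and lowercase argument keys by the time they reach the adapters.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical factory: symbolic name -> builder that receives the arguments. */
class SimpleFactory {
    private final Map<String, Function<Map<String, String>, Object>> builders = new HashMap<>();

    void register(String name, Function<Map<String, String>, Object> builder) {
        builders.put(name, builder);
    }

    Object getObject(String name, Map<String, String> args) {
        Function<Map<String, String>, Object> builder = builders.get(name);
        if (builder == null) {
            throw new IllegalArgumentException("No definition for: " + name);
        }
        // Normalize keys here so "Key1" and "key1" reach the adapters identically.
        Map<String, String> normalized = new HashMap<>();
        for (Map.Entry<String, String> e : args.entrySet()) {
            normalized.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return builder.apply(normalized);
    }
}
```

In the Aspire snippet above the caller lowercases each key; this sketch does it once inside the factory instead, which removes the burden from every call site.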
ihds API
The following code walks through the ihds tree, printing it
out:
import com.ai.htmlgen.*;
import com.ai.common.TransformException;
import java.io.*;
import com.ai.data.*;
// above code removed for clarity
public static void staticTransform(ihds data, PrintWriter out)
throws TransformException
{
try
{
writeALoop("MainData",data,out,"");
}
catch(DataException x)
{
throw new TransformException(
"Error: DebugTextTransform: Data Exception",x);
}
}
/**********************************************************
* A recursive function to write out a loop worth of ihds
**********************************************************
*/
private static void writeALoop(
String loopname, ihds data, PrintWriter out, String is)
throws DataException
{
println(out,is, ">> Writing data for loop:" + loopname);
// write metadata
IMetaData m = data.getMetaData();
IIterator columns = m.getIterator();
StringBuffer colBuffer = new StringBuffer();
for(columns.moveToFirst();!columns.isAtTheEnd();columns.moveToNext())
{
String columnName = (String)columns.getCurrentElement();
colBuffer.append(columnName).append("|");
}
println(out,is,colBuffer.toString());
//write individual rows
for(data.moveToFirst();!data.isAtTheEnd();data.moveToNext())
{
StringBuffer rowBuffer = new StringBuffer();
for(columns.moveToFirst();!columns.isAtTheEnd();columns.moveToNext())
{
String columnName = (String)columns.getCurrentElement();
rowBuffer.append(data.getValue(columnName));
rowBuffer.append("|");
}
println(out,is,rowBuffer.toString());
// recursive call to print children
IIterator children = data.getChildNames();
for(children.moveToFirst();!children.isAtTheEnd();children.moveToNext())
{
// for each child
String childName = (String)children.getCurrentElement();
ihds child = data.getChild(childName);
writeALoop(childName,child,out,is + "\t");
}
}
println(out,is,">> Writing data for loop:" + loopname + " is complete");
}
private static void println(PrintWriter out, String indentationString,
String line)
{
out.print(indentationString);
out.print(line);
out.print("\n");
}
// code removed for clarity
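Because the walker above depends on Aspire's classes, here is a self-contained approximation you can run without Aspire. It substitutes a hypothetical TextLoop type for ihds (rows as maps, each row with its own named child loops) and a hypothetical TextDump class for the transform, and it reproduces the same depth-first, tab-indented output with | after each value.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for ihds: columns, rows, and per-row child loops. */
class TextLoop {
    final List<String> columns;
    final List<Map<String, String>> rows = new ArrayList<>();
    final List<Map<String, TextLoop>> children = new ArrayList<>();

    TextLoop(List<String> columns) {
        this.columns = columns;
    }

    /** Adds a row (values in column order) and returns its map of child loops. */
    Map<String, TextLoop> addRow(String... values) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            row.put(columns.get(i), i < values.length ? values[i] : "");
        }
        rows.add(row);
        Map<String, TextLoop> kids = new LinkedHashMap<>();
        children.add(kids);
        return kids;
    }
}

/** Same traversal as writeALoop above, minus Aspire's iterators and exceptions. */
class TextDump {
    static void writeALoop(String loopName, TextLoop data, PrintWriter out, String indent) {
        println(out, indent, ">> Writing data for loop:" + loopName);
        // Header line: every column name followed by "|"
        StringBuilder header = new StringBuilder();
        for (String column : data.columns) {
            header.append(column).append('|');
        }
        println(out, indent, header.toString());
        for (int r = 0; r < data.rows.size(); r++) {
            StringBuilder line = new StringBuilder();
            for (String column : data.columns) {
                line.append(data.rows.get(r).get(column)).append('|');
            }
            println(out, indent, line.toString());
            // Depth-first: print this row's child loops one tab deeper
            for (Map.Entry<String, TextLoop> child : data.children.get(r).entrySet()) {
                writeALoop(child.getKey(), child.getValue(), out, indent + "\t");
            }
        }
        println(out, indent, ">> Writing data for loop:" + loopName + " is complete");
    }

    private static void println(PrintWriter out, String indent, String line) {
        out.print(indent);
        out.print(line);
        out.print("\n");
    }
}
```

Each child loop is printed immediately after the row that owns it, which is what makes the text output readable as a tree.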
ihds Under Tomcat
The facilities presented so far demonstrate accessing Hierarchical Data Sets anywhere in Java code, including command-line applications. When Aspire is initialized under Tomcat, it goes a step further and allows you to include data sets directly in your web pages. Currently supported formats include classic XML, object XML, text, and Excel. Formats planned for the near future include Java class definitions matching the object XML, XSD, and generic HTML pages.
Before being able to obtain your web pages in one of these formats, you need to know how to initialize Aspire under Tomcat. Besides the article referenced above, see "Improve Your Career with Tomcat and Aspire." Once this is accomplished, your remaining work is to:
Add this section to the existing data definition configuration file:
###################################
# ihdsTestURL: linking to a URL
###################################
ihdsTestURL=aspire:\\samples\\ihds-test\\ihds-default-html-template.html
ihdsTestURL.formHandlerName=ihdsTest
request.ihdsTest.form_handler.class_request.className=com.ai.htmlgen.DBHashTableFormHandler1
There are two parts to a URL defined in Aspire: the data source and the data transformation.
Aspire can transform data using JSP, XSLT, or tags. The default transformation,
tags, requires a template file that contains the tags. The first line
specifies the transformation file for the data. The second line points to the
data definition called ihdsTest, defined earlier. Line 3 says
essentially the same thing as line 1 of section 1; this redundancy exists for
backward compatibility within Aspire.
Aspire allows a Hierarchical Data Set to be transformed in one of two ways, generic or page-specific. This example definition is page-specific, because the presented HTML template is specific to that page. A generic transformation will take any Hierarchical Data Set belonging to any page and transform it in a generic manner. Generic transformations are included in the configuration file as follows:
# Generic transform support
# XML output
GenericTransform.Classic-xml.classname=com.ai.xml.FormHandlerToXMLTransform
GenericTransform.Object-xml.classname=com.ai.generictransforms.ObjectXMLGenericTransform
# Excel output
GenericTransform.Excel.classname=com.ai.generictransforms.ExcelGenericTransform
# Text
GenericTransform.Text.classname=com.ai.generictransforms.DebugTextTransform
These definitions are usually included in the master
aspire.properties file.
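The lookup from an output-format name to a transform class can be pictured as a simple key search. The sketch below is an assumption about the mechanism, not Aspire's code, and TransformRegistry is a hypothetical name. It matches GenericTransform.&lt;format&gt;.classname case-insensitively (the configuration spells Classic-xml while the request argument shown next uses classic-xml) and instantiates the class reflectively, assuming a public no-argument constructor.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

/** Hypothetical resolver from an output-format name to a transform class. */
class TransformRegistry {
    private static final String PREFIX = "generictransform.";
    private static final String SUFFIX = ".classname";
    private final Map<String, String> classNames = new HashMap<>();

    TransformRegistry(Properties config) {
        for (String key : config.stringPropertyNames()) {
            String k = key.toLowerCase(Locale.ROOT);
            // Keep only GenericTransform.<format>.classname entries
            if (k.startsWith(PREFIX) && k.endsWith(SUFFIX)
                    && k.length() > PREFIX.length() + SUFFIX.length()) {
                String format = k.substring(PREFIX.length(), k.length() - SUFFIX.length());
                classNames.put(format, config.getProperty(key).trim());
            }
        }
    }

    /** Configured class name for a format, or null if none (e.g. plain HTML). */
    String classNameFor(String outputFormat) {
        return outputFormat == null ? null : classNames.get(outputFormat.toLowerCase(Locale.ROOT));
    }

    /** Instantiates the transform through its public no-argument constructor. */
    Object newTransform(String outputFormat) throws ReflectiveOperationException {
        String className = classNameFor(outputFormat);
        if (className == null) {
            throw new IllegalArgumentException("Unknown output format: " + outputFormat);
        }
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }
}
```

Loading classes by name keeps the set of output formats open: adding a format means adding one configuration line and one class, with no change to the dispatching code.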
Once the URL is defined, you can see the resulting HTML page by calling the defined URL as follows:
http://yourhost:yourport/your-webapp/servlet/DisplayServlet?url=ihdsTestURL
This will produce an HTML page. To obtain the data as classic XML instead, simply add the following argument to the URL above:
&aspire_output_format=classic-xml
For Excel data, do something similar:
&aspire_output_format=Excel
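If you build these links in Java, for example for the data icons discussed below, it is worth URL-encoding each argument. Here is a minimal sketch using the standard java.net.URLEncoder. AspireUrls is a hypothetical helper name, and the host, port, and web-app path in the tests are placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper for building DisplayServlet links. */
final class AspireUrls {
    private AspireUrls() {
    }

    /**
     * Builds webAppBase + "/servlet/DisplayServlet?url=..." and, when format
     * is non-null, appends the aspire_output_format argument.
     */
    static String displayUrl(String webAppBase, String urlName, String format) {
        StringBuilder sb = new StringBuilder(webAppBase)
                .append("/servlet/DisplayServlet?url=")
                .append(encode(urlName));
        if (format != null) {
            sb.append("&aspire_output_format=").append(encode(format));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Names like ihdsTestURL and classic-xml pass through unchanged; encoding only matters once argument values contain spaces, ampersands, or other reserved characters.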
The key is to map an argument called aspire_output_format
to a generic transform's Java class name. It is very easy to write these generic
transformations to suit your output needs. The following example shows the
Excel generic transform implementation.
package com.ai.generictransforms;
import com.ai.htmlgen.*;
import com.ai.common.TransformException;
import java.io.*;
import com.ai.data.*;
import javax.servlet.http.*;
public class ExcelGenericTransform
extends AHttpGenericTransform
implements IFormHandlerTransform
{
private static String s_separator = "\t";
protected String getDerivedHeaders(HttpServletRequest request)
{
return "Content-Type=application/vnd.ms-excel"
+ "|Content-Disposition=filename=aspire-hierarchical-dataset.xls";
}
public void transform(ihds data, PrintWriter out)
throws TransformException
{
staticTransform(data,out);
}
public void transform(IFormHandler data, PrintWriter out)
throws TransformException
{
staticTransform((ihds)data,out);
}
public static void staticTransform(ihds data, PrintWriter out)
throws TransformException
{
try
{
writeALoop("MainData",data,out,"");
}
catch(DataException x)
{
throw new TransformException(
"Error: ExcelGenericTransform: Data Exception",x);
}
}
private static void writeALoop(String loopname,
ihds data,
PrintWriter out,
String is)
throws DataException
{
println(out,is, ">> Writing data for loop:" + loopname);
// write metadata
IMetaData m = data.getMetaData();
IIterator columns = m.getIterator();
StringBuffer colBuffer = new StringBuffer();
for(columns.moveToFirst();!columns.isAtTheEnd();columns.moveToNext())
{
String columnName = (String)columns.getCurrentElement();
colBuffer.append(columnName).append(s_separator);
}
println(out,is,colBuffer.toString());
//write individual rows
for(data.moveToFirst();!data.isAtTheEnd();data.moveToNext())
{
StringBuffer rowBuffer = new StringBuffer();
for(columns.moveToFirst();!columns.isAtTheEnd();columns.moveToNext())
{
String columnName = (String)columns.getCurrentElement();
rowBuffer.append(data.getValue(columnName));
rowBuffer.append(s_separator);
}
println(out,is,rowBuffer.toString());
// recursive call to print children
IIterator children = data.getChildNames();
for(children.moveToFirst();!children.isAtTheEnd();children.moveToNext())
{
//for each child
String childName = (String)children.getCurrentElement();
ihds child = data.getChild(childName);
writeALoop(childName,child,out,is + "\t");
}
}
println(out,is,">> Writing data for loop:" + loopname + " is complete");
}
private static void println(PrintWriter out, String
indentationString, String line)
{
out.print(indentationString);
out.print(line);
out.print("\n");
}
}
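As an illustration of how little a new generic transform needs, here is a sketch of CSV output for a single loop. It is self-contained rather than an IFormHandlerTransform implementation: rows are passed as maps, and CsvLoopWriter is a hypothetical name. Unlike the tab-separated Excel output above, which writes values as-is, it quotes fields following RFC 4180, so values containing commas, quotes, or line breaks stay in their columns.

```java
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical CSV writer for one loop, quoting fields per RFC 4180. */
final class CsvLoopWriter {
    private CsvLoopWriter() {
    }

    static void write(List<String> columns, List<Map<String, String>> rows, PrintWriter out) {
        writeLine(columns, out);
        for (Map<String, String> row : rows) {
            String[] cells = new String[columns.size()];
            for (int i = 0; i < cells.length; i++) {
                String value = row.get(columns.get(i));
                cells[i] = value == null ? "" : value;
            }
            writeLine(Arrays.asList(cells), out);
        }
        out.flush();
    }

    private static void writeLine(List<String> cells, PrintWriter out) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(cells.get(i)));
        }
        sb.append("\r\n"); // RFC 4180 uses CRLF line endings
        out.print(sb);
    }

    /** Quotes a field only when it contains a comma, quote, CR, or LF. */
    static String quote(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0;
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }
}
```

Wrapping this in a class that walks an ihds the way ExcelGenericTransform does, and registering it under a GenericTransform entry, would be the remaining step under the same configuration scheme.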
The implications of these facilities are quite exciting for the Tomcat developer community. If page developers use this mechanism to retrieve data, they can put a series of data icons at the top of each page that let end users retrieve the data in their preferred format. End users will benefit from Excel output, as they can work with the data in their spreadsheets. B2B users can retrieve data as XML. Java and other programmers will be able to retrieve the data binding as Java classes and work with objects, as opposed to XML.
All of the facilities described here are available to Tomcat developers free of charge and in a very small package. For a large number of students and entry-level programmers, this means they can download a few megabytes of Tomcat and Aspire, sit down with a tool like Dreamweaver, and be immediately productive with any database of their choice.
As they progress, they can start writing plug-ins and other sophisticated Java programs that do specialized work while the framework supplies the basics. This incremental approach to learning Java, J2EE, XML, and enterprise development works well.
Aspire's sample pages at Indent, Inc. should include a set of pages demonstrating Hierarchical Data Sets by the time this article is published. You may have to scroll down to find the section on Hierarchical Data Sets, as the page demonstrates a few other features as well.
I would be delighted to hear from you, whether you spot architectural anomalies in Hierarchical Data Sets or see further potential for them in programming. You can email me any time at satya@activeintellect.com.
Satya Komatineni is the CTO at Indent, Inc. and the author of Aspire, an open source web development RAD tool for J2EE/XML.
Return to ONJava.com.
Copyright © 2009 O'Reilly Media, Inc.