


Applying "Digital Hub" Concepts to Enterprise Software Design, Part 3

by Adam Behringer

Welcome back! In the previous installment of this series, we designed and built a flexible database (a "hub") that will store weather data as it pours in from all over the globe. However, the point of our hub-and-spoke design is that the data gets passed around to different scripts and programs that interact with it. It does not just sit in the database. The portable data is the "spoke" that will connect all of our client applications to the hub.

What form should the data take when it is not in the database? Since we will have a wide variety of client applications and scripts sending and receiving data, we need a format that is not proprietary to one computer language or vendor. We also want something simple to work with and flexible enough to adapt to different situations as they arise. Sounds like a job for XML, which we will use to create the data's home away from home. Once our data is packed into an XML format, we can send it through a web service, as a file, or over HTTP. Whichever method we use to transfer it among the components in our system, the XML format will stay the same.

If you have not used XML before, fear not! It is basically a text format that conforms to some simple rules. Nodes are created by using opening and closing tags, so it will look familiar if you have done any web site coding (XML and HTML have common ancestry). The important thing for our project is to know that XML nodes can contain other "child nodes," creating a hierarchy of data. The official XML specification from the W3C offers a more technical explanation if you want one. Do not get overwhelmed by the specification, though. We do not need to know every feature of XML to make good use of the technology in our applications.

Duplicate the Database in XML

Since we are going to have several applications interacting with the data, we want to make sure that the design does not change very often. Do you remember how we dealt with this design issue in the database? We designed the database to contain three related tables.

The DataType table stores a list of basic data types that our client applications will know how to validate (such as number and string). The MeasurementType table stores more specific descriptions of the kinds of data that we are going to store and the names that will be used to display the data for users (such as Temperature and Wind Speed). Finally, the MeasuredData table stores specific data measurements.

With this design, new types of weather data can be added on the fly without redesigning the database. We should extend this flexibility to our other components, too. In this particular case, we will design XML that can accommodate new measurement types.

We can accomplish this by having a branch of the XML that describes the types of weather measurements stored in the database (in the MeasurementType table). Each measurement type in our XML will include all of the information stored in the database for that type, including the ID. When we list the weather data (from the MeasuredData table), it will have a typeId attribute that will allow us to match the data with its type. This should be a familiar concept for those of you with relational database experience.

Creating a Mock-Up

We are going to use a basic text editor to create a sample XML file. On my Mac, I prefer TextEdit or Xcode, as they are always sitting there in my dock. Type the code below and save it as "weather.xml". Make sure the file extension is ".xml". Sometimes, text editors like to append a ".txt" file extension.

<?xml version="1.0" encoding="UTF-8"?>
<Weather>
        <type id="0" name="Temperature" dataType="number" />
        <type id="1" name="Wind Speed" dataType="number" />
        <item id="0" time="2004-04-17 00:00:00 -0700" typeId="0" data="60" />
        <item id="1" time="2004-04-17 00:00:00 -0700" typeId="1" data="3.2" />
</Weather>

Both the Mac and Windows versions of Microsoft's Internet Explorer (IE) have a nifty way of displaying XML. Launch IE, choose Open File... from the file menu, and choose the XML file that you just created. If your tags match correctly, it will display your file in a way that allows for the collapsing and expanding of XML branches. Plus, it's colorful! Remember to never stop enjoying the small things in life.
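If you would rather check your tags from a script than from a browser, here is a minimal sketch in Python (chosen only for brevity; it is not the language used elsewhere in this series). It assumes the sample is wrapped in the Weather root node, and the function name check_well_formed is my own invention for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# A trimmed-down copy of the mock-up (assumed layout: everything inside Weather).
GOOD = """<Weather>
    <type id="0" name="Temperature" dataType="number" />
    <item id="0" time="2004-04-17 00:00:00 -0700" typeId="0" data="60" />
</Weather>"""

# The same document with the closing root tag left off.
BAD = GOOD.replace("</Weather>", "")

print(check_well_formed(GOOD))
print(check_well_formed(BAD))
```

This only tells you whether the tags match up. It says nothing about whether the data inside them makes sense.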

Let's look at each tag from the top down.


Every well-formed XML document must have a single root node. Ours is called Weather and encloses all of the other nodes.
The type tags, taken together, hold all of the information from the MeasurementType database table.
Each type tag stores the information from one row in the MeasurementType table. Each column has a separate XML attribute. The id attribute stores information from the measurementTypeId column, name stores information from the name column, and dataType stores the name of the DataType row related to the current measurement type.
The item tags, taken together, hold all of the information from the MeasuredData database table.
The item tags will each contain data for a row in the MeasuredData table. The id attribute stores information from the measuredDataId column, and time stores the date from the timeTaken column. Notice that timeTaken is stored as a Date type in the database, so it must be converted to a string in order to pass it in the XML. The data attribute maps to the data column and typeId maps to the measurementTypeId column.
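The date-to-string conversion for the time attribute could look something like the sketch here, written in Python for brevity. The format string is my reading of the sample value "2004-04-17 00:00:00 -0700", and the helper names time_to_xml and time_from_xml are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed layout of the time attribute: date, time, then UTC offset.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def time_to_xml(moment):
    """Format a timezone-aware datetime for the time attribute."""
    return moment.strftime(TIME_FORMAT)


def time_from_xml(text):
    """Parse the time attribute back into a timezone-aware datetime."""
    return datetime.strptime(text, TIME_FORMAT)


# Midnight on April 17, 2004, seven hours behind UTC.
taken = datetime(2004, 4, 17, 0, 0, 0, tzinfo=timezone(timedelta(hours=-7)))
text = time_to_xml(taken)

print(text)                          # 2004-04-17 00:00:00 -0700
print(time_from_xml(text) == taken)  # True
```

Keeping the UTC offset in the string matters for data arriving from all over the globe, since two stations can report the same wall-clock time for different moments.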

Notice that there are two different relationships between tables, and we treat them in different ways in our XML design. Since the information in DataType will not change often, we're going to flatten the data by using the name of the dataType. Remember that this table stores types such as number and string. If we felt that the data types might change, we would want to split them out into their own node group and refer to them by their IDs.

On the other hand, we're going to keep the relationship between measured data and its measured type flexible. We will do this by including all of the information about measure types in the XML and a typeId in each data item. When our applications parse the data, they can associate a measure type with a data item using the ID numbers. It takes a bit more work to do this, but it will pay off when scientists add new measurement types (which they are likely to do). It's essential to have a good grasp of the business problem you are trying to solve so that you can properly evaluate these kinds of decisions.
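To make the matching step concrete, here is a rough sketch of how a client might associate each item with its measurement type. It is written in Python for brevity and is only one way to do it. The function name read_measurements is hypothetical, and the sample assumes everything sits inside the Weather root node.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Weather>
    <type id="0" name="Temperature" dataType="number" />
    <type id="1" name="Wind Speed" dataType="number" />
    <item id="0" time="2004-04-17 00:00:00 -0700" typeId="0" data="60" />
    <item id="1" time="2004-04-17 00:00:00 -0700" typeId="1" data="3.2" />
</Weather>"""


def read_measurements(xml_text):
    """Return (type name, time, value) tuples, one per item."""
    root = ET.fromstring(xml_text)
    # Index the measurement types by id. iter() finds them at any depth,
    # so this works whether or not the tags are grouped under other nodes.
    types = {t.get("id"): t for t in root.iter("type")}
    rows = []
    for item in root.iter("item"):
        # A typeId with no matching type raises KeyError here.
        measurement_type = types[item.get("typeId")]
        value = item.get("data")
        if measurement_type.get("dataType") == "number":
            value = float(value)
        rows.append((measurement_type.get("name"), item.get("time"), value))
    return rows


for name, time, value in read_measurements(SAMPLE):
    print(name, time, value)
```

Nothing in the function mentions Temperature or Wind Speed by name, so a new type tag added by a scientist is picked up without any code changes.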

Attributes Vs. Child Nodes

You may be wondering why I chose to store some of the data in attributes like this:

<item id="0" time="2004-04-17 00:00:00 -0700" typeId="0" data="60" />

when I could instead have stored the data as child nodes in the XML, like this:

    <time>2004-04-17 00:00:00 -0700</time>

Technically, both would work. Attributes are a bit more compact and are easier to parse, but child nodes are more flexible. How do we decide which to use in a given situation? Let us use the typeId attribute of item as an example. It has several properties that make me think of it as an attribute:

  1. It needs no child components. In fact it is a simple data type (a number).
  2. There is only one for each item.
  3. Every single item will have one.
  4. It will not contain any characters that will confuse an XML parser.

If a typeId had child nodes of its own, then we would obviously need to make it a node itself. Attributes can only be used "at the end of the line." It is also helpful to store data in a node if there are going to be characters that would confuse an XML parser. For example, if a value itself contained a snippet of XML, we could store it in a node and wrap it in a CDATA section so that it would get parsed as a string, and not as part of the XML. In our example, I'm going to assume (for the sake of simplicity) that any information being added to the data field would first be validated to prevent these types of characters from being entered.
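Here is a small Python sketch of the trade-off, using made-up fragments rather than anything from our weather design. It reads the same value from an attribute and from a child node, reads a CDATA-wrapped value back as a plain string, and shows that a typical XML library escapes troublesome characters in attributes when it writes them out.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: one value stored three different ways.
AS_ATTRIBUTE = '<item data="60" />'
AS_CHILD = "<item><data>60</data></item>"
AS_CDATA = "<item><data><![CDATA[<b>60</b> & rising]]></data></item>"

# Reading an attribute versus reading a child node.
print(ET.fromstring(AS_ATTRIBUTE).get("data"))
print(ET.fromstring(AS_CHILD).findtext("data"))

# The CDATA content comes back as text, not as nested XML.
print(ET.fromstring(AS_CDATA).findtext("data"))

# When writing, the library escapes characters that would confuse a parser.
item = ET.Element("item")
item.set("data", 'wind < 5 & "calm"')
serialized = ET.tostring(item, encoding="unicode")
print(serialized)

# Parsing the serialized form gives back the original value.
print(ET.fromstring(serialized).get("data"))
```

Escaping only protects you when the XML is produced by a library. Hand-built strings need the same care.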

Since there is more than one way to go about this issue, be sure you talk about it with your development team before diving in. It's good to have a standard way to handle this issue across your company. For example, you may decide that any data directly entered by a user is stored in a node. I encourage you to post a comment at the end of this article if you would like to share what has worked well for you.

Extra Credit

One thing that I've noticed about software users is that they never do what you expect. When I have used XML on previous real-world projects, I've found that users sometimes like to edit the XML by hand, even when you tell them not to touch it. Its human-readable format makes this tempting if a tool is not flexible enough to do what a user wants to do, but sometimes the formatting or data gets messed up by accident.

Our system would be more robust if we could validate the XML passing back and forth between components. That way, we could contain corrupt data and be notified about it right away, rather than finding out after the corrupted data has spread to other parts of the system. XML documents can be validated against either an XML Schema or a DTD. If you have some experience with XML, your extra credit is to build an XML schema that can be used to validate the XML each time an application reads it.
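If a schema feels like too much to start with, a few hand-written checks catch the most common hand-editing accidents. To be clear, the Python sketch here is neither an XML Schema nor a DTD, just a stand-in for one. The function name find_problems and the specific rules are my own assumptions about what is worth checking.

```python
import xml.etree.ElementTree as ET

REQUIRED_ITEM_ATTRIBUTES = ("id", "time", "typeId", "data")


def find_problems(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "Weather":
        problems.append(f"unexpected root node: {root.tag}")
    # Map each type id to its dataType so items can be checked against it.
    types = {t.get("id"): t.get("dataType") for t in root.iter("type")}
    for item in root.iter("item"):
        item_id = item.get("id")
        missing = [a for a in REQUIRED_ITEM_ATTRIBUTES if item.get(a) is None]
        if missing:
            problems.append(f"item {item_id}: missing {', '.join(missing)}")
            continue
        type_id = item.get("typeId")
        if type_id not in types:
            problems.append(f"item {item_id}: unknown typeId {type_id}")
        elif types[type_id] == "number":
            try:
                float(item.get("data"))
            except ValueError:
                problems.append(f"item {item_id}: data is not a number")
    return problems


# A hand-edited file with two mistakes: text in a number field,
# and a typeId that points at no type.
EDITED = """<Weather>
    <type id="0" name="Temperature" dataType="number" />
    <item id="0" time="2004-04-17 00:00:00 -0700" typeId="0" data="sixty" />
    <item id="1" time="2004-04-17 00:00:00 -0700" typeId="9" data="3.2" />
</Weather>"""

for problem in find_problems(EDITED):
    print(problem)
```

A real schema expresses rules like these declaratively, so every component can share one definition instead of each application carrying its own copy of the checks.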

Where Are We?

Let's review what we have so far. We have a flexible database that has a few data records in it. We also have an XML template that retains most of the flexibility of the database and will allow us to transport the data when it is outside of the database. What we need to do next is write some code that will transfer the data between these two forms.

In our next tutorial, we will write some Java code that will grab data out of the database and create an XML string that conforms to our design. Of course, we will also go in the other direction and write code to parse an XML file and add the new data to the database.


Adam Behringer works as the CEO of Bee Documents, which strives to help legal and medical firms efficiently manage their documents.
