Applying "Digital Hub" Concepts to Enterprise Software Design, Part 3
by Adam Behringer
Welcome back! In the previous installment of this series, we designed and built a flexible database (a "hub") that will store weather data as it pours in from all over the globe. However, the point of our hub-and-spoke design is that the data gets passed around to different scripts and programs that interact with it. It does not just sit in the database. The portable data is the "spoke" that will connect all of our client applications to the hub.
What form should the data take when it is not in the database? Since we will have a wide variety of client applications and scripts sending and receiving data, we need a format that is not proprietary to one computer language or vendor. We also want something simple to work with and flexible enough to adapt to different situations as they arise. Sounds like a job for XML, which we will use to create the data's home away from home. Once our data is packed into an XML format, we can send it through a web service, as a file, or over HTTP. Whichever method we use to transfer it among the components in our system, the XML format will stay the same.
If you have not used XML before, fear not! It is basically a text format that conforms to some simple rules. Nodes are created by using opening and closing tags, so it will look familiar if you have done any web site coding (XML and HTML have common ancestry). The important thing for our project is to know that XML nodes can contain other "child nodes," creating a hierarchy of data. The official XML specification is the place to look if you want a more technical explanation. Do not get overwhelmed by the specification, though. We do not need to know every feature of XML to make good use of the technology in our applications.
Duplicate the Database in XML
Since we are going to have several applications interacting with the data, we want to make sure that the design does not change very often. Do you remember how we dealt with this design issue in the database? We designed the database to contain three related tables.
The DataType table stores a list of basic data types that our client applications will know how to validate (such as number and string). The MeasurementType table stores more specific descriptions of the kinds of data that we are going to store and the names that will be used to display the data for users (such as Wind Speed). Finally, the MeasuredData table stores the specific measurements.
With this design, new types of weather data can be added on the fly without redesigning the database. We should extend this flexibility to our other components, too. In this particular case, we will design XML that can accommodate new measurement types.
We can accomplish this by having a branch of the XML that describes the types of weather measurements stored in the database (in the MeasurementType table). Each measurement type in our XML will include all of the information stored in the database for that type, including the ID. When we list the weather data (from the MeasuredData table), each item will have a typeId attribute that will allow us to match the data with its type. This should be a familiar concept for those of you with relational database experience.
Creating a Mock-Up
We are going to use a basic text editor to create a sample XML file. On my Mac, I prefer TextEdit or Xcode, as they are always sitting there in my dock. Type the code below and save it as "weather.xml". Make sure the file extension is ".xml". Sometimes, text editors like to append a ".txt" file extension.
<?xml version="1.0" encoding="UTF-8"?>
<weather>
  <measureTypes>
    <type id="0" name="Temperature" dataType="number" />
    <type id="1" name="Wind Speed" dataType="number" />
  </measureTypes>
  <data>
    <item id="0" time="2004-04-17 00:00:00 -0700" typeId="0" data="60" />
    <item id="1" time="2004-04-17 00:00:00 -0700" typeId="1" data="3.2" />
  </data>
</weather>
Both the Mac and Windows versions of Microsoft's Internet Explorer (IE) have a nifty way of displaying XML. Launch IE, choose Open File... from the file menu, and choose the XML file that you just created. If your tags match correctly, it will display your file in a way that allows for the collapsing and expanding of XML branches. Plus, it's colorful! Remember to never stop enjoying the small things in life.
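If you would rather check your tags from code than from a browser, the same test can be sketched in a few lines of Java. This is only a minimal illustration using the standard JAXP parser; the class name WellFormedCheck and the method isWellFormed are made up for this example and are not part of the series' code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a piece of text is well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler keeps the parser from printing errors to stderr;
            // fatal errors (such as mismatched tags) are still thrown.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the text, so it is not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }
}
```

To check the file you just saved instead of a string, the same DocumentBuilder also accepts a java.io.File, as in builder.parse(new File("weather.xml")).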
Let's look at each tag from the top down.
- <weather>: Every well-formed XML document must have a single root node. Ours is called weather and encloses all of the other nodes.
- <measureTypes>: All of the information from the MeasurementType database table is stored between these tags.
- <type>: Each type tag stores the information from one row in the MeasurementType table. Each column has a separate XML attribute. The id attribute stores the row's ID, name stores the display name, and dataType stores the name of the DataType row related to the current measurement type.
- <data>: All of the information from the MeasuredData database table is stored between these tags.
- <item>: The item tags will each contain data for a row in the MeasuredData table. The id attribute stores the row's ID, and time stores the date from the timeTaken column. Notice that timeTaken is stored as a Date type in the database, so it must be converted to a string in order to pass it in the XML. The data attribute maps to the measured value, and typeId maps to the ID of the row's measurement type.
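The date-to-string conversion mentioned for the time attribute might look something like the sketch below. It assumes the timeTaken value has already been read into a java.time.OffsetDateTime and that the layout should match the mock-up exactly; the class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for the "time" attribute of an item.
class TimeAttribute {
    // Same layout as the mock-up, e.g. "2004-04-17 00:00:00 -0700".
    // "xx" is the UTC offset written as hours and minutes without a colon.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss xx", Locale.ROOT);

    // Converts a timestamp into the string stored in the XML.
    static String toAttribute(OffsetDateTime timeTaken) {
        return timeTaken.format(FORMAT);
    }

    // Converts the XML string back into a timestamp.
    static OffsetDateTime fromAttribute(String text) {
        return OffsetDateTime.parse(text, FORMAT);
    }
}
```

Keeping both directions next to one formatter means every component that reads or writes the XML agrees on a single date layout.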
Notice that there are two different relationships between tables, and we treat them in different ways in our XML design. Since the information in DataType will not change often, we're going to flatten the data by using the name of the dataType. Remember that this table stores types such as number and string. If we felt that the data types might change, we would want to split them out into their own node group and refer to them by their IDs.
On the other hand, we're going to keep the relationship between measured data
and its measured type flexible. We will do this by including all of the information
about measure types in the XML and a
typeId in each data item. When
our applications parse the data, they can associate a measure type with a data
item using the ID numbers. It takes a bit more work to do this, but it will
pay off when scientists add new measurement types (which they are likely
to do). It's essential to have a good grasp of the business problem you are
trying to solve so that you can properly evaluate these kinds of decisions.
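To make the ID matching concrete, here is a rough Java sketch of how a client might associate each data item with its measure type. It assumes the XML follows the weather.xml mock-up; WeatherReader and describeItems are names invented for this illustration, not code from the series.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical client code: joins items to measure types by ID.
class WeatherReader {
    // Returns one line per item, such as "Temperature = 60".
    static List<String> describeItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // First pass: index the measure type names by their id.
            Map<String, String> typeNames = new HashMap<>();
            NodeList types = doc.getElementsByTagName("type");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                typeNames.put(type.getAttribute("id"), type.getAttribute("name"));
            }

            // Second pass: look up each item's type through its typeId.
            List<String> lines = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                String name = typeNames.getOrDefault(
                        item.getAttribute("typeId"), "(unknown type)");
                lines.add(name + " = " + item.getAttribute("data"));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read weather XML", e);
        }
    }
}
```

Because the type names come from the XML itself, this code would keep working unchanged if a new type (say, humidity) were added to measureTypes.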
Attributes Vs. Child Nodes
You may be wondering why I chose to store some of the data in attributes like this:
<item id="0" time="2004-04-17 00:00:00 -0700" typeId="0" data="60" />
when I could instead have stored the data as child nodes in the XML, like this:
<item>
  <id>0</id>
  <time>2004-04-17 00:00:00 -0700</time>
  <typeId>0</typeId>
  <data>60</data>
</item>
Technically, both would work. Attributes are a bit more compact and are easier
to parse, but child nodes are more flexible. How do we decide which to use in
a given situation? Let us use the
typeId attribute of item as an example. It
has several properties that make me think of it as an attribute:
- It needs no child components. In fact it is a simple data type (a number).
- There is only one for each item.
- Every single item will have one.
- It will not contain any characters that will confuse an XML parser.
If typeId had child nodes of its own, then we would obviously need to make it a node itself. Attributes can only be used "at the end of the line." It is also helpful to store data in a node if there are going to be characters that would confuse an XML parser. For example, if a value contained a snippet of XML, we could store it in a node and wrap it in a CDATA section so that it would get parsed as a string, and not as part of the XML. In our example, I'm going to assume (for the sake of simplicity) that any information being added to the data field would first be validated to prevent these types of characters from being entered.
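The practical difference between the two forms shows up when you read the value back. The following Java sketch, with hypothetical class and method names, reads the data value from an attribute and from a child node using the standard DOM API; the child-node version also copes with a CDATA section.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper comparing the attribute form and the child-node form.
class ItemForms {
    // Parses a small XML fragment and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Attribute form: the value is a single lookup on the item element.
    static String dataFromAttribute(String itemXml) {
        return parseRoot(itemXml).getAttribute("data");
    }

    // Child-node form: find the first data child, then take its text.
    // getTextContent() also returns the contents of a CDATA section.
    // (This sketch assumes the data child exists.)
    static String dataFromChild(String itemXml) {
        return parseRoot(itemXml)
                .getElementsByTagName("data")
                .item(0)
                .getTextContent();
    }
}
```

With this sketch, an item whose data node holds a CDATA section wrapping a snippet of markup comes back as plain text, tags included, rather than being parsed as child nodes.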
Since there is more than one way to go about this issue, be sure you talk about it with your development team before diving in. It's good to have a standard way to handle this issue across your company. For example, you may decide that any data directly entered by a user is stored in a node. I encourage you to post a comment at the end of this article if you would like to share what has worked well for you.
One thing that I've noticed about software users is that they never do what you expect. When I have used XML on previous real-world projects, I've found that users sometimes like to edit the XML by hand, even when you tell them not to touch it. Its human-readable format makes this tempting if a tool is not flexible enough to do what a user wants to do, but sometimes the formatting or data gets messed up by accident.
Our system would be more robust if we could validate the XML passing back and forth between components. That way, we could contain corrupt data and be notified about it right away, rather than finding out after the corrupted data has spread to other parts of the system. XML supports validation using either a DTD (part of the XML specification itself) or an XML schema (a separate W3C standard). If you have some experience with XML, your extra credit is to build an XML schema that can be used to validate the XML each time an application reads it.
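As a starting point for that extra credit, here is one possible sketch of schema validation in Java using the standard javax.xml.validation API. The schema embedded below is my own guess at a reasonable one for the weather.xml mock-up, not an official part of this series, and the class name WeatherValidator is hypothetical. It only checks structure and required attributes; it does not check that every typeId refers to an existing type.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical validator for the weather.xml mock-up.
class WeatherValidator {
    // Assumed schema: weather contains measureTypes, then data.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='weather'><xs:complexType><xs:sequence>"
        + "<xs:element name='measureTypes'><xs:complexType><xs:sequence>"
        + "<xs:element name='type' minOccurs='0' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:attribute name='id' type='xs:integer' use='required'/>"
        + "<xs:attribute name='name' type='xs:string' use='required'/>"
        + "<xs:attribute name='dataType' type='xs:string' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "<xs:element name='data'><xs:complexType><xs:sequence>"
        + "<xs:element name='item' minOccurs='0' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:attribute name='id' type='xs:integer' use='required'/>"
        + "<xs:attribute name='time' type='xs:string' use='required'/>"
        + "<xs:attribute name='typeId' type='xs:integer' use='required'/>"
        + "<xs:attribute name='data' type='xs:string' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compiled once; a Schema object is safe to share between threads.
    static final Schema SCHEMA = compileSchema();

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The embedded schema is broken", e);
        }
    }

    // Returns true if the XML is well-formed and matches the schema.
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed or does not match the schema
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the XML", e);
        }
    }
}
```

Each application could call a check like isValid before acting on incoming XML, so a hand-edited file with a missing attribute is rejected at the boundary instead of reaching the database.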
Where Are We?
Let's review what we have so far. We have a flexible database that has a few data records in it. We also have an XML template that retains most of the flexibility of the database and will allow us to transport the data when it is outside of the database. What we need to do next is write some code that will transfer the data between these two forms.
In our next tutorial, we will write some Java code that will grab data out of the database and create an XML string that conforms to our design. Of course, we will also go in the other direction and write code to parse an XML file and add the new data to the database.