Serialization in .NET, Part 1
by Dan Frumin, 01/26/2004
Overview
For many years, research scientists have promised us that memory will become unlimited and persistent. Unfortunately, this memory nirvana has not come to pass. Considering that most applications are meant to be used more than once, this places the requirement of persistence on the developer. Even if memory were to become persistent, the need to exchange data across multiple applications or computers would again place the developer in a position to implement some form of persistence mechanism. Serialization of data using built-in .NET support makes persistence easy and reusable. In this article, we will review the support available for serialization and look at a couple of scenarios for using it.
Introduction to Serialization
So, what is serialization? Semantically, serialization is the act of publishing or producing an item in the form of a series of information bits. This is a fancy way of saying that serialization is taking some data structure and pumping it out into a stream of bytes that we can then use. A concrete example will help us out; let's say that we have a data structure in memory representing an employee. The data structure for the class has the following form:
public class Employee
{
    public string FirstName;
    public string LastName;
    public DateTime StartDate;
    public int Age;
    private int EmpID;   // assigned randomly by the constructor
    ...
}
For the purposes of our application, we'll further assume that the EmpID is
being set randomly by the constructor and is only used for internal
processing. The need for a private member will become evident in the
examples that follow.
At some point, our application will likely need to persist this data structure to disk, if only so that the user can close the application and reopen it later. Traditionally, we would write code that walks through the list of employees and adds a record for each one to a database. When needed, we read the records from the database back into memory. In a sense, we serialized each employee's data structure into a series of bytes in the database and later deserialized those bytes back into a data structure. This is a bit of a simplification, but I suspect you get the point.
Now that we know what serialization is, let's consider what it's good for. Most uses of serialization fall into two categories: persistence and data interchange. Persistence allows us to store the information on some non-volatile medium for future use. This includes multiple uses of our application, archiving, and so on. Data interchange is a bit more versatile in its uses. If our application takes the form of an N-tier solution, it will need to transfer information from client to server, likely using a network protocol such as TCP. To achieve this, we would serialize the data structure into a series of bytes that we can transfer over the network. Another use of serialization for data interchange is XML serialization, which allows our application to share data with another application altogether. As you can see, serialization plays a role in many different parts of an application.
Before .NET, serialization typically required custom code. For example, we
could take a single instance of the employee data structure and manually
generate the string for an XML document, which we could then save to disk.
Luckily for us, the Microsoft .NET team decided to save us the hassle of doing
this work. Unfortunately, since the .NET team is composed of several
sub-teams, we find that there are two distinct solutions to the problem. The
first solution lives in the System.Runtime.Serialization namespace and
provides a generic approach to serialization. It is the more powerful of the
two, as it is more generalized, customizable, and extensible. The second
solution is specific to XML serialization and is implemented in the
System.Xml.Serialization namespace. Both serialize objects, but, as we will
see, they differ in which members they capture and in how their output can be
customized.
The XmlSerializer Class
We'll start by looking at the XmlSerializer class, since it's the easier of the
two to apply and debug. When instantiating a new XmlSerializer, you need to
tell it which type it will serialize; we pass that information to the
constructor. After that, it's a matter of calling the Serialize method, which
has several overloads, including one that accepts a TextWriter so that we can
work with the resulting string in memory. Here is some sample code:
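// requires: using System.IO; using System.Xml.Serialization;
XmlSerializer xs = new XmlSerializer(typeof(Employee));
StringWriter sw = new StringWriter();
xs.Serialize(sw, emp);   // emp is an existing Employee instance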
At this point, the StringWriter contains an XML document. If we output it
(for example, with sw.ToString()), we see that it looks like this:
<?xml version="1.0" encoding="utf-16"?>
<Employee xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <FirstName>John</FirstName>
  <LastName>Smith</LastName>
  <StartDate>2002-06-23T00:00:00.0000000-07:00</StartDate>
  <Age>25</Age>
</Employee>
You should note a few things about this output. First, it is a well-formed
XML document, which means we can pass it around to other applications or to
our own application across a network path. Second, all output was produced in
the form of elements, named after the corresponding class and member names.
Third, and lastly, only public members were exported to the XML document.
This last consideration is quite relevant, as it prevents us from fully
recreating the state of this object in a future or remote instance of the
application. If the EmpID member cannot be regenerated each time, then we must
persist it along with the public members. We'll address this issue by using
the alternative serialization mechanism.
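To try the three lines above end to end, here is a minimal, self-contained sketch. It assumes a concrete Employee class: the random EmpID assignment in the constructor, the GetEmpID accessor, and the XmlDemo helper are illustrative names, not part of the original. XmlSerializer needs a public type with a parameterless constructor, and it writes only public fields and properties, so the printed XML should contain FirstName, LastName, StartDate and Age but nothing for EmpID.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Employee
{
    public string FirstName;
    public string LastName;
    public DateTime StartDate;
    public int Age;
    private int EmpID;   // private: XmlSerializer will not write this field

    // Assumption: mirrors the article's "EmpID is set randomly by the constructor".
    public Employee()
    {
        EmpID = new Random().Next(1, 100000);
    }

    // Hypothetical accessor; methods are ignored by XmlSerializer.
    public int GetEmpID() { return EmpID; }
}

public static class XmlDemo
{
    // Serializes an Employee into an XML string held in memory.
    public static string ToXml(Employee emp)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Employee));
        using (StringWriter sw = new StringWriter())
        {
            xs.Serialize(sw, emp);
            return sw.ToString();
        }
    }

    public static Employee Sample()
    {
        Employee emp = new Employee();
        emp.FirstName = "John";
        emp.LastName = "Smith";
        emp.StartDate = new DateTime(2002, 6, 23);
        emp.Age = 25;
        return emp;
    }

    public static void Main()
    {
        Console.WriteLine(XmlDemo.ToXml(XmlDemo.Sample()));
    }
}
```

The exact StartDate text may differ from the listing above depending on the framework version, since later releases changed how DateTime values are written.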
The utility of the XmlSerializer class should be clear. Only three lines
of code were required to give us a mechanism that allows us to interchange data
with other applications using the XML format. That's pretty
impressive. But what if the other application uses a different XML schema
than we do? Or maybe the other application doesn't care to see certain
elements? We can adjust our output using a few attributes, best seen in
the code sample below:
// requires: using System.Xml.Serialization;
public class Employee
{
    // use an attribute
    [XmlAttribute]
    public string FirstName;

    // use an attribute with a custom name
    [XmlAttribute("FamilyName")]
    public string LastName;

    // do not output this data
    [XmlIgnore]
    public DateTime StartDate;

    // use an element with a custom name
    [XmlElement("EmployeeAge")]
    public int Age;

    private int EmpID;
    ...
}
This results in the following output:
<?xml version="1.0" encoding="utf-16"?>
<Employee xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          FirstName="John" FamilyName="Smith">
  <EmployeeAge>25</EmployeeAge>
</Employee>
As you can see, the XmlSerializer used the attributes applied to our class
definition to adjust its output. The XmlSerializer is indeed very useful and
offers several other capabilities, including the ability to deserialize an
object from an XML document. Exploring those is left to the reader, as we now
turn our attention to the more generic serialization solution available in
.NET.
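As a minimal round-trip sketch, the following serializes the attribute-decorated Employee and reads it back with XmlSerializer.Deserialize. EmpID and the constructor are dropped to keep it short, and the XmlRoundTrip helper name is hypothetical. Because StartDate is marked XmlIgnore, it never reaches the XML, so the deserialized copy should come back with DateTime's default value.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// The attribute-decorated Employee from the article (EmpID omitted for brevity).
public class Employee
{
    [XmlAttribute]
    public string FirstName;

    [XmlAttribute("FamilyName")]
    public string LastName;

    [XmlIgnore]
    public DateTime StartDate;

    [XmlElement("EmployeeAge")]
    public int Age;
}

public static class XmlRoundTrip
{
    // One serializer instance can be reused for both directions.
    static readonly XmlSerializer Xs = new XmlSerializer(typeof(Employee));

    public static string ToXml(Employee emp)
    {
        using (StringWriter sw = new StringWriter())
        {
            Xs.Serialize(sw, emp);
            return sw.ToString();
        }
    }

    // Rebuilds an Employee from XML; XmlIgnore'd members keep their defaults.
    public static Employee FromXml(string xml)
    {
        using (StringReader sr = new StringReader(xml))
        {
            return (Employee)Xs.Deserialize(sr);
        }
    }

    public static void Main()
    {
        Employee emp = new Employee
        {
            FirstName = "John", LastName = "Smith",
            StartDate = new DateTime(2002, 6, 23), Age = 25
        };
        string xml = ToXml(emp);
        Employee copy = FromXml(xml);
        Console.WriteLine(xml);
        Console.WriteLine("{0} {1}, age {2}, start {3}",
            copy.FirstName, copy.LastName, copy.Age, copy.StartDate);
    }
}
```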
.NET Formatters
As mentioned above, .NET offers a different solution for serialization within
the System.Runtime.Serialization namespace. This is the more generic
approach; as such, it requires a bit more code, but offers significantly
expanded capabilities. This implementation is built around the generic
IFormatter interface, which formatters of different kinds implement. .NET
ships with built-in formatters for binary streams (a series of bytes) and for
SOAP messages. Should you ever need to, you can create your own custom
formatter.
In order to use a formatter, we must mark the class using the Serializable
attribute. That informs the formatter that it can go ahead and attempt to
serialize the class. After that, we use a bit of code to get at our
output. Here's a sample using the BinaryFormatter, which resides in the
System.Runtime.Serialization.Formatters.Binary namespace:
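// requires: using System.IO;
//           using System.Runtime.Serialization.Formatters.Binary;
[Serializable]
public class Employee
{ ... }

BinaryFormatter bf = new BinaryFormatter();
using (FileStream fs = new FileStream("output.bin", FileMode.Create))
{
    bf.Serialize(fs, emp);   // emp is an existing Employee instance
}   // the using block closes the stream, even if Serialize throws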
As you might notice, the BinaryFormatter writes its output to a stream. In
this case I used a FileStream, but I could just as easily have used a
MemoryStream if I needed to work with the byte output immediately (for
example, to send it over a TCP connection). If there were fields of the class
that we didn't want serialized, we could mark them with the NonSerialized
attribute, which is analogous to the XmlIgnore attribute.
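The BinaryFormatter itself has since been marked obsolete in newer versions of .NET for security reasons, so rather than depend on it, the following sketch uses reflection to list the fields a formatter-style serializer would consider: every instance field, public or private, except those marked NonSerialized. The Serializable and NonSerialized attributes are recorded as metadata flags, which Type.IsSerializable and FieldInfo.IsNotSerialized expose. The cachedDisplayName field, the DisplayName method, and the FieldInspector helper are hypothetical; this illustrates the rule, not how BinaryFormatter is actually implemented, and for brevity it ignores private fields declared in base classes.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

[Serializable]
public class Employee
{
    public string FirstName;
    public string LastName;
    public DateTime StartDate;
    public int Age;
    private int EmpID = new Random().Next(1, 100000);

    // Hypothetical field: derived data we choose not to persist.
    [NonSerialized]
    private string cachedDisplayName;

    public string DisplayName()
    {
        if (cachedDisplayName == null)
            cachedDisplayName = FirstName + " " + LastName + " (#" + EmpID + ")";
        return cachedDisplayName;
    }
}

public static class FieldInspector
{
    // Names of the instance fields visible on the type (declared fields plus
    // inherited public ones) that are not marked [NonSerialized].
    public static List<string> SerializableFields(Type type)
    {
        // [Serializable] is stored as a type flag reported by IsSerializable.
        if (!type.IsSerializable)
            throw new ArgumentException(type.Name + " is not marked [Serializable]");

        List<string> names = new List<string>();
        FieldInfo[] fields = type.GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        foreach (FieldInfo field in fields)
        {
            // [NonSerialized] is stored as a field flag reported by IsNotSerialized.
            if (!field.IsNotSerialized)
                names.Add(field.Name);
        }
        return names;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SerializableFields(typeof(Employee))));
    }
}
```

Unlike the XmlSerializer output, this list includes the private EmpID field, which is exactly what makes the formatters suitable for restoring complete object state.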
At this point, it makes sense for us to look at a quick sample of
deserialization. Both the XmlSerializer and the IFormatter interface
support deserialization, with slightly different code. Here's a sample
using the BinaryFormatter to deserialize the output from our previous sample:
Employee emp2;
using (FileStream input = new FileStream("output.bin", FileMode.Open))
{
    emp2 = (Employee) bf.Deserialize(input);   // Deserialize returns object
}
As you can see, it's pretty easy to bring the object back into your
application (for example, the next time the user starts the application). One
interesting difference between the formatters and the XmlSerializer is that
the formatters output a complete copy of the object, including all of its
private fields (other than those marked NonSerialized). This is useful for
persistence, or wherever you need to recreate a complete state in a future or
remote instance of the application.
The .NET Framework also offers the SoapFormatter, in the
System.Runtime.Serialization.Formatters.Soap namespace, which generates
SOAP-compatible messages as its output. This formatter is useful for remoting
applications that are based on the SOAP protocol. The SoapFormatter offers the
developer levels of control similar to the XmlSerializer in defining the
schema for the output. A series of attributes is offered to the developer for
this customization, though we won't cover them in this article.
Extended Serialization
So far, we've used serialization only on very simple classes, with limited data types. The obvious question that follows is, "How far can this take us?" The answer is surprisingly comforting, largely because the built-in support for serialization is quite robust. However, it does require a little bit of familiarization on our part to get used to what it will and won't do.
The most interesting aspect of serialization is that it implements dynamic
navigation of object graphs. That is to say, it will navigate through
sub-objects wherever possible to serialize the complete set. Let's assume
that our Employee object has a reference to another Employee named Manager.
If we try to serialize the object with the XmlSerializer, we receive the
following output:
<?xml version="1.0" encoding="utf-16"?>
<Employee xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <FirstName>John</FirstName>
  <LastName>Smith</LastName>
  <StartDate>2002-06-23T00:00:00.0000000-07:00</StartDate>
  <Age>25</Age>
  <Manager>
    <FirstName>Betty</FirstName>
    <LastName>Jones</LastName>
    <StartDate>1999-02-12T00:00:00.0000000-08:00</StartDate>
    <Age>32</Age>
  </Manager>
</Employee>
Both the XmlSerializer and the BinaryFormatter are able to handle this type of
nested object graph. Each will try to serialize the next object in the graph
according to its particular rules. For example, the XmlSerializer will
respect the XmlIgnore and other available attributes on sub-objects in the
graph. The BinaryFormatter, on the other hand, requires that the type of
every object in the graph be marked with the Serializable attribute; if it
encounters one that is not, it throws a SerializationException. Most, but not
all, of the core types in the .NET Framework are marked as serializable.
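Here is a minimal sketch of the Manager scenario with the XmlSerializer. The article does not show how Manager is declared, so a public Employee field is assumed here, and the GraphDemo helper name is hypothetical. Betty's own Manager field is left null, and XmlSerializer omits null references by default, so the nesting stops after one level, matching the listing above.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Employee
{
    public string FirstName;
    public string LastName;
    public DateTime StartDate;
    public int Age;

    // Assumed declaration: a reference to another Employee. XmlSerializer
    // follows it and writes the referenced object as a nested <Manager> element.
    public Employee Manager;
}

public static class GraphDemo
{
    public static string ToXml(Employee emp)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Employee));
        using (StringWriter sw = new StringWriter())
        {
            xs.Serialize(sw, emp);
            return sw.ToString();
        }
    }

    public static Employee Sample()
    {
        Employee manager = new Employee
        {
            FirstName = "Betty", LastName = "Jones",
            StartDate = new DateTime(1999, 2, 12), Age = 32
        };   // manager.Manager stays null, so nothing further is nested
        return new Employee
        {
            FirstName = "John", LastName = "Smith",
            StartDate = new DateTime(2002, 6, 23), Age = 25,
            Manager = manager
        };
    }

    public static void Main()
    {
        Console.WriteLine(ToXml(Sample()));
    }
}
```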
Unfortunately, the XmlSerializer is a bit more limited in its support for
serialization of core classes. There are several objects that it cannot
serialize, including, but not limited to, any classes that implement the
IDictionary interface (e.g., Hashtable). This can be somewhat limiting and
may require custom code on your part. The BinaryFormatter is able to
handle many of these classes without any difficulty.
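The custom code mentioned above often takes the following shape: hide the dictionary from the XmlSerializer with XmlIgnore and expose its contents as an array of simple key/value objects instead. This is a sketch of one common workaround, not a built-in feature; the Department, XmlDepartment, Entry, and DictionaryDemo names are hypothetical, and it uses the generic Dictionary&lt;string, int&gt;, which, like Hashtable, implements IDictionary. DirectSerializationFails checks that the undecorated class is rejected.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class: XmlSerializer rejects it because Dictionary implements IDictionary.
public class Department
{
    public string Name;
    public Dictionary<string, int> Headcount = new Dictionary<string, int>();
}

// Hypothetical type holding one key/value pair in a form XmlSerializer can handle.
public class Entry
{
    [XmlAttribute] public string Key;
    [XmlAttribute] public int Value;
}

// Workaround: hide the dictionary and expose its contents as an array of entries.
public class XmlDepartment
{
    public string Name;

    [XmlIgnore]
    public Dictionary<string, int> Headcount = new Dictionary<string, int>();

    [XmlArray("Headcount")]
    [XmlArrayItem("Entry")]
    public Entry[] HeadcountEntries
    {
        // Used when serializing: copy the dictionary into an array.
        get
        {
            List<Entry> entries = new List<Entry>();
            foreach (KeyValuePair<string, int> pair in Headcount)
                entries.Add(new Entry { Key = pair.Key, Value = pair.Value });
            return entries.ToArray();
        }
        // Used when deserializing: rebuild the dictionary from the array.
        set
        {
            Headcount.Clear();
            if (value == null) return;
            foreach (Entry entry in value)
                Headcount[entry.Key] = entry.Value;
        }
    }
}

public static class DictionaryDemo
{
    // Returns true if XmlSerializer refuses the class with the raw dictionary.
    public static bool DirectSerializationFails()
    {
        try
        {
            XmlSerializer xs = new XmlSerializer(typeof(Department));
            using (StringWriter sw = new StringWriter())
                xs.Serialize(sw, new Department { Name = "Sales" });
            return false;
        }
        catch (InvalidOperationException) { return true; }
        catch (NotSupportedException) { return true; }
    }

    // Serializes to XML and immediately reads the result back.
    public static XmlDepartment RoundTrip(XmlDepartment dept)
    {
        XmlSerializer xs = new XmlSerializer(typeof(XmlDepartment));
        using (StringWriter sw = new StringWriter())
        {
            xs.Serialize(sw, dept);
            using (StringReader sr = new StringReader(sw.ToString()))
                return (XmlDepartment)xs.Deserialize(sr);
        }
    }

    public static void Main()
    {
        Console.WriteLine("Raw dictionary rejected: " + DirectSerializationFails());
        XmlDepartment dept = new XmlDepartment { Name = "Sales" };
        dept.Headcount["Boston"] = 12;
        dept.Headcount["Denver"] = 7;
        XmlDepartment copy = RoundTrip(dept);
        Console.WriteLine("Boston: " + copy.Headcount["Boston"] + ", Denver: " + copy.Headcount["Denver"]);
    }
}
```

The design keeps the dictionary as the object's working representation and confines the XML-friendly array to a single property, so the rest of the class is unaffected by the serializer's limitation.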
But what happens when even the BinaryFormatter can't handle our objects?
Sadly, not all of the classes within the .NET Framework are serializable. A
list of which classes implement the ISerializable interface is available in
this MSDN article. In these cases, we can implement the ISerializable
interface on our own classes and manually serialize and deserialize their
contents. But this is something we will cover in a separate article.
Summary
By now, you've seen the basics of serialization and deserialization. You've seen how you can use serialization for persistence, data interchange, and in some cases, even debugging the internal contents of your classes. All of these are valuable applications for a powerful tool provided to us by the Microsoft .NET team. I hope this article allows you to maximize the value of these tools in your own solutions.
Dan Frumin is a long-time technology executive, with over 10 years of experience in the industry.
Return to ONDotnet.com