I hate to admit it, but as a veteran C++ programmer and COM aficionado, I've spent an embarrassingly large part of the last decade thinking about how the objects in my code will be copied, duplicated, and marshalled from one place to the next. In other words, I spent a lot of time pushing bits around. These days, I'm spending more time with C# and the .NET environment, which offer a wide spectrum of language features and runtime services that make the art of programming vastly simpler, in almost every conceivable fashion -- but still, I find myself wondering about the precise semantics of all the various copying, cloning, and marshalling mechanisms at play in my code.
Even after spending the last few years with the C# language, I recently found it worthwhile to step back and analyze what happens in some very simple scenarios, such as copying a value from one variable to another, or passing those variables as arguments to a method call. And that is the focus of this article. Boring? Hardly -- consider the following questions:
- What does it mean to pass a reference-type by reference, using the ref keyword?
- Why is System.MarshalByRefObject a base class, rather than an attribute (or interface)?
- What about System.String? It's a reference-type, but it seems to have pass-by-value semantics.
Throughout this article, we'll contemplate zen koans like these, and more. We'll
start slowly, with a review of value-types and reference-types in .NET. Then
we'll move on to more advanced terrain, deconstructing the System.ICloneable
interface, and even scratching the surface of .NET's remoting architecture to
explore the concept of marshalling.
C++ programmers saw the concept of the "well-behaved class" evolve to describe things like copy constructors, assignment operators, virtual destructors, and how these things should be applied to classes. Failure to conform to these guidelines of well-behavedness might produce compile-time errors or (far worse) run-time leaks that are terribly difficult to debug.
Thankfully, .NET languages like C# and VB.NET free us from all of this complexity. Or do they? A quick search of Google.com for "C# and 'copy constructor'" turns up quite a few developers who are a little uncertain! And rightly so -- the .NET runtime environment has its own set of rules and regulations that control an object's copying, cloning, and marshalling behavior. Some of these are outlined in Table 1. When you add the concept of boxing into the mix, .NET has (arguably) the most complex cloning and marshalling semantics of any language or runtime environment ever conceived.
Table 1: The cast of characters
| Type or member | Role |
|---|---|
| System.ValueType | Base class for value-types, which have pass-by-value semantics |
| System.ICloneable | Interface by which objects support creating clones of themselves |
| Object.MemberwiseClone | Protected method that represents the ability of all objects to duplicate themselves |
| System.SerializableAttribute | Attribute by which objects declare their support for serialization |
| System.Runtime.Serialization.ISerializable | Interface by which objects control their own serialization (and marshalling!) |
| System.MarshalByRefObject | Base class for objects that are accessed from remote app domains via proxy |
But before we look at all of these mechanisms in action, let us start at the beginning, with a quick review of .NET's distinction between value-types and reference-types.
I still remember being introduced to Java for the first time. I was attending an informal brown-bag presentation, wherein one fellow was trying to convince his coworkers (C++ programmers, the lot of us) that Java was the way of the future. "Just imagine -- no more pointers!" he said. I'd only just seen a glimpse of the language, but it sure looked a lot to me like everything was a pointer. And, sure enough, that turned out to be pretty much the case. Java brought fantastic productivity gains, but at the cost of terrible performance for a lot of applications (mainly due to its excessive use of the heap and incessant dereferencing of pointers).
The architects of .NET attempted to learn from Java's mistakes in this regard by creating a framework-wide distinction between reference-types and value-types. Put simply, value-types are those that derive from System.ValueType (either directly or indirectly) and reference-types are those that do not. In C#, value-types are declared using the struct or enum keywords, and reference-types are declared with the class keyword. But neither of those distinctions is very helpful. The real difference in most programmers' minds is that value-types have pass-by-value semantics, and reference-types have pass-by-reference semantics.
The easiest way to see the difference is to write a few lines of code: make two copies of a variable, and try to modify them independently.
Listing 1: Simple copying of value- and reference-types in C#
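    using System;

    struct MyStruct { public int x; public MyStruct(int x) { this.x = x; } } // value-type!
    class MyClass { public int x; public MyClass(int x) { this.x = x; } } // reference-type!

    class Listing1
    {
        static void Main()
        {
            MyStruct s1 = new MyStruct(37);
            MyStruct s2 = s1;   // copies the whole value
            s2.x = 73;

            MyClass c1 = new MyClass(37);
            MyClass c2 = c1;    // copies only the reference
            c2.x = 73;

            Console.WriteLine("s1:{0}, s2:{1}, c1:{2}, c2:{3}",
                s1.x, s2.x, c1.x, c2.x);
            //output: s1:37, s2:73, c1:73, c2:73
        }
    }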
Stepping through this simple code in a debugger, you can see that the value-type variable (MyStruct s1) is copied by-value into s2, while the reference-type variable (MyClass c1) is copied by-reference into c2. So, modifying the value of c2.x also modifies the value of c1.x (because they're really the same value).
But what about boxed value-type objects? Somewhat surprisingly, the topic of
boxing does not have any real relevance here -- this is because the very act of
boxing a value-type variable involves making a memberwise copy of the variable,
from the stack onto the heap (and unboxing, vice versa). So value-type objects
are passed by value, even to destinations typed as System.Object.
(For more background on boxing, see the References section for links to Eric Gunnerson's articles on the topic.)
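To make the copy-on-box behavior concrete, here is a minimal sketch. The Point struct and the BoxingDemo class are hypothetical names invented for this illustration; they do not come from the article or the framework.

```csharp
using System;

// Hypothetical value-type, for illustration only
struct Point
{
    public int x;
}

static class BoxingDemo
{
    // Boxes a Point, then changes the original variable.
    // Returns the x stored inside the box.
    public static int BoxedValueAfterOriginalChanges()
    {
        Point p = new Point();
        p.x = 37;

        object boxed = p;   // boxing: a memberwise copy of p is placed on the heap
        p.x = 73;           // changes only the original variable

        Point unboxed = (Point)boxed;   // unboxing: copies the value back out
        return unboxed.x;
    }

    static void Main()
    {
        Console.WriteLine(BoxedValueAfterOriginalChanges()); // prints 37
    }
}
```

The box holds its own copy, so later changes to the original variable are not visible through the object reference.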
For ordinary method calls (no ref or out parameters, and no marshalling -- all of which will be discussed later), passing a variable as an argument to a method (or property) is logically equivalent to declaring another variable of the same type, and assigning its value to the newly-declared variable.
No surprise, there. For both value- and reference-types, a shallow copy of the variable is made. For value-types, this means a member-wise copy is created. For reference-types, only the reference is copied (resulting in two references to the same object, as we saw earlier).
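A short sketch of what that shallow copy means in practice. Counter, CounterBox, and ArgumentDemo are hypothetical names used only for this example.

```csharp
using System;

struct Counter { public int n; }     // hypothetical value-type
class CounterBox { public int n; }   // hypothetical reference-type

static class ArgumentDemo
{
    // Receives a memberwise copy: the caller's struct is untouched.
    public static void Bump(Counter c) { c.n++; }

    // Receives a copy of the reference: the shared object is modified.
    public static void Bump(CounterBox c) { c.n++; }

    // Rebinding the parameter only changes the method's local copy
    // of the reference, never the caller's variable.
    public static void Replace(CounterBox c)
    {
        c = new CounterBox();
        c.n = -1;
    }

    public static string Run()
    {
        Counter s = new Counter();
        CounterBox r = new CounterBox();
        Bump(s);
        Bump(r);
        Replace(r);
        return string.Format("s.n:{0}, r.n:{1}", s.n, r.n);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints s.n:0, r.n:1
    }
}
```

Note that the method can change the object that a reference-type argument points to, but it cannot make the caller's variable point somewhere else.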
However, the situation is somewhat altered if the method parameter is decorated with either the ref or out keyword. In those cases, for value-type parameters, a pointer to the object is passed to the method (thus allowing the method body to alter the value of the original object). This technique is known as passing a parameter "by reference." This should be fairly intuitive (at least to former C++ programmers, who will see it's just like passing a pointer-type parameter; or using the [out] attribute in COM). Of course, many programming languages make this distinction in one way or another, not just those whose names begin with the letter "C."
But what does it mean to pass a reference-type "by reference"? Isn't the parameter already being passed by-reference, simply by virtue of not deriving from System.ValueType? Should we perhaps expect a compiler warning, or an error? No -- put simply, the ref keyword means the same thing for reference-types as it does for value-types: a reference to the variable is passed to the method, rather than a shallow copy. For classes (reference-types), this means a reference to a reference. This allows the method to discard and reallocate the caller's variable (or even set it to null). Again, the analogy in the COM and C++ world is passing a pointer to a pointer.
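The difference is easiest to see side by side. In this sketch, Widget and RefDemo are hypothetical names chosen for illustration.

```csharp
using System;

class Widget { public int id; }   // hypothetical reference-type

static class RefDemo
{
    // Without ref: only the method's local copy of the reference is rebound.
    public static void ReplaceByValue(Widget w)
    {
        w = new Widget();
        w.id = 2;
    }

    // With ref: the caller's variable itself is rebound.
    public static void ReplaceByRef(ref Widget w)
    {
        w = new Widget();
        w.id = 2;
    }

    // With ref, the method can even null out the caller's variable.
    public static void Clear(ref Widget w) { w = null; }

    static void Main()
    {
        Widget w = new Widget();
        w.id = 1;

        ReplaceByValue(w);
        Console.WriteLine(w.id);        // prints 1

        ReplaceByRef(ref w);
        Console.WriteLine(w.id);        // prints 2

        Clear(ref w);
        Console.WriteLine(w == null);   // prints True
    }
}
```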
The out keyword has very nearly the same semantics as ref. However, unlike ref, the method implementation is obligated to assign a value to the parameter before it returns, and the caller need not initialize the variable beforehand. Effectively, this gives out parameters the same semantics as a property or method's return value.
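As a small sketch of out parameters acting as extra return values, consider the following. TryDivide and OutDemo are hypothetical names invented for this example.

```csharp
using System;

static class OutDemo
{
    // The compiler requires every out parameter to be assigned
    // on every path before the method returns.
    public static bool TryDivide(int dividend, int divisor,
                                 out int quotient, out int remainder)
    {
        if (divisor == 0)
        {
            quotient = 0;
            remainder = 0;
            return false;
        }
        quotient = dividend / divisor;
        remainder = dividend % divisor;
        return true;
    }

    static void Main()
    {
        int q, r;   // no initial values needed for out arguments
        bool ok = TryDivide(17, 5, out q, out r);
        Console.WriteLine("{0}: {1} remainder {2}", ok, q, r);
        // prints True: 3 remainder 2
    }
}
```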
Now that we've seen how the default variable-copying semantics in .NET work, you're all probably wondering how to override this behavior to create full, rich, deep copies of your objects (rather than squeak by with the dull, shallow copies provided by the runtime).
For example, imagine a class that represents a node in a doubly-linked list. Each node object contains a reference to the previous node, and the next node (or perhaps a null pointer, if the node is the first or last in the list). Clearly, a memberwise copy of any single node would not be desirable! Figure 1 illustrates the tragedy that would ensue, if a shallow copy of the head node were inadvertently made:
Figure 1: Deep vs. Shallow Copying
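The linked-list problem can be reproduced in a few lines. This is only a sketch: the Node class, its ShallowCopy and DeepCopyFromHere methods, and the ListDemo harness are hypothetical names made up for the illustration.

```csharp
using System;

// Hypothetical doubly-linked list node
class Node
{
    public int value;
    public Node prev;
    public Node next;

    // Memberwise copy: the clone shares its neighbors with the original.
    public Node ShallowCopy()
    {
        return (Node)this.MemberwiseClone();
    }

    // Copies this node and every node after it, rewiring prev/next
    // so the new list never points back into the old one.
    public Node DeepCopyFromHere()
    {
        Node head = null;
        Node tail = null;
        for (Node cur = this; cur != null; cur = cur.next)
        {
            Node copy = new Node();
            copy.value = cur.value;
            copy.prev = tail;
            if (tail != null) tail.next = copy; else head = copy;
            tail = copy;
        }
        return head;
    }
}

static class ListDemo
{
    // Builds a two-node list and returns its head.
    public static Node MakeList()
    {
        Node a = new Node();
        a.value = 1;
        Node b = new Node();
        b.value = 2;
        a.next = b;
        b.prev = a;
        return a;
    }

    static void Main()
    {
        Node head = MakeList();
        Node shallow = head.ShallowCopy();
        Node deep = head.DeepCopyFromHere();

        // The shallow copy's second node still belongs to the original list...
        Console.WriteLine(object.ReferenceEquals(shallow.next, head.next));   // True
        // ...and that node's prev still points at the original head.
        Console.WriteLine(object.ReferenceEquals(shallow.next.prev, head));   // True

        // The deep copy is a fully independent list.
        Console.WriteLine(object.ReferenceEquals(deep.next, head.next));      // False
        Console.WriteLine(object.ReferenceEquals(deep.next.prev, deep));      // True
    }
}
```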
Back in the days of C++, this is where so-called "copy constructors" and "assignment operator overloading" came into play. Now, it's true that in C# you can define a constructor that looks and feels very much like a C++-style copy constructor -- and several classes in the FCL do this -- but the truth is they probably shouldn't bother, because they can't overload the assignment operator.
Rather, a system-defined interface exists for classes and structs to declare to
the outside world that they support "deep copy" semantics. This interface
is System.ICloneable, and it has a single method: Clone. It doesn't get
any simpler.
Listing 2: Introducing the System.ICloneable interface
    namespace System
    {
        interface ICloneable
        {
            object Clone();
        }
    }
But there are two problems with the ICloneable interface. First, it's weakly-typed: it's specified to return an object, which could be darn well anything. The lack of generics (templates) in the current version of .NET makes this an unavoidable necessity. This forces clients of ICloneable to downcast the clone back to the type in question, which can sometimes result in cumbersome and error-prone (or at least ugly) code.
It seems to me that the best analog of a "copy constructor/assignment
operator" pair, in C#, would be an implementation of ICloneable that delegates
to a public, type-safe alternative Clone method:
Listing 3: A well-designed, cloneable class
    using System;

    class MyCloneableClass : ICloneable
    {
        // Illustrative member that needs a deep copy (may be null)
        private MyCloneableClass somethingDeep;

        // Explicit interface method impl -- available for
        // clients of ICloneable, but invisible to casual
        // clients of MyCloneableClass
        object ICloneable.Clone()
        {
            // simply delegate to our type-safe cousin
            return this.Clone();
        }

        // Friendly, type-safe clone method
        public virtual MyCloneableClass Clone()
        {
            // Start with a flat, memberwise copy
            MyCloneableClass x =
                (MyCloneableClass)this.MemberwiseClone();

            // Then deep-copy everything that needs the
            // special attention
            if (this.somethingDeep != null)
                x.somethingDeep = this.somethingDeep.Clone();
            //...
            return x;
        }
    }
In the previous section, we made use of an interesting member function, present
on all .NET objects: MemberwiseClone. This method is a source of great
confusion in the developer community. Don't be fooled by its name -- it's
certainly not any kind of alternative to ICloneable.Clone, because it's
a protected method. Furthermore, it's not even overridable by derived types,
because it's not a virtual method. Its only purpose in life seems to be to
assist us in our implementations of Clone methods, by performing the default
.NET shallow-copy in just one line of code.
Now, this raises the following question: why is there no corresponding "DeepClone" method on System.Object? Shouldn't it be possible for the framework to provide a method that queries each member for the ICloneable interface, and either calls ICloneable.Clone on that member or performs a bitwise (shallow) copy of the member, as appropriate? This would allow a great many implementations of Clone to be written with just one trivial line of code:
Listing 4: Wishful thinking
    public virtual MyCloneableClass Clone()
    {
        // let .NET do the heavy lifting
        return this.DeepClone();
    }
The only exceptions, of course, would be types that contain one or more references to objects that neglect to implement ICloneable as expected, or object graphs that contain circular references (like the doubly-linked list example in Figure 1). These objects would have to be copied "by hand," in the current manner.
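For the curious, the wished-for behavior can be roughly approximated with reflection. The following is only a sketch under stated assumptions: CloneHelper.DeepClone is a hypothetical helper written for this illustration (no such method exists in the framework), Inner and Outer are made-up test types, and the helper makes no attempt to detect circular references, so a graph like the doubly-linked list would still need hand-written copying.

```csharp
using System;
using System.Reflection;

// Hypothetical helper, not part of the framework
static class CloneHelper
{
    const BindingFlags DeclaredInstanceFields =
        BindingFlags.Instance | BindingFlags.Public |
        BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    public static object DeepClone(object source)
    {
        if (source == null) return null;

        // MemberwiseClone is protected, so reach it through reflection.
        MethodInfo memberwiseClone = typeof(object).GetMethod(
            "MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic);
        object copy = memberwiseClone.Invoke(source, null);

        // Walk the type hierarchy so private fields of base classes are seen too.
        for (Type t = source.GetType(); t != null; t = t.BaseType)
        {
            foreach (FieldInfo field in t.GetFields(DeclaredInstanceFields))
            {
                // Fields whose values implement ICloneable get cloned;
                // everything else keeps its shallow (memberwise) copy.
                ICloneable cloneable = field.GetValue(source) as ICloneable;
                if (cloneable != null)
                    field.SetValue(copy, cloneable.Clone());
            }
        }
        return copy;
    }
}

// Made-up types to exercise the helper
class Inner : ICloneable
{
    public int v;
    public object Clone()
    {
        Inner i = new Inner();
        i.v = this.v;
        return i;
    }
}

class Outer
{
    public Inner inner = new Inner();
    public object tag = new object();   // does not implement ICloneable
}

static class DeepCloneDemo
{
    static void Main()
    {
        Outer original = new Outer();
        original.inner.v = 5;
        Outer copy = (Outer)CloneHelper.DeepClone(original);

        Console.WriteLine(object.ReferenceEquals(original.inner, copy.inner)); // False
        Console.WriteLine(object.ReferenceEquals(original.tag, copy.tag));     // True
        Console.WriteLine(copy.inner.v);                                       // 5
    }
}
```

Reflection makes this considerably slower than a hand-written Clone, which may be one reason to prefer the explicit approach in Listing 3.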
Anyway, this brings us to the second problem with ICloneable: although it's a well-known interface, defined by the system, the .NET runtime doesn't seem to make use of it (at least not in any context that I've yet encountered). This is in contrast to most of the other system-defined "IXXXXable" interfaces (e.g.: ISerializable, IComparable, IEnumerable, IDisposable, etc.) each of which is either called by the .NET runtime in some situation, or else serves to support some language feature (e.g.: C#'s foreach and using constructs are supported by IEnumerable and IDisposable, respectively).
It seems that ICloneable is purely a convention -- no better or worse than
recommending that we all implement a public Clone method to accomplish the
same thing. The fact that ICloneable is an interface does, however, make it
easy for callers to query an object of unknown origin for its copy-semantics,
without resorting to reflection (although in practice, the need to do this does
not come up very often).
Listing 5: Do our best to make a copy of object x, deep or shallow
    public static object MakeCopyOf(object x)
    {
        if (x is ICloneable)
        {
            // Return a deep copy of the object
            return ((ICloneable)x).Clone();
        }
        else if (x is ValueType)
        {
            // Return a shallow copy of the value
            return ((ValueType)x);
        }
        else
        {
            // Without resorting to reflection or serialization,
            // all we can do is fall back to default copy semantics,
            // which will return a ref to the same physical object
            // (not what we want!)
            throw new
                System.NotSupportedException("object not cloneable");
        }
    }
An interesting case study in the field of object-cloning is that of System.String. We all know how easy it is to give pass-by-reference semantics to a value-type -- just box it, and copy the object around. But how do you give pass-by-value semantics to a reference-type? Surely it must be possible, because System.String gets away with it. Or is System.String special in some way?
To be sure, strings are given a lot of special treatment in .NET. They even have
some of their very own Intermediate Language (IL) opcodes. However, there is no
magic at play, here -- System.String accomplishes its pass-by-value trick by
virtue of being immutable, by design. In other words, strings in .NET are
passed by-reference just like instances of any other class. But you can't
easily test that hypothesis without modifying the string's contents, and
there simply aren't any methods that modify a string without returning a newly-created string instance.
Every method that might appear, at first glance, to modify a string in fact
returns a modified copy of the string. Unlike C++ and some other languages,
.NET does not offer the concept of "const" methods -- methods that do not modify
any of the object's member variables. If it did, however, then every instance
method on the System.String class would surely be marked "const".
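A brief sketch shows both halves of the trick: the reference really is shared, and the "modifying" methods hand back new instances. StringDemo, Shout, and AfterShout are hypothetical names used only here.

```csharp
using System;

static class StringDemo
{
    // ToUpper returns a new string; assigning it to the parameter
    // rebinds only this method's local copy of the reference.
    public static void Shout(string s)
    {
        s = s.ToUpper();
    }

    // Passes s to Shout and returns the caller's (unchanged) string.
    public static string AfterShout(string s)
    {
        Shout(s);
        return s;
    }

    static void Main()
    {
        string a = "hello";
        string b = a;   // copies the reference, not the characters
        Console.WriteLine(object.ReferenceEquals(a, b));   // prints True

        string c = a.ToUpper();   // a itself is never modified
        Console.WriteLine(a);     // prints hello
        Console.WriteLine(c);     // prints HELLO

        Console.WriteLine(AfterShout(a));   // prints hello
    }
}
```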
This clever design is similar to the "pass-by-reference-but-copy-on-write semantics" made popular by the C++ string classes found in STL and MFC. This is a very efficient design, because strings can be expensive to copy, so copying should happen only when necessary.
Note that it is probably not worthwhile to imitate this technique in your own
classes. Only strings are used so heavily as to justify the excessive amount of
code involved (viz. a whole separate class, System.Text.StringBuilder, is
needed to avoid spurious copying in some other common usage scenarios). But
it's always good to understand how the magic works.
Now let's leave the topic of object-cloning behind, for a while, and expand our horizons by analyzing what happens when we pass parameters to objects that live in remote app domains!
The topic of remoting in .NET (whereby objects communicate across AppDomain boundaries) is long and complex -- far beyond the scope of this article. However, a central premise of all remoting architectures is marshalling, and marshalling is very closely related to the subject matter of this article (namely, the passing of objects as arguments to method calls), so it's worth taking a look at how marshalling works in .NET remoting. (See the References section for some interesting links to learn more about remoting in .NET, in general.)
For our purposes, marshalling can be defined as the mechanism by which arguments to/from a method call are transported, across some communication channel, to a remote recipient. Often the process of marshalling involves serialization -- persisting the object's state into a stream, and reconstituting the object "over there." Other times, it involves the creation of a proxy object, which will in turn marshal arguments to subsequent method calls back and forth across the wall.
The former case is known as "marshal-by-value" (or MBV), and it's very much like the pass-by-value semantics exhibited by value-types. The latter case is known as "marshal-by-reference" (or MBR), and, no surprise, it's very much like the pass-by-reference semantics that we've already seen. But the most important thing to understand about marshalling in .NET is that the default semantics are different: by default, all objects in .NET (both value- and reference-types) are marshalled by value when sent across the "wire" to a remote AppDomain.
But how can a reference-type be passed by-value? Didn't we learn, back in Listing 5, that this was impossible (without resorting to reflection or serialization)? We did. And indeed it's true -- if you attempt to pass an MBV object that is not serializable (marked with the [Serializable] attribute), you will experience a SerializationException at run time. Listing 6 demonstrates this phenomenon.
Listing 6: Playing around with inter-appdomain marshalling
    using System;

    // [Serializable]   // uncomment to let this type cross the appdomain boundary
    struct SimpleValueType
    {
        public int a;
        public int b;
        public int c;
    }

    class MainModule
    {
        static void Main()
        {
            // Create a "remote" appdomain
            AppDomain testDomain =
                AppDomain.CreateDomain("testDomain");
            MyRemoteableClass remoteObject =
                (MyRemoteableClass)testDomain.CreateInstanceAndUnwrap(
                    "test",
                    "MyRemoteableClass");

            // Initialize a SimpleValueType
            SimpleValueType x1 = new SimpleValueType();
            x1.a = x1.b = x1.c = 7;

            // Try to send it across the wire
            // (will fail unless [Serializable]!)
            remoteObject.DoSomethingWithSimpleValueType(x1);

            // Access the property X remotely
            remoteObject.X = remoteObject.X + 1;
            Console.WriteLine("{0}", remoteObject.X);
        }
    }
The resulting exception looks something like this (long, boring stack trace omitted for brevity). But uncomment the [Serializable] attribute, and the program will run successfully.
    Unhandled Exception:
    System.Runtime.Serialization.SerializationException:
    The type SimpleValueType in Assembly test,
    Version=0.0.0.0, Culture=neutral,
    PublicKeyToken=null is not marked as serializable.
To override this default MBV behavior, one can simply derive one's class from System.MarshalByRefObject -- and this is exactly what MyRemoteableClass does, as seen in Listing 7. (This is the same class referenced in Listing 6. We were skipping ahead a bit, but without at least one MBR object in the picture, there would be nothing to do the marshalling!)
Note that MarshalByRefObject is a base class, not an attribute, nor even an
interface. This has a number of implications, the most obvious of which is that
value-types cannot be made to marshal by reference (value-types in .NET cannot
explicitly inherit from any base class other than System.ValueType). Perhaps
the next most obvious implication is that you can't just revisit any old class
to make it MBR -- .NET does not allow multiple inheritance of base classes, so
you'll likely have to plan for that capability, from the ground up (or else end
up writing a bunch of "MarshalByRefWrapper" classes, which
isn't very fun).
Listing 7: A simple MBR-capable class (as seen in Listing 6)
    class MyRemoteableClass : System.MarshalByRefObject
    {
        private int x;

        public void DoSomethingWithSimpleValueType(SimpleValueType vt)
        { this.x = vt.a + vt.b + vt.c; }

        public int X
        {
            get { return this.x; }
            set { this.x = value; }
        }
    }
Why is MarshalByRefObject a base class, rather than an interface or attribute?
The easiest explanation is that MBR objects need quite a bit of boilerplate
functionality (with regard to activation policies, lifetime services, etc.)
that is best encapsulated in a base class, and re-used with
implementation-inheritance. No other solution would allow us to create an MBR
class with just a single line of code (let alone a single keyword).
Earlier, we saw how the C# ref and out keywords affect the semantics of parameters passed to method calls. How do these keywords affect parameters marshalled between appdomains? In the intra-AppDomain case, we saw that using either ref or out was tantamount to passing a pointer to the argument (regardless of whether the argument was a reference-type or not).
But one cannot pass a pointer from one process to another -- let alone from one machine to another! Instead, one must think of marshalling as a form of messaging (which it is). Either way, the effect is largely the same. The C# code may look like an ordinary function call, but under the hood, the method's arguments are being packaged into an envelope and mailed away ... The ref keyword is an instruction to wait for a response, because the argument will travel both to and from the remote destination, along the remoting channel. Parameters with the out keyword (as well as return values) are only sent back, from callee to caller, along the channel. (Arguably, the IDL attribute keywords used to describe the same concepts in DCOM and RPC were more intuitive. These are listed in Table 2.) Just like DCOM (and RPC before it), it's the proxy objects on either side of the remoting channel that do all the dirty work.
Table 2: Analogous marshalling keywords in DCOM and .NET
| RPC/DCOM keyword | C# equivalent remoting keyword |
|---|---|
| [in] | (default) |
| [in,out] | ref |
| [out] | out |
Hopefully this article has shed some light on some basic aspects of .NET programming that should be quite straightforward -- moving copies of objects around from one place to another. But of course, nothing in modern computing is straightforward, anymore (especially if it involves remoting in any way). The .NET framework is a vast and complicated space to work in, but at the same time its design is vastly more intuitive for programmers than anything that came before.
We've examined everything from the differences between value-types and reference-types, the semantics of return-values, C#'s out and ref keywords, the ICloneable interface, the MemberwiseClone method, the special semantics of the framework's string class, and even a little bit of remoting and marshalling. Whew! Who knew that pushing bits around would be so hard?
"Passing Parameters" C# Programmer's Reference, MSDN Library.
"Argument Passing ByVal and ByRef" Visual Basic Language Concepts, MSDN Library.
"Nice Box. What's in It?" Eric Gunnerson, MSDN Library.
"Open the Box! Quick!" Eric Gunnerson, MSDN Library.
".NET Remoting: Design and Develop Seamless Distributed Applications for the Common Language Runtime" Dino Esposito, MSDN Magazine.
Copyright © 2009 O'Reilly Media, Inc.