Copying, Cloning, and Marshalling in .NET
by Shawn Van Ness, 11/25/2002
I hate to admit it, but as a veteran C++ programmer and COM aficionado, I've spent an embarrassingly large part of the last decade thinking about how the objects in my code will be copied, duplicated, and marshalled from one place to the next. In other words, I spent a lot of time pushing bits around. These days, I'm spending more time with C# and the .NET environment, which offer a wide spectrum of language features and runtime services that make the art of programming vastly simpler, in almost every conceivable fashion -- but still, I find myself wondering about the precise semantics of all the various copying, cloning, and marshalling mechanisms at play in my code.
Even after spending the last few years with the C# language, I recently found it worthwhile to step back and analyze what happens in some very simple scenarios, such as copying a value from one variable to another, or passing those variables as arguments to a method call. And that is the focus of this article. Boring? Hardly -- consider the following questions:
- Why do so few system-defined types provide "copy constructors"?
- What does it mean to pass a reference-type parameter with C#'s ref keyword?
- Why is System.MarshalByRefObject a base class, rather than an attribute (or interface)?
- Does a value-type object still have pass-by-value semantics when it's boxed?
- What's the deal with System.String? It's a reference-type, but it seems to have pass-by-value semantics.
Throughout this article, we'll contemplate zen koans like these, and more. We'll
start slowly, with a review of value-types and reference-types in .NET. Then
we'll move on to more advanced terrain, deconstructing the System.ICloneable
interface, and even scratching the surface of .NET's remoting architecture to
explore the concept of marshalling.
Copying, Cloning, and Marshalling: Then and Now
C++ programmers saw the concept of the "well-behaved class" evolve to describe things like copy constructors, assignment operators, virtual destructors, and how these things should be applied to classes. Failure to conform to these guidelines of well-behavedness might produce compile-time errors or (far worse) run-time leaks that are terribly difficult to debug.
Thankfully, .NET languages like C# and VB.NET free us from all of this complexity. Or do they? A quick search of Google.com for "C# and 'copy constructor'" turns up quite a few developers who are a little uncertain! And rightly so -- the .NET runtime environment has its own set of rules and regulations that control an object's copying, cloning, and marshalling behavior. Some of these are outlined in Table 1. When you add the concept of boxing into the mix, .NET has (arguably) the most complex cloning and marshalling semantics of any language or runtime environment ever conceived.
Table 1: The cast of characters
System.ValueType | Base class for value-types, which have pass-by-value semantics
System.ICloneable | Interface by which objects support creating clones of themselves
Object.MemberwiseClone | Protected method that represents the ability of all objects to duplicate themselves
System.SerializableAttribute | Attribute by which objects declare their support for serialization
System.Runtime.Serialization.ISerializable | Interface by which objects control their own serialization (and marshalling!)
System.MarshalByRefObject | Base class for objects that are accessed from remote app domains via proxy
But before we look at all of these mechanisms in action, let us start at the beginning, with a quick review of .NET's distinction between value-types and reference-types.
Value-types vs. reference-types
I still remember being introduced to Java for the first time. I was attending an informal brown-bag presentation, wherein one fellow was trying to convince his coworkers (C++ programmers, the lot of us) that Java was the way of the future. "Just imagine -- no more pointers!" he said. I'd only just seen a glimpse of the language, but it sure looked a lot to me like everything was a pointer. And, sure enough, that turned out to be pretty much the case. Java brought fantastic productivity gains, but at the cost of terrible performance for a lot of applications (mainly due to its excessive use of the heap and incessant dereferencing of pointers).
The architects of .NET attempted to learn from Java's mistakes in this regard by creating a framework-wide distinction between reference-types and value-types. Put simply, value-types are those that derive from System.ValueType (either directly or indirectly) and reference-types are those that do not. In C#, value-types are declared using the struct or enum keywords, and reference-types are declared with the class keyword. But neither of those distinctions is very helpful. The real difference in most programmers' minds is that value-types have pass-by-value semantics, and reference-types have pass-by-reference semantics.
The easiest way to see the difference is to write a few lines of code: make two copies of a variable, and try to modify them independently.
Listing 1: Simple copying of value- and reference-types in C#
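using System;

struct MyStruct { public int x; public MyStruct(int x) { this.x = x; } } // value-type!
class MyClass { public int x; public MyClass(int x) { this.x = x; } } // reference-type!

MyStruct s1 = new MyStruct(37);
MyStruct s2 = s1; // copies the whole struct
s2.x = 73;
MyClass c1 = new MyClass(37);
MyClass c2 = c1; // copies only the reference
c2.x = 73;
Console.WriteLine("s1:{0}, s2:{1}, c1:{2}, c2:{3}",
    s1.x, s2.x, c1.x, c2.x);
//output: s1:37, s2:73, c1:73, c2:73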
Stepping through this simple code in a debugger, you can see that the value-type variable (MyStruct s1) is copied by-value into s2, while the reference-type variable (MyClass c1) is copied by-reference into c2. So, modifying the value of c2.x also modifies the value of c1.x (because they're really the same value).
But what about boxed value-type objects? Somewhat surprisingly, the topic of
boxing does not have any real relevance here -- this is because the very act of
boxing a value-type variable involves making a memberwise copy of the variable,
from the stack onto the heap (and unboxing, vice versa). So value-type objects
are passed by value, even to destinations typed as System.Object.
(For more background on boxing, see the References section for links to Eric Gunnerson's articles on the topic.)
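To make the boxing point concrete, here is a minimal sketch. The Point struct and the BoxingDemo class are hypothetical names invented for this illustration. It boxes a value, changes the original variable, and then unboxes to show that the boxed copy was unaffected.

```csharp
using System;

// Hypothetical value-type, used only for this illustration.
struct Point { public int X; }

static class BoxingDemo
{
    // Boxes a Point, mutates the original, then unboxes the boxed copy.
    public static int UnboxedAfterMutation()
    {
        Point p = new Point();
        p.X = 37;
        object boxed = p;             // boxing: memberwise copy onto the heap
        p.X = 73;                     // changes only the original variable
        Point unboxed = (Point)boxed; // unboxing: copy back out of the box
        return unboxed.X;
    }

    static void Main()
    {
        Console.WriteLine(UnboxedAfterMutation()); // prints 37
    }
}
```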
Passing a Variable as a Method Parameter: Value-types vs. Reference-types (Again)
For ordinary method calls (no ref or out parameters, and no marshalling -- all of which will be discussed later), passing a variable as an argument to a method (or property) is logically equivalent to declaring another variable of the same type, and assigning its value to the newly-declared variable.
No surprise, there. For both value- and reference-types, a shallow copy of the variable is made. For value-types, this means a member-wise copy is created. For reference-types, only the reference is copied (resulting in two references to the same object, as we saw earlier).
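The shallow-copy rule for ordinary arguments can be sketched as follows. Counter, Tally, and the methods of PassByValueDemo are hypothetical names made up for this example. Note that reassigning a reference-type parameter inside the method only rebinds the method's own copy of the reference.

```csharp
using System;

struct Counter { public int Count; } // hypothetical value-type
class Tally { public int Count; }    // hypothetical reference-type

static class PassByValueDemo
{
    // Receives a memberwise copy; the caller's struct is untouched.
    public static void Bump(Counter c) { c.Count++; }

    // Receives a copy of the reference; both refer to the same object.
    public static void Bump(Tally t) { t.Count++; }

    // Rebinds only the local copy of the reference.
    public static void Replace(Tally t) { t = new Tally(); t.Count = 99; }

    static void Main()
    {
        Counter c = new Counter();
        Tally t = new Tally();
        Bump(c);
        Bump(t);
        Replace(t);
        Console.WriteLine("c:{0}, t:{1}", c.Count, t.Count); // prints c:0, t:1
    }
}
```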
However, the situation is somewhat altered if the method parameter is decorated with either the ref or out keyword. In those cases, for value-type parameters, a pointer to the object is passed to the method (thus allowing the method body to alter the value of the original object). This technique is known as passing a parameter "by reference." This should be fairly intuitive (at least to former C++ programmers, who will see it's just like passing a pointer-type parameter; or using the [out] attribute in COM). Of course, many programming languages make this distinction in one way or another, not just those whose names begin with the letter "C."
But what does it mean to pass a reference-type "by reference"? Isn't the parameter already being passed by-reference, simply by virtue of not deriving from System.ValueType? Should we perhaps expect a compiler warning, or an error? No -- put simply, the ref keyword means the same thing for reference-types as it does for value-types: a reference to the variable is passed to the method, rather than a shallow copy. For classes (reference-types), this means a reference to a reference. This allows the method to discard and reallocate the caller's variable (or even set it to null). Again, the analogy in the COM and C++ world is passing a pointer to a pointer.
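A short sketch of the "reference to a reference" idea, using System.Text.StringBuilder as a convenient mutable reference-type. RefDemo and its two methods are hypothetical names for this illustration. Only the ref version can replace the object that the caller's variable refers to.

```csharp
using System;
using System.Text;

static class RefDemo
{
    // Rebinds only the method's local copy of the reference.
    public static void ResetByValue(StringBuilder sb)
    {
        sb = new StringBuilder("new");
    }

    // Rebinds the caller's variable itself.
    public static void ResetByRef(ref StringBuilder sb)
    {
        sb = new StringBuilder("new");
    }

    static void Main()
    {
        StringBuilder a = new StringBuilder("old");
        ResetByValue(a);
        Console.WriteLine(a); // prints old
        ResetByRef(ref a);
        Console.WriteLine(a); // prints new
    }
}
```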
The out keyword has very nearly the same semantics as ref. However, unlike ref, the caller is not required to initialize the variable beforehand, and the method implementation is obligated to assign a value to the parameter before it returns. Effectively, this gives out parameters the same semantics as a property or method's return value.
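As a small illustration of out behaving like an extra return value, consider the sketch below. OutDemo and TryHalve are hypothetical names, not framework APIs. The caller declares the variable without giving it a value, and the method fills it in.

```csharp
using System;

static class OutDemo
{
    // Hypothetical helper: reports whether input is even, and
    // hands back half of it through the out parameter.
    public static bool TryHalve(int input, out int half)
    {
        half = input / 2; // an out parameter must be assigned before returning
        return input % 2 == 0;
    }

    static void Main()
    {
        int result; // no initial value needed for an out argument
        bool even = TryHalve(10, out result);
        Console.WriteLine("{0}, {1}", even, result); // prints True, 5
    }
}
```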
The Diminished Role of "Copy Constructors" in C#
Now that we've seen how the default variable-copying semantics in .NET work, you're all probably wondering how to override this behavior to create full, rich, deep copies of your objects (rather than squeak by with the dull, shallow copies provided by the runtime).
For example, imagine a class that represents a node in a doubly-linked list. Each node object contains a reference to the previous node, and the next node (or perhaps a null pointer, if the node is the first or last in the list). Clearly, a memberwise copy of any single node would not be desirable! Figure 1 illustrates the tragedy that would ensue, if a shallow copy of the head node were inadvertently made:
Figure 1: Deep vs. Shallow Copying
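The same problem can be reproduced in a few lines. This is only a sketch: LinkNode and its ShallowCopy method are hypothetical names, and the copy is made with the protected Object.MemberwiseClone method. The copied head still points at the original second node, while that node points back at the original head rather than at the copy.

```csharp
using System;

// Hypothetical doubly-linked list node, for illustration only.
class LinkNode
{
    public int Value;
    public LinkNode Prev;
    public LinkNode Next;

    // Memberwise (shallow) copy: the Prev and Next references are copied as-is.
    public LinkNode ShallowCopy() { return (LinkNode)MemberwiseClone(); }
}

static class ShallowCopyDemo
{
    static void Main()
    {
        LinkNode head = new LinkNode(); head.Value = 1;
        LinkNode tail = new LinkNode(); tail.Value = 2;
        head.Next = tail;
        tail.Prev = head;

        LinkNode copy = head.ShallowCopy();
        // The copy shares the original's neighbor...
        Console.WriteLine(object.ReferenceEquals(copy.Next, tail)); // prints True
        // ...but the neighbor knows nothing about the copy.
        Console.WriteLine(object.ReferenceEquals(tail.Prev, copy)); // prints False
    }
}
```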
Back in the days of C++, this is where so-called "copy constructors" and "assignment operator overloading" came into play. Now, it's true that in C# you can define a constructor that looks and feels very much like a C++-style copy constructor -- and several classes in the FCL do this -- but the truth is they probably shouldn't bother, because they can't overload the assignment operator.
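Here is a minimal sketch of what such a C++-style copy constructor looks like in C#, and why it carries less weight. Temperature is a hypothetical class invented for this example. Plain assignment never calls the constructor; only an explicit new expression does.

```csharp
using System;

// Hypothetical class, for illustration only.
class Temperature
{
    public double Degrees;

    public Temperature(double degrees) { Degrees = degrees; }

    // Looks like a C++ copy constructor, but is never invoked implicitly.
    public Temperature(Temperature other) { Degrees = other.Degrees; }
}

static class CopyConstructorDemo
{
    static void Main()
    {
        Temperature a = new Temperature(37);
        Temperature b = a;                  // assignment: copies the reference only
        Temperature c = new Temperature(a); // explicit call: an independent object
        a.Degrees = 73;
        Console.WriteLine("{0}, {1}, {2}", a.Degrees, b.Degrees, c.Degrees); // prints 73, 73, 37
    }
}
```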
Rather, a system-defined interface exists for classes and structs to declare to the outside world that they support "deep copy" semantics. This interface is System.ICloneable, and it has a single method: Clone. It doesn't get any simpler.
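One possible implementation, sketched under the assumption of a simple singly-linked chain: ChainNode is a hypothetical class, and the recursive copy is just one way to write Clone. Because Clone returns object, callers have to cast the result.

```csharp
using System;

// Hypothetical singly-linked node that chooses to clone deeply.
class ChainNode : ICloneable
{
    public int Value;
    public ChainNode Next;

    // Copies this node and, recursively, every node after it.
    public object Clone()
    {
        ChainNode copy = new ChainNode();
        copy.Value = Value;
        copy.Next = (Next == null) ? null : (ChainNode)Next.Clone();
        return copy;
    }
}

static class CloneDemo
{
    static void Main()
    {
        ChainNode head = new ChainNode(); head.Value = 1;
        head.Next = new ChainNode(); head.Next.Value = 2;

        ChainNode clone = (ChainNode)head.Clone();
        clone.Next.Value = 99; // touches only the cloned chain
        Console.WriteLine("{0}, {1}", head.Next.Value, clone.Next.Value); // prints 2, 99
    }
}
```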


