How to create a deep copy of an object

Introduction

There are situations where we want to create an exact copy of a complex object graph. Like many other programming languages, C# doesn't provide an out-of-the-box solution for this, so we have to come up with our own.

When it comes to cloning, we first have to understand the difference between a value type and a reference type. We also need to distinguish between a shallow copy and a deep copy.

Value & reference types

I won't go into the details of which are stored on the stack and which on the heap, as I think it's more conducive to think of the differences between value types and reference types in terms of their semantics rather than their implementation details.

A variable of a value type directly holds a value of its type. If you assign it to another variable, the value is copied directly and both variables work independently. That is, they are always copied by value.

A variable of a reference type, in contrast, doesn't store its value directly, but a reference (pointer/address) to the location where the value is stored. Because such a variable holds the address of the object rather than the data itself, assigning it to another variable doesn't copy the object it points to. Instead, it creates a second copy of the reference, which points to the same object as the original.

So to copy a value type, it's enough to assign it to a new variable of the same type. To copy a reference type, however, we have to allocate memory for a new object (read: the runtime will have to), copy the content, and take care of any further references the object may hold. That's why the term object graph is used.
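To make the difference tangible, here is a minimal sketch of both assignment behaviours. The Box class is a hypothetical type that exists only for this illustration.

```csharp
using System;

// Hypothetical reference type, used only for this illustration.
public class Box
{
    public int Value { get; set; }
}

public static class AssignmentDemo
{
    public static void Main()
    {
        // Value type: the assignment copies the value itself.
        int a = 1;
        int b = a;
        b = 2;
        Console.WriteLine(a); // prints 1, a is unaffected by the change to b

        // Reference type: the assignment copies only the reference.
        var first = new Box { Value = 1 };
        var second = first;
        second.Value = 2;
        Console.WriteLine(first.Value); // prints 2, both variables point to the same object
        Console.WriteLine(object.ReferenceEquals(first, second)); // prints True
    }
}
```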

For the sake of completeness here is a list of the built-in types:

Value types               Reference types   Comments (on the reference type)
bool                      class
byte, char, int, long     interface
sbyte, short              delegate
uint, ulong, ushort       dynamic
decimal, double, float    object
enum                      string            behaves like a value type
struct                    array             even if their elements are value types!
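If you are ever unsure which category a type falls into, you can ask the runtime. The following sketch checks a few of the types listed above via Type.IsValueType and also shows why string feels like a value type. The class name TypeKindDemo is just a placeholder.

```csharp
using System;

public static class TypeKindDemo
{
    public static void Main()
    {
        Console.WriteLine(typeof(int).IsValueType);       // prints True
        Console.WriteLine(typeof(DayOfWeek).IsValueType); // prints True (an enum)
        Console.WriteLine(typeof(DateTime).IsValueType);  // prints True (a struct)
        Console.WriteLine(typeof(string).IsValueType);    // prints False
        Console.WriteLine(typeof(int[]).IsValueType);     // prints False, although int is a value type
        Console.WriteLine(typeof(Action).IsValueType);    // prints False (a delegate)

        // Strings are immutable: "changing" s2 creates a new string
        // and leaves the one referenced by s1 untouched.
        string s1 = "a";
        string s2 = s1;
        s2 += "b";
        Console.WriteLine(s1); // prints a
    }
}
```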

Shallow & deep copy

When creating a shallow copy of an object graph, the value and reference fields are copied to a new object. As a reference is a memory address, only the address is copied, not the actual data it points to. This implies that a change to an object behind one of these reference fields is also visible through all (shallow) copies of the original object.

A deep copy, in contrast, is independent of the original object. All reference types contained in the object graph are copied too, so each reference points to its own memory location holding a real copy of the original data, which can therefore be altered without affecting the original.
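Here is a small hand-written sketch of the two kinds of copy. Person, City, ShallowCopy() and DeepCopy() are hypothetical names made up for this illustration and are not part of the example graph used later.

```csharp
using System;

// Hypothetical types, used only for this illustration.
public class City
{
    public string Name { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public City City { get; set; }

    // Copies the reference to City, so both persons share one City object.
    public Person ShallowCopy() => new Person { Name = Name, City = City };

    // Creates a new City, so the copy is independent of the original.
    public Person DeepCopy() => new Person { Name = Name, City = new City { Name = City.Name } };
}

public static class CopyDemo
{
    public static void Main()
    {
        var original = new Person { Name = "Ann", City = new City { Name = "Bonn" } };
        var shallow = original.ShallowCopy();
        var deep = original.DeepCopy();

        original.City.Name = "Kiel";

        Console.WriteLine(shallow.City.Name); // prints Kiel, the shallow copy sees the change
        Console.WriteLine(deep.City.Name);    // prints Bonn, the deep copy does not
    }
}
```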

I hope I haven't bored the hell out of you so far. Now let's have a look at some possible solutions.

Example object graph

In the spirit of the DRY principle, this is the example object graph I will refer to throughout this post. I left out a complete list of all value types, such as enums, as they behave the same in this context.

public delegate void Handler(string m);

[Serializable]
public class Course
{
    public Course(string title)
    {
        this.Title = title;
    }

    public string Title { get; set; }
}

[Serializable]
public class Address
{
    public int Housenumber { get; set; }
    public string Street { get; set; }
}

[Serializable]
class Student
{
    private readonly double Pi = 3.14159;
    public const int UniversalAnswer = 42;
    public Handler Handler { get; set; }
    public Action<string> OutputHandler { get; set; }
    public Course[] Subscriptions { get; set; }
    public double[] Grades { get; set; }
    public int Age { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }

    public void PrintSummary(Action<string> action)
    {
        var message = $"Name: {this.Name} Age: {this.Age}";

        // Prefer the action passed in and fall back to the configured handler.
        (action ?? this.OutputHandler)?.Invoke(message);
    }
}

Serialize and deserialize your instance

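Consider the following generic extension method, which uses the BinaryFormatter class. It serializes and deserializes an object, or an entire graph of connected objects, in binary format. Be aware that Microsoft has marked BinaryFormatter as obsolete for security reasons in recent .NET versions, so treat this approach as a legacy technique.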

The class provides two relevant methods:

  • public void Serialize (System.IO.Stream serializationStream, object graph);
  • public object Deserialize (System.IO.Stream serializationStream);

As you can see from the signatures, both methods require an instance of type System.IO.Stream, and Serialize additionally takes the object we would like to copy. For our purpose the MemoryStream class comes in handy. It inherits from System.IO.Stream and creates a stream whose backing store is in memory. This is just perfect, as we don't want to write to disk or anywhere else. After serializing, the stream's position is at the end of the written data, so before we can deserialize from the same stream we need to reset the position to the beginning. This is done by memoryStream.Position = 0.

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class DeepCloneExtensions
{
    public static T DeepCloneByStream<T>(this T obj)
    {
        using (var memoryStream = new MemoryStream())
        {
            var formatter = new BinaryFormatter();
            formatter.Serialize(memoryStream, obj);

            memoryStream.Position = 0;

            return (T)formatter.Deserialize(memoryStream);
        }
    }
}
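The need to reset the stream position can be seen in isolation, without any serializer involved. The following sketch writes a few bytes to a MemoryStream and tries to read them back before and after rewinding. The class name RewindDemo is just a placeholder.

```csharp
using System;
using System.IO;
using System.Text;

public static class RewindDemo
{
    public static void Main()
    {
        using (var stream = new MemoryStream())
        {
            var bytes = Encoding.UTF8.GetBytes("hello");
            stream.Write(bytes, 0, bytes.Length);

            var buffer = new byte[bytes.Length];

            // After writing, the position is at the end of the data,
            // so there is nothing left to read.
            int readAtEnd = stream.Read(buffer, 0, buffer.Length);
            Console.WriteLine(readAtEnd); // prints 0

            // Rewind to the beginning and read again.
            stream.Position = 0;
            int readFromStart = stream.Read(buffer, 0, buffer.Length);
            Console.WriteLine(readFromStart);                   // prints 5
            Console.WriteLine(Encoding.UTF8.GetString(buffer)); // prints hello
        }
    }
}
```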

Now let's have a look at a usage example.

[Fact]
public void Should_Clone_By_Stream()
{
    // Arrange
    var student = new Student
    {
        Age = 30,
        Name = "Bommelmaier",
        Address = new Address { Street = "Baker", Housenumber = 12 },
        Grades = new[] { 1.0, 2.0 },
        Subscriptions = new[] { new Course("Mathematics") },
        OutputHandler = OutputToDebug
    };

    // Act
    var copycat = student.DeepCloneByStream();

    student.Age = 40;
    student.Name = "Smith";
    student.Address.Housenumber = 99;
    student.Grades[0] = 3.0;
    student.Subscriptions[0] = new Course("Ethics");
    student.OutputHandler = OutputToConsole;

    // Assert
    copycat.Age.Should().NotBe(student.Age);
    copycat.Name.Should().NotBe(student.Name);

    copycat.Address.Should().NotBeSameAs(student.Address);
    copycat.Grades.Should().NotBeSameAs(student.Grades);
    copycat.Subscriptions.Should().NotBeSameAs(student.Subscriptions);
    copycat.OutputHandler.Should().NotBeSameAs(student.OutputHandler);
}

private void OutputToDebug(string m)
{
    Debug.WriteLine(m);
}

private void OutputToConsole(string m)
{
    Console.WriteLine(m);
}

This actually works, but it has a few drawbacks.

First, it requires every class in the object graph to be marked as [Serializable]. Marking only the topmost class isn't enough and will result in a System.Runtime.Serialization.SerializationException. This is no big deal in this simple example, but it becomes impractical, if not impossible, when you want to clone a reference type that is out of your control.

Secondly, the serialization of our delegate property named Handler won't work in a project targeting .NET Core and will result in a SerializationException saying "Serializing delegates is not supported on this platform." The same applies to the generic Action<string> OutputHandler delegate.

That would have been too easy. So let's look at other possible techniques.

Using reflection and recursion

Now let's have a look at a more complex approach. The following extension method uses reflection and recursion to walk through an object graph. Remember when I said that semantics are more important than implementation details? This is especially true for System.String. It is a reference type but behaves like a value type. That's why we can handle all value types and string the same way. However, we need to take special care of delegates and arrays, where I am using ICloneable.Clone() to create copies.

using System;

public static class DeepCloneExtensions
{
    public static T DeepCloneByReflection<T>(this T source)
    {
        var type = source.GetType();

        var target = Activator.CreateInstance(type);

        foreach (var propertyInfo in type.GetProperties())
        {
            // Handle value types and string
            if (propertyInfo.PropertyType.IsValueType ||
                propertyInfo.PropertyType == typeof(string))
            {
                propertyInfo.SetValue(target, propertyInfo.GetValue(source));
            }
            // Handle delegates
            else if (propertyInfo.PropertyType.IsSubclassOf(typeof(Delegate)))
            {
                var value = (Delegate)propertyInfo.GetValue(source);

                if (value != null)
                {
                    propertyInfo.SetValue(target, value.Clone());
                }
            }
            // Handle arrays
            else if (propertyInfo.PropertyType.IsSubclassOf(typeof(Array)))
            {
                var value = (Array)propertyInfo.GetValue(source);

                if (value != null)
                {
                    propertyInfo.SetValue(target, value.Clone());
                }
            }
            // Handle objects
            else
            {
                var value = propertyInfo.GetValue(source);

                if (value != null)
                {
                    propertyInfo.SetValue(target, value.DeepCloneByReflection());
                }
            }
        }

        return (T)target;
    }
}

This seems to be the best approach so far, as we don't have to declare all our classes as Serializable. We are also able to handle (multicast) delegates.

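But as always, there is still room for improvement. If the source graph contains a circular reference, the recursion never terminates and ends in a StackOverflowException. Secondly, the method can't properly handle variables of type dynamic. Also note that Array.Clone() only creates a shallow copy, so the elements of an array of reference types (such as Course[] Subscriptions) are still shared with the original, and that Activator.CreateInstance(type) requires every cloned class to have a public parameterless constructor.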
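One possible way to deal with circular references is to remember every object that has already been cloned and to reuse its copy when it shows up again. The following is only a rough sketch under a few assumptions: DeepCloneTracked, ReferenceComparer and Node are hypothetical names, every class needs a public parameterless constructor, only public read/write properties are copied, and only one-dimensional, zero-based arrays are copied element by element.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class CycleSafeCloneExtensions
{
    // Compares keys by identity, so an overridden Equals/GetHashCode can't interfere.
    private sealed class ReferenceComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
        int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
    }

    public static T DeepCloneTracked<T>(this T source)
    {
        var visited = new Dictionary<object, object>(new ReferenceComparer());
        return (T)Clone(source, visited);
    }

    private static object Clone(object source, Dictionary<object, object> visited)
    {
        if (source == null) return null;

        var type = source.GetType();

        // Value types and strings can be handed over as they are.
        // Assumption: structs don't contain references that need a deep copy.
        if (type.IsValueType || type == typeof(string)) return source;

        // Delegates are cloned the same way as in the method above.
        if (source is Delegate handler) return handler.Clone();

        // Already cloned? Reuse the copy instead of recursing again.
        if (visited.TryGetValue(source, out var existing)) return existing;

        if (source is Array array)
        {
            var arrayCopy = (Array)array.Clone(); // shallow copy of the same shape
            visited[source] = arrayCopy;

            if (array.Rank == 1)
            {
                for (var i = 0; i < array.Length; i++)
                {
                    arrayCopy.SetValue(Clone(array.GetValue(i), visited), i);
                }
            }

            return arrayCopy;
        }

        var target = Activator.CreateInstance(type);
        visited[source] = target; // register before recursing, this breaks the cycle

        foreach (var property in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip read-only properties and indexers.
            if (!property.CanRead || !property.CanWrite || property.GetIndexParameters().Length > 0) continue;

            property.SetValue(target, Clone(property.GetValue(source), visited));
        }

        return target;
    }
}

// Hypothetical type, used only to build a circular reference.
public class Node
{
    public string Name { get; set; }
    public Node Next { get; set; }
}

public static class CycleDemo
{
    public static void Main()
    {
        var a = new Node { Name = "a" };
        var b = new Node { Name = "b", Next = a };
        a.Next = b; // a -> b -> a

        var copy = a.DeepCloneTracked();

        Console.WriteLine(ReferenceEquals(copy, a));              // prints False
        Console.WriteLine(ReferenceEquals(copy.Next.Next, copy)); // prints True, the cycle is preserved
    }
}
```

The important detail is that the copy is registered in the dictionary before its properties are visited. Otherwise the second visit of the same object would start yet another recursion.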

I hope this was somewhat helpful. I will leave an insight into MemberwiseClone() and the ICloneable interface for another post.

Best, Matthias