Advanced C# Generics

14 Aug 2014

Most of us who program in C# use generics every day in the form of standard library lists, dictionaries, or arrays. Most people, however, seem to have a poor understanding of how generics work or how they can solve many code re-use problems.

ValueType

One of the odd corners of C# generics is how the type system works for structs and enums. After letting ReSharper loose on a code base that I work on, it was amazing to see how often the generic code did not handle ValueTypes correctly. To give a quick overview, let me show you some example code.

bool IsNull<T>(T t)
{
    return t == null;
}

This code looks harmless, and in Java, it would be completely acceptable, but here is the emitted IL assembly code:

IsNull:
IL_0000:  nop         
IL_0001:  ldarg.1     
IL_0002:  box         01 00 00 1B 
IL_0007:  ldnull      
IL_0008:  ceq         
IL_000A:  stloc.0     // CS$1$0000
IL_000B:  br.s        IL_000D
IL_000D:  ldloc.0     // CS$1$0000
IL_000E:  ret  

If you don't understand IL, it works out to something like this:

class Box<T>
{
    public Box(T value) { Value = value; }

    public T Value { get; set; }
}

object box<T>(T t)
{
    if (t is ValueType)
    {
        return new Box<T>(t);
    }
    return t;
}

bool IsNull<T>(T t)
{
    return box(t) == null;
}

The compiler inserts a box opcode. Why is that? The simple answer is that we did not consider ValueTypes. ValueType is the wild card in the C# type system: it is technically an object, but it isn't a heap-managed class. Because the on-stack representation of a ValueType is the whole value and not a pointer, we have to box a value type before we can compare it to null. This means that the function will always return false for ordinary structs and enums (the one exception is a Nullable<T> with no value, which boxes to a null reference), but the cost is an extra box operation.

The fixed version of this function looks like this:

bool IsNull<T>(T t)
    where T : class
{
    return t == null;
}

This emits the same IL, but now the compiler rejects any call where T is not a heap-managed class. In that context box is always a no-op, so we don't have to worry about any weird usage.
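To make the difference concrete, here is a minimal sketch that puts the two versions side by side. The names NullChecks, IsNullAny, IsNullRef, and IsNullValue are made up for this example and are not from any library; IsNullValue is an extra, assumed helper showing one way to handle nullable structs explicitly.

```csharp
using System;

static class NullChecks
{
    // Unconstrained: T may be a value type, so the comparison boxes first.
    public static bool IsNullAny<T>(T t)
    {
        return t == null;
    }

    // Constrained: only reference types are accepted at compile time.
    public static bool IsNullRef<T>(T t) where T : class
    {
        return t == null;
    }

    // Hypothetical helper for nullable value types; no boxing is needed.
    public static bool IsNullValue<T>(T? t) where T : struct
    {
        return !t.HasValue;
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(NullChecks.IsNullAny(42));            // False
        Console.WriteLine(NullChecks.IsNullAny<int?>(null));    // True
        Console.WriteLine(NullChecks.IsNullAny<string>(null));  // True
        Console.WriteLine(NullChecks.IsNullRef("text"));        // False
        Console.WriteLine(NullChecks.IsNullValue<int>(null));   // True

        // NullChecks.IsNullRef(42);  // does not compile: int is not a reference type
    }
}
```

The commented-out call is the point of the constraint: the mistake is caught by the compiler instead of silently returning false at runtime.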

Covariance and Contravariance

Because C# has subtyping, we can use objects in different contexts depending on the type hierarchy. We can pass an object into a method as long as the type of the parameter is the same as or less specific than the type of the object, and we can return an object as long as the type of the object is the same as or more specific than the return type. These two concepts are called covariance and contravariance.

class Base
{
}
class Derived: Base
{
}

void CoVariant(Base b)
{
}
object ConVariant()
{
    return new Base();
}

void Main()
{
    CoVariant(new Derived());
    var o = ConVariant();
}

Most people understand and can use covariance and contravariance, but when it comes to generics the rule is broken by default. Because a class can use its generic type as either a return type or a parameter, the type is fixed and type checking will not allow any flexibility.

class NoVariant<T>
{
    T Method(T t) { return t; }
}

void Main()
{
    var obj = new NoVariant<Base>();
    
    Console.WriteLine("obj is NoVariant<object>  = {0}", obj is NoVariant<object>);
    Console.WriteLine("obj is NoVariant<Base>    = {0}", obj is NoVariant<Base>);
    Console.WriteLine("obj is NoVariant<Derived> = {0}", obj is NoVariant<Derived>);
}

Results:

obj is NoVariant<object>  = False
obj is NoVariant<Base>    = True
obj is NoVariant<Derived> = False

See, even though you might intuitively think that you could use that NoVariant class in one of those contexts, the type system will not let you. The way to get that flexibility back into the type of your objects is to use covariant and contravariant interfaces. To allow covariance, you have to mark the type parameter with the out keyword (to allow it to return "out" of a method). To allow contravariance, you have to mark the type parameter with the in keyword (to allow it to take a parameter "in" to a function).

interface CoVariant<out T>
{
    T Method();
}

interface ConVariant<in T>
{
    void Method(T t);
}

class Uber<T>: NoVariant<T>, CoVariant<T>, ConVariant<T>
{
    T    CoVariant<T> .Method()    { return default(T); }
    void ConVariant<T>.Method(T t) { }
}

void Main()
{
    var obj = new Uber<Base>();
    
    Console.WriteLine("obj is CoVariant<object>   = {0}", obj is CoVariant<object>);
    Console.WriteLine("obj is ConVariant<object>  = {0}", obj is ConVariant<object>);
    Console.WriteLine("obj is CoVariant<Base>     = {0}", obj is CoVariant<Base>);
    Console.WriteLine("obj is ConVariant<Base>    = {0}", obj is ConVariant<Base>);
    Console.WriteLine("obj is CoVariant<Derived>  = {0}", obj is CoVariant<Derived>);
    Console.WriteLine("obj is ConVariant<Derived> = {0}", obj is ConVariant<Derived>);
}

Results:

obj is CoVariant<object>   = True
obj is ConVariant<object>  = False
obj is CoVariant<Base>     = True
obj is ConVariant<Base>    = True
obj is CoVariant<Derived>  = False
obj is ConVariant<Derived> = True

The last thing to note about these covariant and contravariant interfaces is that by choosing one or the other, you lose the ability to use the type parameter in certain ways. When you have an in type, it's a compile error to try to return it from a method, and conversely if you use an out type you lose the ability to take it as a method parameter.
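The same trade-off shows up in the standard library: IEnumerable<T> declares its type parameter as out and Action<T> declares it as in. The sketch below is only an illustration under that assumption of how you would lean on those declarations; IProducer, IConsumer, and VarianceDemo are hypothetical names invented for this example. It also shows that variance only applies to reference types, which ties back to the ValueType section.

```csharp
using System;
using System.Collections.Generic;

class Base { }
class Derived : Base { }

// Hypothetical interfaces, named for this sketch only.
interface IProducer<out T>
{
    T Produce();
    // void Consume(T item);  // would not compile: an out type cannot be a parameter
}

interface IConsumer<in T>
{
    void Consume(T item);
    // T Produce();           // would not compile: an in type cannot be returned
}

static class VarianceDemo
{
    // IEnumerable<out T> is covariant, so a List<Derived> can be passed here.
    public static int Count(IEnumerable<Base> items)
    {
        int n = 0;
        foreach (Base item in items) { n++; }
        return n;
    }

    // Action<in T> is contravariant: a handler for Base also handles Derived.
    public static Action<Derived> AsDerivedHandler(Action<Base> handler)
    {
        return handler;
    }

    // Variance only applies to reference types, so this is false for List<int>.
    public static bool IsObjectSequence(object candidate)
    {
        return candidate is IEnumerable<object>;
    }
}

static class Program
{
    static void Main()
    {
        var items = new List<Derived> { new Derived(), new Derived() };
        Console.WriteLine(VarianceDemo.Count(items));                          // 2

        int calls = 0;
        Action<Derived> handler = VarianceDemo.AsDerivedHandler(b => calls++);
        handler(new Derived());
        Console.WriteLine(calls);                                              // 1

        Console.WriteLine(VarianceDemo.IsObjectSequence(new List<string>()));  // True
        Console.WriteLine(VarianceDemo.IsObjectSequence(new List<int>()));     // False
    }
}
```

Uncommenting either of the two marked interface members is a quick way to see the compile errors described above.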