The Issue with Inheritance

18 Dec 2017

I've written mostly web software my entire career so far, and one thing that web software does a lot of is shuffle data back and forth between the database and the webpage. You can have many pages that present the same essential facts in different slices and presentations. Each method of presentation needs a slightly different subset of your data and, problematically, that subset may change significantly with a small change to the design of a page. For best performance you often want to pull back only the data that you need for your particular use case. There are various ways to achieve this using ORMs, GraphQL, etc., but I'm most interested in what this does to your architecture once you've gotten that minimal data set.

One way I've seen this handled is to partially load objects. Say you have a user object like so:

public class User
{
    public Guid UserId;
    public string UserName;
    public string Email;
    public Password Password;
    public string Name;
    public string FavoriteBand;
    public string FavoritePizza;
    public Address BillingAddress;
    public Address MailingAddress;
    public DateTime Created;
    public DateTime Updated;
    public DateTime LastLogin;
}

So, if you have a requirement to show a list of all of the users with a favorite pizza like '%pep%', you would write the appropriate query, but in your SELECT clause you would only grab UserId, Name, and FavoritePizza, because that's all that you need. You ship your page and everything is awesome. But then you need to add a link to each user's profile, and somebody throws in a call to the following:

public string GetProfileUrl()
{
    return $"/users/{HttpUtility.UrlEncode(UserName)}/";
}

The problem you have now is that all the tooling your language has says this is a permissible operation, even though we know that the particular query used to load this object does not populate the UserName field. This is a pretty minor bug, since it just causes the link not to work, which should (hopefully) be caught by somebody before it ships. If this were a more sensitive data point, a user could be adversely affected by the omission. Make too many bugs like this and it's probably time to just start loading all the data all the time.
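To make the failure concrete, here is a minimal sketch of the partial load. It assumes an in-memory list stands in for the database and a LINQ projection stands in for the narrowed SELECT clause; the trimmed-down User class, the FindByPizza helper, and the sample data are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

// Trimmed-down copy of the User class, just enough for the example.
public class User
{
    public Guid UserId;
    public string UserName;
    public string Name;
    public string FavoritePizza;

    public string GetProfileUrl()
    {
        return $"/users/{HttpUtility.UrlEncode(UserName)}/";
    }
}

public static class PartialLoadDemo
{
    // Hypothetical stand-in for the query: copies only the three fields the
    // original page needed, leaving UserName at its default value (null).
    public static List<User> FindByPizza(IEnumerable<User> table, string fragment)
    {
        return table
            .Where(u => u.FavoritePizza.Contains(fragment))
            .Select(u => new User
            {
                UserId = u.UserId,
                Name = u.Name,
                FavoritePizza = u.FavoritePizza
            })
            .ToList();
    }

    public static void Main()
    {
        var table = new List<User>
        {
            new User { UserId = Guid.NewGuid(), UserName = "pat", Name = "Pat", FavoritePizza = "pepperoni" }
        };

        var results = FindByPizza(table, "pep");

        // Compiles without complaint, yet UserName was never loaded.
        Console.WriteLine(results[0].GetProfileUrl()); // prints "/users//"
    }
}
```

Nothing in the type of results[0] distinguishes it from a fully loaded User, which is exactly why the compiler cannot help here.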

Another solution is to create a new class for this use case that accurately represents all the data pulled back from the query:

public class UserPizzaSearchResult
{
    public Guid UserId;
    public string UserName;
    public string Name;
    public string FavoritePizza;
}

The problem here is that this class can no longer call the same GetProfileUrl method as the full user object even if you refactor it into a static method. At this point you could give up, or crack on and define an interface.

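public interface IUserProfile
{
    // Interfaces cannot declare fields, so UserName has to be a property.
    string UserName { get; }
}
public class User : IUserProfile
{
    /* snip: UserName must now be a property, e.g. public string UserName { get; set; } */
}
public class UserPizzaSearchResult : IUserProfile
{
    /* snip: same change to UserName here */
}
public static class UserProfile
{
    public static string GetProfileUrl(this IUserProfile user)
    {
        return $"/users/{HttpUtility.UrlEncode(user.UserName)}/";
    }
}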

Now, the cool thing about this is that C# lets us define this static method to use the same method call syntax as regular methods by prefixing the first parameter with the keyword this. The problem is that it's a giant nightmare. Nobody really wants to define a separate interface for each method depending on the members that it needs to access. Especially since it's easy to look at our previous option and just tell the hapless programmer to git gud at programming. But I want to live in a world where a programmer can use static typing to help them reason about code without sacrificing maintainability.
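To see how quickly this multiplies, here is a rough sketch that adds one more helper. GetMailtoLink, IUserContact, UserHelpers, and the trimmed-down classes are hypothetical names invented for this example. The point is only that a second helper touching a different member forces a second interface, and every class has to list each interface it happens to satisfy.

```csharp
using System;
using System.Web;

// One interface per helper, each naming only the members that helper reads.
public interface IUserProfile { string UserName { get; } }
public interface IUserContact { string Email { get; } }

// The full object has to declare every interface it satisfies...
public class User : IUserProfile, IUserContact
{
    public string UserName { get; set; }
    public string Email { get; set; }
}

// ...while the search result can only declare the one its query covers.
public class UserPizzaSearchResult : IUserProfile
{
    public string UserName { get; set; }
    public string FavoritePizza { get; set; }
}

public static class UserHelpers
{
    public static string GetProfileUrl(this IUserProfile user)
    {
        return $"/users/{HttpUtility.UrlEncode(user.UserName)}/";
    }

    public static string GetMailtoLink(this IUserContact user)
    {
        return $"mailto:{user.Email}";
    }
}

public static class InterfaceDemo
{
    public static void Main()
    {
        var hit = new UserPizzaSearchResult { UserName = "pat", FavoritePizza = "pepperoni" };
        Console.WriteLine(hit.GetProfileUrl()); // prints "/users/pat/"

        // hit.GetMailtoLink(); // would not compile: the result type is not an IUserContact
    }
}
```

The compiler now catches the missing Email, but only because someone wrote and maintained an interface for it by hand.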

In dynamic languages (or languages with dynamic support) this actually gets a bit easier. I could define this method like so:

public static class UserProfile
{
    public static string GetProfileUrlDynamic(dynamic user)
    {
        return $"/users/{HttpUtility.UrlEncode(user.UserName)}/";
    }
    public static string GetProfileUrl(this User user)
    {
        return GetProfileUrlDynamic((dynamic) user);
    }
}

Now, in C# this precludes you from using the method call syntax on dynamic expressions, so I've included a pseudo-overload (but not an actual overload, or you'll blow the stack) to enable that syntax. It's a little wonky because you have to define a copy of the method for each type you want to use the function with, but the big problem is that dynamic code incurs overhead for property lookup. Ideally we'd have something that could fit the language a little better.
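One more cost worth spelling out: with dynamic, a missing member is only discovered at run time. Here is a small sketch, assuming anonymous objects stand in for partially loaded results (the sample values and the DynamicDemo class are made up):

```csharp
using System;
using System.Web;
using Microsoft.CSharp.RuntimeBinder;

public static class UserProfile
{
    public static string GetProfileUrlDynamic(dynamic user)
    {
        return $"/users/{HttpUtility.UrlEncode(user.UserName)}/";
    }
}

public static class DynamicDemo
{
    public static void Main()
    {
        // Any object with a string UserName works, no interface required.
        Console.WriteLine(UserProfile.GetProfileUrlDynamic(new { UserName = "pat" }));

        try
        {
            // This also compiles, but the object has no UserName at all.
            UserProfile.GetProfileUrlDynamic(new { Name = "Pat" });
        }
        catch (RuntimeBinderException)
        {
            // The member lookup is resolved, and fails, only when the line runs.
            Console.WriteLine("UserName lookup failed at run time");
        }
    }
}
```

So the dynamic version trades the partial-load bug from earlier for a louder one: an exception instead of a quietly broken link, but still nothing at compile time.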

Each of these solutions is more-or-less ergonomic in its respective language, but I would love to be able to do something like the following:

public static class UserProfile
{
    public static string GetProfileUrl<T>(this T user)
        where T : { string UserName { get; } }
    {
        return $"/users/{HttpUtility.UrlEncode(user.UserName)}/";
    }
}

or even better:

public static class UserProfile
{
    // `user` is inferred to be either { string UserName { get; } } or
    // { byte[] UserName { get; } } by the compiler. This is likely
    // infeasible because of the explosion of types when conversions and
    // overloads are considered.
    public static string GetProfileUrl(this any user)
    {
        return $"/users/{HttpUtility.UrlEncode(user.UserName)}/";
    }
}

And have the compiler "Do The Right Thing"™, but alas, this is extremely unlikely. But still, a guy can dream, right?