Tooling in the .net World

As a refugee Java developer first washing up on the shores of .net land, I was pretty surprised at the poor state of the tools and libraries that seemed to be in common use. I couldn’t believe this shanty town was the state of the art; but with a spring in my step I marched inland. A few years down the road, I can honestly say it really isn’t that great – there are some high points, but there are a lot of low points. I hadn’t realised at the time, but as a Java developer I’d been incredibly spoiled by a vibrant open source community producing loads of great software. In the .net world there seem to be two ways of solving any problem:

  1. The Microsoft way
  2. The other way*

*Note: may not work, probably costs money.

IDE

The obvious place to start is the IDE. I’m sorry, but Visual Studio just isn’t good enough. Compared to Eclipse and IntelliJ for day-to-day development, Visual Studio is a poor imitation. Worse, until recently, it was expensive for a lone developer. While there were community editions of Visual Studio, these were even more pared down than the unusable professional edition. Frankly, if you can’t run ReSharper, it doesn’t count as an IDE.

I hear there are actually people out there who use Visual Studio without ReSharper. What are these people? Masochists? What do they do? I hope it isn’t writing software. Perhaps this explains the poor state of so much software; with tools this bad – is it any wonder?

But finally Microsoft have seen sense and recently the free version of Visual Studio supports plugins – which means you can use ReSharper. You still have to pay for it, but with their recent licensing changes I don’t feel quite so bad handing over a few quid a month renting ReSharper before I commit to buying an outright license. Obviously the situation for companies is different – it’s much easier for a company to justify spending the money on Visual Studio licenses and ReSharper licenses.

With ReSharper Visual Studio is at least closer to Eclipse or IntelliJ. It still falls short, but there is clearly only so much lipstick that JetBrains can put on that particular pig. I do however have great hope for Project Rider, basically IntelliJ for .net.

RESTful Services

A few years back another Java refugee and I started trying to write a RESTful, CQRS-style web service in C#. We’d done the same in a previous company on the Java stack and expected our choices to be similarly varied. But instead of a plethora of approaches – from basic HTTP listeners to servlet containers to full-blown app servers – we narrowed the field down to two choices:

  1. Microsoft’s WCF
  2. ServiceStack

My fellow developer played with WCF and decided he couldn’t make it easily fit the RESTful, CQRS style he had in mind. After playing with ServiceStack we found it could. But then began a long, tortuous process of finding all the things ServiceStack hadn’t quite got right, all the things we didn’t quite agree with. Now, this is not entirely uncommon in any technology selection process. But we’d already quickly narrowed the field to two. We were now committed to a technology that at every turn was causing us new, unanticipated problems (most annoyingly, problems that other dev teams weren’t having with vanilla WCF services!)

We joked (not really joking) that it would be simpler to write our own service layer on top of the basic HTTP support in .net. In hindsight, it probably would have been. But really, we’d paid the price for having the temerity to step off the One True Microsoft Way.

Worse, what had started as an open source project became a paid product during the development of our service – which meant that our brand new web service was immediately legacy, as the service layer it was built on was no longer officially supported.

Dependency Management

The thing I found most surprising on arrival in .net land was that people were checking all their third party binaries into version control like it was the 1990s! Java developers might complain that Maven is nothing more than a DSL for downloading the internet. But you know what, it’s bloody good at downloading the internet. Instead, I had the internet checked in to version control.

However, salvation was at hand with NuGet. Except, NuGet really is a bit crap. NuGet doesn’t so much manage my dependencies as break them. All the damned time. Should we restrict all versions of a library (say log4net) to one version across the solution? Nah, let’s have a few variations. Oh, but NuGet, now I get random runtime errors because of method signature mismatches. But it doesn’t make the build fail? Brilliant, thank you, NuGet.

So at my current place we’ve written a pre-build script (sketched below) to check that NuGet hasn’t screwed the dependencies up. This pre-build check fails more often than I would like to believe.
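The check itself is conceptually simple: gather every package reference in the solution and fail the build if any package is referenced at more than one version. A minimal sketch of the idea (assuming the old packages.config format – the paths and exit codes here are illustrative, not our actual script):

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

class NuGetVersionCheck
{
    // Returns 0 if every package is referenced at a single version
    // across the solution, 1 otherwise - wired in as a pre-build step
    static int Main(string[] args)
    {
        var references = Directory
            .EnumerateFiles(args[0], "packages.config", SearchOption.AllDirectories)
            .SelectMany(file => XDocument.Load(file).Descendants("package"))
            .Select(p => new
            {
                Id = (string)p.Attribute("id"),
                Version = (string)p.Attribute("version")
            })
            .ToList();

        var conflicts = references
            .GroupBy(r => r.Id)
            .Where(g => g.Select(r => r.Version).Distinct().Count() > 1)
            .ToList();

        foreach (var conflict in conflicts)
            Console.Error.WriteLine("{0} is referenced at versions: {1}",
                conflict.Key,
                string.Join(", ", conflict.Select(r => r.Version).Distinct()));

        return conflicts.Any() ? 1 : 0;
    }
}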

So if managing a coherent set of dependencies isn’t the job of the dependency management tool, what does it do? It downloads one file at a time from the internet. Well done. I think wget can do that, too. It’s a damned sight faster than the NuGet PowerShell console, too. NuGet: it might break your builds, but at least it’s slow.

The Microsoft Way

And then I find things that blow me away. At my current place we’ve written some add-ins to Excel so that people who live and die by Excel can interact with our software. This is pretty cool: adding buttons and menus and ribbons into Excel, all integrated to my back-end services.

In my life as a Java developer I could never even have imagined attempting this. The hoops to jump through would have been far too numerous. But in Visual Studio? Create a new, specific type of solution, hit F5, Excel opens. Set a breakpoint, Excel stops. Oh my God – this is an integrated development environment.
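For flavour, a ribbon button handler in a VSTO Excel add-in looks roughly like this (the button, the trade service and its data are invented for this sketch; the Globals plumbing is generated by the project template):

using Excel = Microsoft.Office.Interop.Excel;
using Microsoft.Office.Tools.Ribbon;

public partial class TradesRibbon
{
    // Runs inside Excel when our ribbon button is clicked - set a
    // breakpoint here, hit F5 and debug against the live spreadsheet
    private void fetchTradesButton_Click(object sender, RibbonControlEventArgs e)
    {
        var sheet = (Excel.Worksheet)Globals.ThisAddIn.Application.ActiveSheet;
        var trades = Globals.ThisAddIn.TradeService.FetchTrades();  // hypothetical back-end call

        for (int i = 0; i < trades.Count; i++)
            sheet.Cells[i + 1, 1] = trades[i].Description;
    }
}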

Obviously our data is all stored in Microsoft SQL Server. This also has brilliant integration with Visual Studio. For example, we experimented with creating a .net assembly to read some of the binary data we’re storing in the DB. This way we could run efficient queries on complex data types directly in the DB. The dev process for this? Similarly awesome: deploy to the DB and run directly from within Visual Studio. Holy integrated dev cycle, Batman!
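The experiment looked something like the following: a SQLCLR function in a C# assembly, deployed to the database from Visual Studio (the function and the binary format are invented for illustration):

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class BlobFunctions
{
    // Callable from T-SQL once the assembly is deployed,
    // e.g. SELECT dbo.PointCount(Data) FROM Readings
    [SqlFunction]
    public static SqlInt32 PointCount(SqlBytes blob)
    {
        if (blob.IsNull)
            return SqlInt32.Null;

        // Our real code deserialised a complex data type; for this
        // sketch, pretend the blob is a packed array of 8-byte points
        return new SqlInt32((int)(blob.Length / 8));
    }
}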

When there is a Microsoft way, this is why it is so compelling. Whatever they do will be brilliantly integrated with everything else they do. It might not have the flexibility you want, it might not have all the features you want. But it will be spectacularly well integrated with everything else you’re already using.

Why?

Why does it have to be this way? C# is a really awesome language: well designed, with a lot of expressive power. But the open source ecosystem around it is barren. If the Microsoft Way doesn’t fit the bill, you are often completely stuck.

I think it comes down to the history of Visual Studio being so expensive. Even as a C# developer by day, I am not spending the thick end of £1,000 to get a matching dev environment at home, so I can play. Even if I had a solid idea of an open source project to start, I’m not going to invest a thousand quid just to see.

But finally Microsoft seem to be catching on to the open source thing. Free versions of Visual Studio and projects like CoreCLR can only help. But the Java ecosystem has a decade head start and this ultimately creates a network effect: it’s hard to write good open source software for .net because there’s so little good open source tooling for .net.

Rich domain objects with DivineInject

DivineInject is a .net dependency injection framework, designed to be simple to use and easy to understand.

You can find the source for DivineInject on github. In this second part in the series we’ll look at creating rich domain objects; the first part covers getting started with DivineInject.

Simple Domain Objects

As an example, imagine I have a simple shopping basket in my application. The shopping basket is encapsulated by a Basket object, for which the interface looks like this:

interface IBasket
{
    IList<IBasketItem> GetBasketContents();
    void AddItemToBasket(IBasketItem item);
}

The contents of the basket are actually backed by a service to provide persistence:

interface IBasketService
{
    IList<IBasketItem> GetBasketContents(Guid basketId);
    void AddItemToBasket(Guid basketId, IBasketItem item);
}

I can create a very simple implementation of my Basket:

class Basket : IBasket
{
    private readonly IBasketService _basketService;
    private readonly Guid _basketId = Guid.NewGuid();

    public Basket(IBasketService basketService)
    {
        _basketService = basketService;
    }

    public IList<IBasketItem> GetBasketContents()
    {
        return _basketService.GetBasketContents(_basketId);
    }

    public void AddItemToBasket(IBasketItem item)
    {
        _basketService.AddItemToBasket(_basketId, item);
    }
}

Since I’m using dependency injection, IBasketService sounds like a dependency, so how do I go about creating an instance of Basket? I don’t want to create it myself; I need the DI framework to create it for me, passing in dependencies.

I want to do something with how the Basket is created, so let’s start with a simple interface for creating baskets:

interface IBasketFactory
{
    IBasket Create();
}

When I’m creating a basket I don’t care about IBasketService or other dependencies; calling code just wants to be able to create a new, empty basket on demand. How would we implement this interface? Well, I could do the following – although I shouldn’t.

class BadBasketFactory : IBasketFactory
{
    public IBasket Create()
    {
        // DON'T DO THIS - just an example
        return new Basket(
            DivineInjector.Current.Get<IBasketService>());
    }
}

Now I’d never suggest actually doing this, explicitly calling the dependency injector. The last thing I want from my DI framework is to have references to it smeared all over the application. However, what this class does is basically what I want to happen.

DivineInject however can generate a class like this for you; this is configured at the same time you define the rest of your bindings:

DivineInjector.Current
    .Bind<IBasketFactory>().AsGeneratedFactoryFor<Basket>();

This generates an IBasketFactory implementation, which can create new IBasket implementations on demand (they will all actually be instances of Basket); all without having references to the DI framework smeared across my code. If I want to use the IBasketFactory, for example from my Session class, I declare it as a constructor arg the same as I would any other dependency:

public Session(IAuthenticationService authService, 
               IBasketFactory basketFactory)
{
    _authService = authService;
    _basketFactory = basketFactory;
}

The DI framework takes care of sorting out dependencies and I get a Session class with no references to DivineInject. When I need a new basket I just call _basketFactory.Create(). Since I have nice interfaces everywhere, everything is easy to mock so I can TDD everything.
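For instance, a test for a hypothetical Session.StartShopping() method, which internally calls _basketFactory.Create() and returns the new basket, needs nothing more than a mocked factory. A sketch using Moq (any mocking library would do):

[Test]
public void StartShoppingCreatesANewBasket()
{
    var basket = new Mock<IBasket>();
    var factory = new Mock<IBasketFactory>();
    factory.Setup(f => f.Create()).Returns(basket.Object);

    var session = new Session(Mock.Of<IAuthenticationService>(),
                              factory.Object);

    var result = session.StartShopping();  // hypothetical method calling _basketFactory.Create()

    Assert.AreSame(basket.Object, result);
}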

Rich Domain Objects

Now what happens as my domain object becomes more complex? Say, for example, I want to be able to pass in some extra arguments to my constructor. Returning to our basket example: as well as starting a new, empty basket – isn’t there a possibility that I want to continue using an existing basket? E.g. in case of load balanced servers or fail-over. What do I do then?

I start by changing Basket, to allow me to pass in an existing basket id:

private readonly IBasketService _basketService;
private readonly Guid _basketId;

public Basket(IBasketService basketService)
{
    _basketService = basketService;
    _basketId = Guid.NewGuid();
}

public Basket(IBasketService basketService, Guid basketId)
{
    _basketService = basketService;
    _basketId = basketId;
}

I now have two constructors, one of which accepts a basket id. Since all Basket instances are created by an IBasketFactory, I need to change the factory interface, too:

interface IBasketFactory
{
    IBasket Create();
    IBasket UseExisting(Guid id);
}

I now have a new method on my IBasketFactory. If I were hand-coding the factory class, I’d expect this second method to call the second constructor, passing in the basket id.

What do we need to tell DivineInject to make it generate this more complex IBasketFactory implementation? Nothing! That’s right, absolutely nothing – DivineInject will already generate a suitable IBasketFactory. Our original declaration above is still sufficient:

DivineInjector.Current
    .Bind<IBasketFactory>().AsGeneratedFactoryFor<Basket>();

This generates an IBasketFactory implementation, returning a Basket instance for each method it finds on the interface. Since one of these methods takes a Guid, it tries to find a matching constructor which also takes a Guid, plus any dependencies it knows about. DivineInject can automatically wire up the right factory method to the right constructor, using the arguments it finds in each. Now, when a session wants to re-use an existing basket it just calls:

_basketFactory.UseExisting(existingBasketId)

This creates a new Basket instance, with dependencies wired up, passing in the basket id. Everything is still using interfaces so all your collaborations can be unit tested. Behind the scenes DivineInject generates the IL code to implement your factory interfaces, leaving you free to worry about your design.

By following this pattern we can create rich domain objects that include both state and dependencies: it becomes possible to create stateful objects that have behaviours (methods) that make sense in the domain. Successfully modelling your domain is critical to creating code that’s easy to understand and easy to change. DivineInject helps you model your domain better.

Getting started with DivineInject

DivineInject is a .net dependency injection framework, designed to be simple to use and easy to understand.

You can find the source for DivineInject on github.

Why another DI framework?

Because dependency injection is important – but done wrong it can do more harm than good. DivineInject is opinionated about the right way to use dependency injection:

  • Constructor injection or death
    Setter injection is bad for your health, so just say no
  • Dependencies are singletons
    Dependencies are external to your application – your DI framework doesn’t need to know about users or sessions or threads.
  • Domain objects can be rich, too
    Your domain model doesn’t have to be anemic

Constructor Injection

Setter and method injection are much harder to get right – so DivineInject simply doesn’t support them. If you can’t implement your dependencies as constructor arguments, then maybe you should refactor the dependency so you can.
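As a made-up example (using the IOrdersService interface from the binding examples below), the refactoring usually looks like this:

// Before: setter injection - nothing stops an OrderProcessor being
// used before OrdersService has been set
//
//     class OrderProcessor
//     {
//         public IOrdersService OrdersService { get; set; }
//     }

// After: constructor injection - an OrderProcessor can never exist
// in a half-initialised state, and the dependency is explicit
class OrderProcessor
{
    private readonly IOrdersService _ordersService;

    public OrderProcessor(IOrdersService ordersService)
    {
        _ordersService = ordersService;
    }
}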

Singleton Dependencies

Dependencies are external to your application: things like databases and web services that your application depends on. These are generally used across your application. Hint: if a dependency is only used in one or two places, it isn’t an application-wide dependency.

Things like users and sessions are concepts in your domain, not in the domain of dependency injectors. Most dependency injection frameworks get wrapped up in different scopes, which makes the frameworks harder to use. DivineInject simply doesn’t support them — if you need something user-scoped or session-scoped, then implement the logic yourself. It isn’t hard, and if you ever want to understand the lifecycle of your objects it’s in your code, not mine — which will make reasoning about your code or debugging it a million times easier.
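As an illustration of how little code that takes: a hand-rolled session scope is just an ordinary singleton dependency holding a map of per-session state. A minimal sketch (all the names here are invented for the example):

using System;
using System.Collections.Concurrent;

// Bind ISessionRegistry to this class as a normal singleton: the
// lifecycle of every SessionState is explicit, in your own code
class SessionRegistry : ISessionRegistry
{
    private readonly ConcurrentDictionary<Guid, SessionState> _sessions =
        new ConcurrentDictionary<Guid, SessionState>();

    public SessionState ForSession(Guid sessionId)
    {
        return _sessions.GetOrAdd(sessionId, id => new SessionState(id));
    }

    public void EndSession(Guid sessionId)
    {
        SessionState removed;
        _sessions.TryRemove(sessionId, out removed);
    }
}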

Rich Domain Objects

DivineInject borrows an idea from Google Guice – with Guice it is called “assisted injection”, in DivineInject we call it generated factory injection. The idea is the same — providing a simple way to create objects with constructors which accept runtime arguments as well as dependencies to inject. This allows you to create rich, stateful domain objects which also have dependencies.

Getting Started

So how do you get started with DivineInject? I’ll assume you’ve not been living under a rock and already know what dependency injection is. In which case basic usage of DivineInject boils down to three steps:

  • Add the dependency
  • Configure bindings
  • Create your root object

Add the Dependency

DivineInject is available as a NuGet package:

Install-Package DivineInject

Configure Bindings

When DivineInject creates a new instance of a class it calls a constructor; each of the constructor arguments is a dependency of the class — something external to the class, for which an implementation must be provided. But how does the injector know which value to pass for each dependency? This is controlled by the bindings.

Your bindings must be configured near the start of your application — e.g. in the main method or global.asax.

There are basically two ways to configure bindings with DivineInject.

1) bind an interface to a concrete type. DivineInject will pass the same (singleton) instance of the given concrete type whenever it encounters a constructor argument of the interface type.

DivineInjector.Current
	.Bind<IOrdersService>().To<OrdersService>();

2) bind an interface to a specific instance. DivineInject will pass the given instance whenever it encounters a constructor argument of the interface type.

var myOrdersService = new OrdersService(...);
DivineInjector.Current
	.Bind<IOrdersService>().ToInstance(myOrdersService);

Create Your Root Object

DivineInject allows you to create a tree of objects — each object has references to dependencies, which in turn reference their own dependencies; forming a tree of objects. This tree is created starting with the root object — e.g. in a WPF application the root would be the outermost ViewModel; in a WCF application it would be the service class.

The root object is created by calling DivineInject — any arguments the root object constructor requires are taken from the bindings. This should be the only time you explicitly call DivineInject to create objects.

There are two ways of creating the root object:

1) by explicit type:

private MainWindowViewModel CreateViewModel()
{
    return DivineInjector.Current.Get<MainWindowViewModel>();
}

2) by passing the type as an argument:

public object GetInstance(InstanceContext instanceContext, 
                          Message message)
{
    return DivineInjector.Current.Get(_instanceType);
}

At this point you have a very simple set of dependencies configured, which DivineInject can wire up. E.g.

public class MainWindowViewModel
{
    public MainWindowViewModel(IOrdersService ordersService)
    {
        ...
    }
}

I can define classes which accept an instance of an interface as a constructor argument; when DivineInject instantiates such a class it will pass in the right implementation of the interface — the one configured by the bindings.

In the second part of this series, we’ll look at how we use DivineInject to create classes that aren’t singletons.

MVVM and Threading

The Model-View-ViewModel pattern is a very powerful design pattern when building WPF applications, even if I’m not sure everyone interprets it the same way. However, it’s never been clear to me how to easily manage multi-threaded WPF applications: writing multi-threaded code is hard and there seems to be no real support baked into WPF or the idea of MVVM to make multi-threaded code easier to get right.

The Problem

All WPF applications are effectively bound by the same three constraints:

  1. All interaction with UI elements must be done on the UI thread
  2. All long-running tasks (web service calls etc) should be done on a background thread to keep the UI responsive
  3. Repeatedly switching between threads is expensive

Bound by these constraints, all WPF code has two types of thread running through it: UI threads and background threads. It is clearly important to know which type of thread will be executing any given line of code: a background thread cannot interact with UI elements and UI threads should not make expensive calls.

Example

A very brief, contrived example might help. The source code for this is available on github.

Here is a viewmodel for a very simple view:

class ViewModel
{
  private readonly Model m_model;  // assigned in the constructor, omitted here

  public ObservableCollection<string> Items { get; private set; }
  public ICommand FetchCommand { get; private set; }

  public async void Fetch()
  {
    var items = await m_model.Fetch();
    foreach (var item in items)
      Items.Add(item);
  }
}

The ViewModel exposes a command, which calls ViewModel.Fetch() to retrieve some data from the model; once retrieved this data is added to the list of displayed items. Since Fetch is called by an ICommand and interacts with an ObservableCollection (bound to the view) it clearly runs on the UI thread.

Our model is then responsible for fetching the data requested by the viewmodel:

class Model
{
  public async Task<IList<string>> Fetch()
  {
    return await Task.Run(() => DoFetch());
  }

  private IList<string> DoFetch()
  {
    Thread.Sleep(1000);
    return new[] { "Hello" };
  }
}

In a real application DoFetch() would obviously call a database or web service or whatever is required; it is also probable that the Fetch() method might be more complex, e.g. coordinating multiple service calls or managing caching or other application logic.

There is a difference in threading in the model compared to the viewmodel: the Fetch method will, in our example, be called on the UI thread whereas DoFetch will be called on a background thread. Here we have a single class through which two separate types of thread run.

In this very simple example which thread type calls each method is obvious. But scale this up to a real application with dozens of classes and many methods and it becomes less obvious. It suddenly becomes very easy to add a long-running service call to a method which runs on the UI thread; or a switch to a background thread from a method that already runs on a background thread. These errors can be difficult to spot: neither causes an obvious failure. The first will only show if the service call is noticeably slow; the observed behaviour may simply be a UI that intermittently pauses for no apparent reason. The second simply slows tasks down; the application will seem slower than it ought to be, with no obvious indication of why.

It seems as though WPF and MVVM have carefully guided us into a multi-threaded minefield.

First Approach

The first idea is simply to apply a naming convention: each method is suffixed with _UI or _Worker to indicate which type of thread invokes it. E.g. our model class becomes:

class Model
{
  public async Task<IList<string>> Fetch_UI()
  {
    return await Task.Run(() => Fetch_Worker());
  }

  private IList<string> Fetch_Worker()
  {
    Thread.Sleep(1000);
    return new[] { "Hello" };
  }
}

This at least makes it obvious to my future self which type of thread I think executes each method. Simple code inspection shows that a _UI method calling someWebService.Invoke(…) is a Bad Thing. Similarly, Task.Run(…) in a _Worker method is obviously suspect. However, it looks ugly and isn’t guaranteed to be correct – it is just a naming convention; nothing stops me calling a _UI method from a background thread, or vice versa.

Introducing PostSharp

If you haven’t tried PostSharp, the C# AOP library, it is definitely worth a look. Even the free version is quite powerful and allows us to evolve the first idea into something more powerful.

PostSharp allows you to create an attribute which will introduce “advice” (i.e. code to run) around any method annotated with the attribute. For example, we can annotate the ViewModel constructor with a new UIThreadPolicy attribute:

  [UIThreadPolicy]
  public ViewModel()
  {
    Items = new ObservableCollection<string>();
    FetchCommand = new FetchCommand(this);
  }

This attribute is logically similar to using a suffix on the method name, in that it documents our intention. However, by using AOP it also allows us to introduce code to run before the method executes:

[Serializable]  // PostSharp aspects must be serializable
class UIThreadPolicy : OnMethodBoundaryAspect
{
  public override void OnEntry(MethodExecutionArgs args)
  {
#if DEBUG
    if (Thread.CurrentThread.IsBackground)
      Console.WriteLine("*** Thread policy warning: \n" + Environment.StackTrace);
#endif
  }
}

The OnEntry method will be triggered before the annotated method is invoked. If the method is ever invoked on a background thread, we’ll output a warning to the console. In this rudimentary way, we not only document our intention that the ViewModel should only be created on a UI thread, we also enforce it at runtime to ensure that it remains correct.

We can define another attribute, WorkerThreadPolicy, to enforce the reverse: that a method is only invoked on a background thread. With discipline one of these two attributes can be applied to every method. This makes it trivial when making changes in months to come to know whether a given method runs on the UI thread or a background thread.
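The worker-side attribute is just the mirror image of the aspect above:

[Serializable]
class WorkerThreadPolicy : OnMethodBoundaryAspect
{
  public override void OnEntry(MethodExecutionArgs args)
  {
#if DEBUG
    // Warn if an annotated method is ever invoked on the UI thread
    if (!Thread.CurrentThread.IsBackground)
      Console.WriteLine("*** Thread policy warning: \n" + Environment.StackTrace);
#endif
  }
}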

Locking

Understanding situations where multiple threads access code is hard, so wouldn’t it be great if we could easily identify situations where it’s safe to ignore it?

By using the thread attributes, we can identify code that is only ever accessed by the UI thread. There is exactly one UI thread, so in that code we have no concurrency concerns to deal with. For example, our simple Fetch() method above can use a property to keep track of whether we’re already busy:

  [UIThreadPolicy]
  public async void Fetch()
  {
    IsFetching = true;
    try
    {
      var items = await m_model.Fetch();
      foreach (var item in items)
        Items.Add(item);
    }
    finally
    {
      IsFetching = false;
    }
  }

So long as all access to the IsFetching property is on the UI thread, there is no need to worry about locking. We can enforce this by adding attributes to the property itself, too:

  private bool m_isFetching;
  internal event EventHandler IsFetchingChanged = delegate { };  // raised when the flag changes

  internal bool IsFetching
  {
    [UIThreadPolicy]
    get { return m_isFetching; }

    [UIThreadPolicy]
    set
    {
      m_isFetching = value;
      IsFetchingChanged(this, new EventArgs());
    }
  }

Here we use a simple, unlocked bool – knowing that the only access is from a single thread. Without these attributes, it is possible that someone in the future writes to IsFetching from a background thread. It will generally work – but access from the UI thread could continue to see a stale value for a short period.

Conclusion

In general we have aimed for a pattern where ViewModels are only accessed on the UI thread. Model classes, however, tend to have a mixture. And since the “model” in any non-trivial application actually includes dozens of classes this mixture of threads permeates the code. By using these attributes it is possible to understand which threads are used where without exhaustively searching up call stacks.

Writing multi-threaded code is hard: but, with a bit of PostSharp magic, knowing which type of thread will execute any given line of code at least makes it a little bit easier.

Matlab Performance Testing

Where I work we have an unusual split: although the developers use C#, we also work with engineers who use Matlab. This allows the engineers to work in a simple, mathematics-friendly environment where working with volumes of data in matrices is normal. Whereas the developers work with real computer code in a proper language. We get to look after all the data lifting, creating a compelling user interface and all the requisite plumbing. Overall it creates a good split. Algorithms are better developed in a language that makes it easy to prototype and express them; while dealing with databases, networks and users is best done in a proper object-oriented language with support for a compile-time checked type system.

As part of a recent project, for the first time we had a lot of data flowing through Matlab. This meant we had to verify that our Matlab code, and our integration with it from C#, was going to perform in a reasonable time. Once we had a basic working integrated system, we began performance testing. The initial results weren’t great: we were feeding the system with data at something approximating a realistic rate (call it 100 blocks/second). But the code wasn’t executing fast enough; we were falling behind at a rate of about 3 blocks/second. This was all getting buffered in memory, so it was manageable, but it created a growing memory footprint over time.

So we began scaling back the load: we tried putting 10 blocks/second through. But the strangest thing: now we were only averaging 9.5 blocks/second being processed. WTF? So we scaled back further, 1 block/second. Now we were only processing 0.9 blocks/second. What in the hell was going on?

[Figure: MatlabPerf – measured processing time for each batch size (blue) against an estimate based on linear scaling (orange)]

The chart plots the time to process batches of each of the three sizes we tried (blue line), and a “typical” estimate based on linear scaling (orange line) – i.e. normally you expect twice the data to take twice as long to process. But what we were seeing was nearly constant-time processing: as we increased the amount of data, the time to process it didn’t seem to change?!

We checked and double checked. This was a hand-rolled performance test framework (always a bad idea, but due to some technical limitations it was a necessary compromise). We unpicked everything. It had to be a problem in our code. It had to be a problem in our measurements. There was no way this could be realistic – you can’t put 1% of the load through a piece of code and have it perform at, basically, 1% of the speed. What were we missing?

Then the engineers spotted a performance tweak to optimise how memory is allocated in Matlab, which massively sped things up. Suddenly it all started to become clear: what we were seeing was genuine – the more data we passed into Matlab, the faster it processed each item. The elapsed time to process one block of data was nearly the same as the elapsed time to process 100 blocks of data.

Partly this is because the cost of the transition into Matlab from C# isn’t cheap; these “fixed costs” don’t change depending on whether you have a batch size of 1 or 100. We reckon that accounted for no more than 300ms of every 1 second of processing. But what about the rest? It seems that, because of the way Matlab is designed to work on matrices of data, the elapsed time to process one matrix is roughly the same regardless of size. No doubt this is due to Matlab’s ability to use multiple cores to process in parallel and other such jiggery-pokery. But the conclusion, for me, was incredibly counter-intuitive: the runtime performance of Matlab code does not appreciably vary with the size of the data set!

Then the really counter-intuitive conclusion: if we can process 100 blocks of data in 500ms (twice as fast as we expect them to arrive), we could instead buffer and wait and process 1000 blocks of data in 550ms. To maximise our throughput, it actually makes sense to buffer more data so that each call processes more data. I’ve never worked with code where you could increase throughput by waiting.
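In practice that just means draining whatever has queued up and making one big call into Matlab, rather than one call per block. A rough sketch of the idea (DataBlock and IMatlabGateway are invented names; our real code is more involved):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class MatlabFeeder
{
    private readonly ConcurrentQueue<DataBlock> _pending =
        new ConcurrentQueue<DataBlock>();

    // Incoming blocks just get queued...
    public void Enqueue(DataBlock block)
    {
        _pending.Enqueue(block);
    }

    // ...and a single loop drains everything queued so far into one
    // Matlab call: the per-call cost is near constant, so bigger
    // batches mean higher throughput
    public void ProcessingLoop(IMatlabGateway matlab)
    {
        while (true)
        {
            var batch = new List<DataBlock>();
            DataBlock block;
            while (_pending.TryDequeue(out block))
                batch.Add(block);

            if (batch.Count > 0)
                matlab.ProcessBatch(batch);
            else
                Thread.Sleep(10);  // nothing queued yet - let more arrive
        }
    }
}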

Matlab: it’s weird stuff.