Where I work we have an unusual split: although the developers use C#, we also work with engineers who use Matlab. This allows the engineers to work in a simple, mathematics friendly environment where working with volumes of data in matrices is normal. Whereas the developers work with real computer code in a proper language. We get to look after all the data lifting, creating a compelling user interface and all the requisite plumbing. Overall it creates a good split. Algorithms are better developed in a language that makes it easy to prototype and express complex algorithms; while dealing with databases, networks and users is best done in a proper object oriented language with support for a compile time checked type system.
As part of a recent project for the first time we had a lot of data flowing through Matlab. This led to us have to verify that our matlab code, and our integration with it from C#, was going to perform in a reasonable time. Once we had a basic working integrated system, we began performance testing. The initial results weren’t great: we were feeding the system with data at something approximating a realistic rate (call it 100 blocks/second). But the code wasn’t executing fast enough, we were falling behind at a rate of about 3 blocks/second. This was all getting buffered in memory, so it was manageable but creates a memory impact over time.
So we began scaling back the load: we tried putting 10 blocks/second through. But the strangest thing: now we were only averaging 9.5 blocks/second being processed. WTF? So we scaled back further, 1 block/second. Now we were only processing 0.9 blocks / second. What in the hell was going on?
Let’s plot time to process batches of each of the three sizes we tried (blue line), and a “typical” estimate based on linear scaling (orange line). I.e. normally you expect twice the data to take twice as long to process. But what we were seeing was nearly constant time processing. As we increased the amount of data, the time to process it didn’t seem to change?!
We checked and double checked. This was a hand-rolled performance test framework (always a bad idea, but due to some technical limitations it was a necessary compromise). We unpicked everything. It had to be a problem in our code. It had to be a problem in our measurements. There was no way this could be realistic – you can’t put 1% of the load through a piece of code and it perform at, basically, 1% of the speed. What were we missing?
Then the engineers spotted a performance tweak to optimise how memory is allocated in Matlab, which massively speeded things up. Suddenly it all started to become clear: what we were seeing was genuine, the more data we passed into Matlab, the faster it processed each item. The elapsed time to process one block of data was nearly the same as the elapsed time to process 100 blocks of data.
Partly this is because the cost of the transition into Matlab from C# isn’t cheap, these “fixed costs” don’t change depending whether you have a batch size of 1 or 100. We reckon that accounted for no more than 300ms of every 1 second of processing. But what about the rest? It seems that because of the way Matlab is designed to work on matrices of data: the elapsed time to process one matrix is roughly the same, regardless of size. No doubt this is due to Matlab’s ability to use multiple cores to process in parallel and other such jiggery pokery. But the conclusion, for me, was incredibly counter intuitive: the runtime performance of matlab code does not appreciably vary based on the size of the data set!
Then the really counter intuitive conclusion: if we can process 100 blocks of data in 500ms (twice as fast as we expect them to arrive), we could instead buffer and wait and process 1000 blocks of data in 550ms. To maximize our throughput, it actually makes sense to buffer more data so that each call processes more data. I’ve never worked with code where you could increase throughput by waiting.
Matlab: it’s weird stuff.