We’re all average now

I’ve read enough hot takes on last week’s election result to last me a lifetime, so I don’t really want to dwell on it – but as a consequence of the result, I’ve noticed a strange blindness that, I suspect, is part of our country’s current malaise.

In the playground last week one of the Mums was complaining:

Why should someone like me on average income pay more tax?

This is an argument against Corbyn in particular and Labour in general: why should average people pay more tax? And I’d agree, asking average people to pay more tax seems unfair. I’m an unashamed socialist: I think those with more can afford to pay more to help those who have less.

Oh, I missed some background info from the quote above. Her household income is £160k a year. Yes. I didn’t typo that, you didn’t misread it: one hundred and sixty thousand pounds sterling per year. Average income. Fuck me.

I had to google UK household income – back in 2016/2017 £160k a year would put her in the top 1%. But yeah, sure, average. Whatever.

So ok, fine, you’re fantastically well paid, and you don’t want to pay more tax. I get that. I mean, you’re a selfish cunt, but I get that. I can understand people not wanting to pay more tax, because it is at least simply selfish. I got mine, get your own. I don’t agree, but I can see where the motivation comes from.

But to disingenuously claim that you shouldn’t pay more because you’re merely average. What is this fake news? Sure, we live in a nice town, where there’s quite a mixture of people – some people obviously substantially better off than others. But about half the parents in the playground are from the local council estate – I can guarantee you they are almost certainly more average than little miss but-how-will-I-afford-a-second-pony-under-Corbyn.

How blind do you have to be, how lacking in self-awareness do you have to be to describe yourself as average, when you’re so far away from average?

How average is average then? I could’ve roughly guessed, but I still find it arresting: those same stats show the 50th percentile is £23,600 a year. I mean, basically the same.

This isn’t envy. I’m well paid. I would’ve paid a chunk more tax under Corbyn, and I’m fine with that. A few hundred quid a month for a functioning NHS. Sounds like a bargain to me. Have you seen how much private medical insurance costs?!

So it’s clear to me that there’s something wrong with people, when some, clearly in the 1%, think they are average and feel so offended by the notion of paying more tax that they feel entitled to bleat about it in the playground, expecting sympathy.

Now I have no idea what’s motivated those on lower incomes to vote tory. I saw an interesting chart today though, that shows the swing to the tories correlates with the number of people on lower incomes. So clearly that has been a big factor in this election. I’ll leave it to those more qualified to speculate as to the reasons for this.

But there is something deeply wrong with our perception of ourselves and what our politicians offer when people at the lower end of the income spectrum think a tory government is going to do anything other than fuck them royally; and when people at the upper end of the income spectrum think they’re average.

Is this the problem? Those on £16k a year think they’re as average as those on £160k a year? Maybe we’re all average now. Nobody wants to pay more tax. Everybody wants to blame The Others. Nobody wants to take any responsibility for addressing what’s actually broken in our country. Meanwhile, we continue lying to each other and lying to ourselves.

What is software?

What actually is software? It’s obviously not a physical thing you can point at. If I imagine a specific piece of software, where does the software stop and not-software begin?

I recently read Sapiens, a fantastic book on the history of humankind. One of the things he talks about is the “legend of Peugeot”. When we think of Peugeot the company, what do we mean? It’s not the cars they produce – the company would exist and would keep on making cars even if all the Peugeot cars in existence were scrapped overnight. It’s not the factories and offices and assembly lines, which could be rebuilt if they all suddenly burned down. It’s not the employees either – if all the employees resigned en masse, the company would hire more staff and would carry on making cars. Peugeot is a fiction – a legal fiction we all choose to believe in.

Back to software. What is software? Maybe it’s the compiled binary artefact? An executable or DLL or JAR file. But is that really what software is? Software is a living, growing, changing thing – a single binary is merely a snapshot at a given point in time.

Perhaps then software includes the source code. Without the source code, what we have is dead software – a single binary that can never (easily) be changed. Sure, we could in theory reverse engineer something resembling source code from the binary, but for any reasonably sized piece of software, would making anything beyond a trivial change be feasible, without the original source?

Even with the source code, could I just pick up, say, the source to Chrome or Excel and start hacking away? It seems unlikely – I’d need to spend time familiarising myself with the code and reading documentation. So maybe documentation is part of what makes software.

Even better than reading documentation I’d talk to other developers who are already familiar with the code – they would be able to explain it to me and answer my questions. Perhaps even more importantly, developers would be able to explain why certain things are the way they are – this tells the story of the software, the history of how it got to be the way it is. The decisions taken along the way, the mistakes made and the paths not taken.

So the knowledge developers have of how the software works and how it got there is part of what makes up software. What other knowledge makes up software? How about the process for releasing a new version? Without that knowledge, modified source code is useless. To be real, live software, new versions need to get into the hands of users.

When it comes to interacting with the real world, how much of that context is part of what defines the software? Look at Uber, for example – without physical cars, what use is the software? In some sense, the physical cars and their drivers are part of a software stack.

But is software even a single thing, at a single scale? Is the web front end a single piece of software? Without its backend service it is rendered useless. Does that make it a single piece of software or two?

How about as software evolves? Does it become something different, unique from the software that went before? The original version is part of the history of the software, but it isn’t distinct from it. Only if the old version was forked can we end up with a new piece of software – at this point their histories diverge. Over time they will adapt to subtly different contexts, have different dependencies, different decisions and goals: they will become two different pieces of software.

What about if software is re-written? Imagine the team responsible for the backend service decide the only solution to their technical debt problem is a re-write. So they begin re-writing it from scratch. Eventually, they switch over to the new version – the old one is archived, only kept in version control for the curious. Is this a new piece of software? Or logically just a new version? The software stack still performs the same overall purpose, there’s still only one backend service. The original, debt-laden version has become part of the history of the software: we don’t have two systems – we have one. This suggests that the actual source code is not what makes software.

If software isn’t the source code, what is it? It can’t be the team that owns it: although team members may come and go, the software lives on. It isn’t the documentation either – the documentation could be re-written and the software would live on. Software is defined by its context but it is more than its context and processes.

Software is all these things and none of them. Just like Peugeot, software is a fiction we all believe in. We all pretend we know what we mean when we talk about “software”, but what actually is it?

If software is anything it is a story. It is the history of how the code got to where it is today: the decisions that were taken, the context it sits within, the components it interacts with. Documentation is an attempt to preserve this history; processes an attempt to codify lessons learned.

If software is a story, the team are the medium through which the story is kept alive. If you’ve ever seen what happens when a new team takes over legacy software, you’ll know what happens when the story dies: zombie software, not quite dead but not quite alive; still evolving and changing, but full of risk that any change could bring about disaster.

What is software? Software is a story: a story of how this got to be the way it is, whatever “this” might be.

ActiveMQ Performance Testing

We use ActiveMQ as our messaging layer – sending large volumes of messages with a need for low latency. Generally it works fine; however, in some situations we’ve seen performance problems. After spending too much time testing our infrastructure I think I’ve learned something interesting about ActiveMQ: it can be really quite slow.

Although in general messages travel over ActiveMQ without problems, we’ve noticed that when we get a burst of messages we start to see delays. It’s as though we’re hitting some message rate limit – when we burst above it messages get delayed, only being delivered at the limit. From the timestamps ActiveMQ puts onto messages we could see the broker was accepting messages quickly, but was delayed in sending to the consumer.

I set up a test harness to replicate the problem – which was easy enough. However, the throughput I measured in the test system seemed low: 2,500 messages/second. With a very simple consumer doing basically nothing there was no reason for throughput to be so low. For comparison, using our bespoke messaging layer in the exact same setup, we hit 15,000 messages/second. The second puzzle was that in production the message rate we saw was barely 250 messages/second. Why was the test system 10x faster than production?

I started trying to eliminate possibilities:

  • Concurrent load on ActiveMQ made no difference
  • Changing producer flow control settings made no difference
  • Changing consumer prefetch limit only made the behaviour worse (we write data onto non-durable topics, so the default prefetch limit is high)
  • No component seemed to be bandwidth or CPU limited

As an experiment I tried moving the consumer onto the same server as the broker and producer: message throughput doubled. Moving the consumer onto a server with a higher ping time: message throughput plummeted.

This led to an insight: the ActiveMQ broker was behaving exactly as though there was a limit to the amount of data it would send to a consumer “at one time”. Specifically, I realised there seemed to be a limit to the amount of unacknowledged data on the wire. If the wire is longer, it takes longer for data to arrive at the consumer and longer for the ack to come back: so the broker sends less data per second.

This behaviour highlighted our first mistake. We use Spring Integration to handle message routing on the consumer side. We upgraded Spring a year ago, and one of the changes we picked up in that version bump was a change to how the message driven channel adapter acknowledges JMS messages. Previously our messages were auto-acknowledged, but now the acknowledgement mode was “transacted”. This meant our entire message handling chain had to complete before the ack was sent to the broker.

This explained why the production system (which does useful work with the messages) had a much lower data rate than the test system. It wasn’t just the 1ms ping time the message had to travel over: the consumer wouldn’t send an ack until it had finished processing the message – which could take a few milliseconds more.

But much worse, transacted acknowledgement appears to prevent the consumer prefetching data at all! The throughput we see with transacted acknowledgement is one unacknowledged message on the wire at a time. If we move the consumer further away our throughput plummets. I.e. the broker does not send a new message until it has received an acknowledgement of the previous. Instead of the consumer prefetching hundreds of messages from the broker and dealing with them in turn, the broker is patiently sending one message at a time! No wonder our performance was terrible.
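As a rough sanity check, one message in flight at a time means each message costs a full round trip plus whatever time the consumer spends handling it. The sketch below is a back-of-envelope model only, not ActiveMQ code. The class and method names are made up for illustration, and the 0.4ms and 3ms figures are assumptions chosen to match the rates I measured rather than timings I recorded.

```java
// A back-of-envelope model, not ActiveMQ code: with only one unacknowledged
// message in flight, each message costs a round trip plus handling time.
class StopAndWaitModel {

    /** Upper bound on messages/second when the broker waits for every ack. */
    static double messagesPerSecond(double roundTripMillis, double processingMillis) {
        double millisPerMessage = roundTripMillis + processingMillis;
        return 1000.0 / millisPerMessage;
    }

    public static void main(String[] args) {
        // Assumption: ~0.4ms round trip in the test system, consumer does nothing.
        System.out.printf("test:       %.0f msg/s%n", messagesPerSecond(0.4, 0.0));
        // Assumption: 1ms ping plus ~3ms of real message handling in production.
        System.out.printf("production: %.0f msg/s%n", messagesPerSecond(1.0, 3.0));
    }
}
```

With those assumed inputs the model lands on roughly 2,500 and 250 messages/second, which is at least consistent with the test and production rates above.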

This was easily fixed with a Spring Integration config change. In the test system our message throughput went from 2,500 messages/second to 10,000 messages/second. A decent improvement.

But I was curious: do we still see the broker behaving as though there is a limit on the amount of unacknowledged data on the wire? So I moved the consumer to successively more distant servers to test. The result? Yes! The broker still limits the amount of unacknowledged data on the wire. Even with messages auto acknowledged, there is a hard cap on the amount of data the broker will send without seeing an acknowledgement.

And the size of the cap? About 64KB. Yes, in 2018, my messaging layer is limited to 64KB of data in transit at a time. This is fine when broker and consumer are super-close. But increase the ping time between consumer and broker to 10ms and our message rate drops to 5,000 messages/second. At 100ms round trip our message rate is 500 messages/second.

This behaviour feels like what the prefetch limit should control, but we were seeing significantly fewer messages in flight (no more than sixty 1kB messages) than the prefetch limit would suggest. So far, I haven’t been able to find any confirmation of the existence of this “consumer window size”, nor any way of modifying the behaviour. Increasing the TCP socket buffer size on the consumer increased the amount of data in-flight to about 80KB, but no higher.
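The rates above are roughly what you would expect from a fixed window of unacknowledged data: the broker can send at most one window per round trip. Here is a minimal sketch of that arithmetic, assuming 1kB messages and an idealised link with no protocol overhead or processing time. The class and method names are hypothetical and nothing here touches ActiveMQ itself.

```java
// Idealised window arithmetic: at most one window of data per round trip.
// This ignores protocol overhead and consumer processing time.
class WindowModel {

    /** Upper bound on messages/second for a fixed window of unacknowledged bytes. */
    static double maxMessagesPerSecond(long windowBytes, long messageBytes, double roundTripMillis) {
        long messagesPerWindow = windowBytes / messageBytes; // whole messages in flight
        double roundTripsPerSecond = 1000.0 / roundTripMillis;
        return messagesPerWindow * roundTripsPerSecond;
    }

    public static void main(String[] args) {
        long window = 64 * 1024;  // the ~64KB cap I observed
        long message = 1024;      // assumption: 1kB messages
        System.out.println("10ms:  " + maxMessagesPerSecond(window, message, 10.0));
        System.out.println("100ms: " + maxMessagesPerSecond(window, message, 100.0));
    }
}
```

The bound works out at 6,400 messages/second for a 10ms round trip and 640 for 100ms. The rates I measured (5,000 and 500) sit a little below that, which is what I'd expect once real overheads are included.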

I’m puzzled: plenty of people use ActiveMQ, so surely someone else would have noticed a data cap like this before? But maybe most people use ActiveMQ with a very low ping time between consumer and broker and simply never notice it?

And yet, people must be using ActiveMQ in globally distributed deployments – how come nobody else sees this?

Async await in Java

Writing asynchronous code is hard. Trying to understand what asynchronous code is supposed to be doing is even harder. Promises are a common way to attempt to describe the flow of delayed-execution: first do a thing, then do another thing, in case of error do something else.

In many languages promises have become the de facto way to orchestrate asynchronous behaviour. Java 8 finally got with the program and introduced CompletableFuture; although seriously, who designed the API? It’s a mess!

The trouble with promises is that the control flow can become anything but simple. As the control flow becomes more complex it becomes virtually impossible to understand (do this, then that, unless it’s Wednesday, in which case do this other thing, if there’s an error go back three spaces, yada yada yada).

The cool kids have moved on to using async…await. C# has it. JavaScript has it. And now… and now, via some of the big brains at EA, Java has it! Yes, Java has a usable async…await construct, without changing the language!

A simple example: we could compose a couple of asynchronous operations using CompletableFuture as follows:

private static void theOldWay() {
    doAThing()
            .thenCompose(Main::doAnotherThing)
            .thenAccept(Main::reportSuccess)
            .exceptionally(Main::reportFailure);
}

This should be pretty simple to follow; code using futures is often very far from this simple. But with the magic of EA’s async await we can re-write it like this:

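import static com.ea.async.Async.await;
import static java.util.concurrent.CompletableFuture.completedFuture;

private static CompletableFuture<Void> theNewWay() {
    try {
        // Each await reads like a blocking call, but is re-written to be non-blocking
        String intermediate = await(doAThing());
        String result = await(doAnotherThing(intermediate));
        reportSuccess(result);
    } catch (Throwable t) {
        // A failure in either step surfaces here as an ordinary exception
        reportFailure(t);
    }
    return completedFuture(null);
}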

It looks like synchronous code. But the calls to Async.await are magic. These calls are re-written (at runtime or build time, as you prefer) to make the calls non-blocking!

The code is much easier to write, much easier to read, a million times easier to debug and most importantly it scales naturally. As code becomes more complex you can use normal refactoring tools to keep it under control. With CompletableFutures you end up passing around all these future objects and, somewhere along the line, one day you’re going to miss a code path and boom! One free bug in production.
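To make the missed code path concrete, here is a small sketch using only plain CompletableFuture. The doAThing and doAnotherThing methods are hypothetical stand-ins for the ones in the earlier snippets, rigged so that the second step always fails. The only difference between the two runs is whether somebody remembered to attach the error handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class MissedErrorPath {

    // Hypothetical stand-in: the first step succeeds immediately.
    static CompletableFuture<String> doAThing() {
        return CompletableFuture.completedFuture("thing");
    }

    // Hypothetical stand-in: the second step always fails.
    static CompletableFuture<String> doAnotherThing(String input) {
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("could not process " + input));
        return failed;
    }

    /** Runs the chain and returns everything that got reported. */
    static List<String> run(boolean handleErrors) {
        List<String> reported = new ArrayList<>();
        CompletableFuture<Void> chain = doAThing()
                .thenCompose(MissedErrorPath::doAnotherThing)
                .thenAccept(result -> reported.add("success: " + result));
        if (handleErrors) {
            // The step that is easy to forget on one of many code paths.
            chain.exceptionally(t -> {
                reported.add("failure: " + t);
                return null;
            });
        }
        return reported;
    }

    public static void main(String[] args) {
        System.out.println("with handler:    " + run(true));
        System.out.println("without handler: " + run(false));
    }
}
```

Without the handler nothing is reported at all: the exception just sits inside a future that nobody looks at. With try/catch around await, the compiler and your eyes get the usual help spotting the gap.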

So even if you’re stuck using Java, you can still be like the cool kids and use async…await!

What craftsmanship means to me

Over a decade ago now I got my first team lead role. It was a reasonably unexpected promotion when the existing team lead left shortly after I joined. This baptism of fire introduced me to line management, but also made me question my career choice. But it was, in hindsight, the beginning of a new journey: of becoming a software craftsman.

With barely 5 years’ experience I was certainly no senior developer. And yet here I was, thrust into a team lead role. With so little experience I made many, many mistakes and was probably a pretty rubbish boss for the three other guys on the team. I tried my best, but the whole process was very draining. Worse, I started to see programming at a more abstract level. In charge of a team, I could see that all we were was a factory for turning requirements into working code. The entire process began to feel like turning a handle: feed the team requirements, some praise and a little coffee and out comes working code.

In the end, a lot of software ends up being very similar: how many CRUD apps does the world really need? Turns out billions of them. And yet, in conception, they’re not massively exciting. Take a piece of data from the user, shovel it back to the database. Take some data out of the database, show it to the user. All very pedestrian. All very repetitive. In this environment it’s easy to become disillusioned with the process of building software. A pointless handle turning exercise.

I moved on from this baptism of fire to my first proper management role. Whereas previously I was still writing code, now I was effectively a full-time manager. I was the team’s meeting and bullshit buffer. It took a lot of buffering. There was a lot of bullshit. I think we even once managed a meeting to discuss why productivity was so poor: maybe the vast number of meetings I was required to attend each day? Or could it have been the 300 emails a day that arrived in my inbox?

If I was disillusioned with the process of writing software before, I now became disillusioned with the entire industry. A large company, little more than a creche for adults, continuing forwards more out of momentum than anything else. Plenty of emails and meetings every day to stop you from having to worry too much about any of that pesky work business.

It was then that I opened my eyes and saw there was a community outside. That programmers across the world were meeting up and discussing what we do. The first thing I saw was the agile community – but even back then it already looked like a vast pyramid scheme. But I was encouraged that there was something larger happening than the dysfunctional companies I kept finding myself working for.

Then Sandro Mancuso and I started talking about software craftsmanship. He introduced me to this movement that seemed to be exactly what I thought was missing in the industry. Not the agile money-go-round, but a movement where the focus is on doing the job right; on life-long learning; on taking pride in your work.

Not long afterwards Sandro and I set up the London Software Craftsmanship Community, which quickly snowballed. It seems we weren’t alone in believing that the job can be done well, that the job should be done well. Soon hundreds of developers joined the community.

The first immediate consequence of my involvement in the software craftsmanship community was discovering a new employer: TIM Group. A company that genuinely has a focus on software built well, with pair programming and TDD. A company where you can take pride in a job done well. The most professional software organisation I’ve worked in. They’re almost certainly still hiring, so if you’re looking, you should definitely talk to them.

Finally I’d found the antidote to my disillusionment with how software is often built: the reason I was frustrated is that it was being built badly. That companies often encourage software to be built slapdash and without care, either implicitly or sometimes even explicitly. If building software feels like just turning a handle it’s because you’re not learning anything. If you’re not learning, it’s because you’re not trying to get better at the job. Don’t tell me you’re already perfect at writing software, I don’t believe it.

Through software craftsmanship I rediscovered my love of programming. My love of a job done well. The fine focus on details that has always interested me. But not just the fine details of the code itself: the fine details of how we build it. The mechanics of TDD done well, of how it should feel. I discovered that as I became more senior not only did I find I had so much more to learn, but now I could also teach others. Not only can I take pride in a job done well, but pride in helping others improve, pride in their job done well.