ActiveMQ Performance Testing

We use ActiveMQ as our messaging layer – sending large volumes of messages with a need for low latency. Generally it works fine; however, in some situations we’ve seen performance problems. After spending too much time testing our infrastructure I think I’ve learned something interesting about ActiveMQ: it can be really quite slow.

Although in general messages travel over ActiveMQ without problems, we’ve noticed that when we get a burst of messages we start to see delays. It’s as though we’re hitting some message rate limit – when we burst above it messages get delayed, only being delivered at the limit. From the timestamps ActiveMQ puts onto messages we could see the broker was accepting messages quickly, but was delayed in sending to the consumer.

I set up a test harness to replicate the problem – which was easy enough. However, the throughput I measured in the test system seemed low: 2,500 messages/second. With a very simple consumer doing basically nothing there was no reason for throughput to be so low. For comparison, using our bespoke messaging layer in the exact same setup, we hit 15,000 messages/second. The second puzzle was that in production the message rate we saw was barely 250 messages/second. Why was the test system 10x faster than production?

I started trying to eliminate possibilities:

  • Concurrent load on ActiveMQ made no difference
  • Changing producer flow control settings made no difference
  • Changing the consumer prefetch limit only made the behaviour worse (we write data onto non-durable topics, so the default prefetch limit is high – see the sketch below for how we varied it)
  • No component seems to be bandwidth or CPU limited
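
For reference, this is roughly how we varied the prefetch limit in the test harness – a sketch using the standard ActiveMQ client API, with an illustrative host name and value:

import javax.jms.Connection;
import javax.jms.JMSException;
import org.apache.activemq.ActiveMQConnectionFactory;

private static Connection connectWithTopicPrefetch(int prefetch) throws JMSException {
    // The same limit can also be set as a broker URL option, e.g.
    //   tcp://broker-host:61616?jms.prefetchPolicy.topicPrefetch=1000
    ActiveMQConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://broker-host:61616");
    factory.getPrefetchPolicy().setTopicPrefetch(prefetch);
    return factory.createConnection();
}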

As an experiment I tried moving the consumer onto the same server as the broker and producer: message throughput doubled. Moving the consumer onto a server with a higher ping time: message throughput plummeted.

This led to an insight: the ActiveMQ broker was behaving exactly as though there was a limit to the amount of data it would send to a consumer “at one time”. Specifically, I realised there seemed to be a limit to the amount of unacknowledged data on the wire. If the wire is longer, it takes longer for data to arrive at the consumer and longer for the ack to come back: so the broker sends less data per second.

This behaviour highlighted our first mistake. We use Spring Integration to handle message routing on the consumer side. We upgraded Spring a year ago, and one of the changes we picked up in that version bump was a change to how the message-driven channel adapter acknowledges JMS messages. Previously our messages were auto-acknowledged, but now the acknowledgement mode was “transacted”. This meant our entire message handling chain had to complete before the ack was sent to the broker.

This explained why the production system (which does useful work with the messages) had a much lower data rate than the test system. It wasn’t just the 1ms ping time the message had to travel over: the consumer wouldn’t send an ack until it had finished processing the message – which could take a few milliseconds more.

But much worse, transacted acknowledgement appears to prevent the consumer prefetching data at all! With transacted acknowledgement we see just one unacknowledged message on the wire at a time; the broker does not send a new message until it has received an acknowledgement of the previous one. If we move the consumer further away our throughput plummets. Instead of the consumer prefetching hundreds of messages from the broker and dealing with them in turn, the broker is patiently sending one message at a time! No wonder our performance was terrible.

This was easily fixed with a Spring Integration config change. In the test system our message throughput went from 2,500 messages/second to 10,000 messages/second. A decent improvement.
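
The fix itself was a one-line change to the message-driven channel adapter’s acknowledge mode. Expressed against the underlying Spring DefaultMessageListenerContainer (a sketch, not our actual Spring Integration config), it amounts to this:

import javax.jms.ConnectionFactory;
import javax.jms.Session;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

private static DefaultMessageListenerContainer autoAckContainer(ConnectionFactory connectionFactory) {
    DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
    container.setConnectionFactory(connectionFactory);
    container.setDestinationName("prices");           // illustrative topic name
    container.setPubSubDomain(true);                  // we consume from non-durable topics
    container.setSessionTransacted(false);            // no longer “transacted”
    container.setSessionAcknowledgeMode(Session.AUTO_ACKNOWLEDGE);
    return container;
}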

But I was curious: do we still see the broker behaving as though there is a limit on the amount of unacknowledged data on the wire? So I moved the consumer to successively more distant servers to test. The result? Yes! The broker still limits the amount of unacknowledged data on the wire. Even with messages auto-acknowledged, there is a hard cap on the amount of data the broker will send without seeing an acknowledgement.

And the size of the cap? About 64KB. Yes, in 2018, my messaging layer is limited to 64KB of data in transit at a time. This is fine when broker and consumer are super-close. But increase the ping time between consumer and broker to 10ms and our message rate drops to 5,000 messages/second. At 100ms round trip our message rate is 500 messages/second. The back-of-the-envelope arithmetic fits: 64KB of roughly 1kB messages is about 64 messages in flight per round trip, which caps throughput at a few thousand messages/second on a 10ms round trip and a few hundred at 100ms – the same order as the rates we measured.

This behaviour feels like what the prefetch limit should control: but we were seeing significantly fewer messages (no more than sixty 1kB messages) than the prefetch limit would suggest. So far, I haven’t been able to find any confirmation that this “consumer window size” exists, nor any way of modifying it. Increasing the TCP socket buffer size on the consumer increased the amount of data in-flight to about 80KB, but no higher.
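
For what it’s worth, the socket buffer experiment was done via the ActiveMQ transport options on the consumer’s connection URL – something like this, with an illustrative buffer size:

// socketBufferSize is a standard ActiveMQ TCP transport option; raising it on
// the consumer side increased the unacknowledged data in flight from about
// 64KB to about 80KB, but no further.
ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
        "tcp://broker-host:61616?socketBufferSize=262144");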

I’m puzzled, plenty of people use ActiveMQ, and surely someone else would have noticed a data cap like this before? But maybe most people use ActiveMQ with a very low ping time between consumer and broker and simply never notice it?

And yet, people must be using ActiveMQ in globally distributed deployments – how come nobody else sees this?


Async await in Java

Writing asynchronous code is hard. Trying to understand what asynchronous code is supposed to be doing is even harder. Promises are a common way to attempt to describe the flow of delayed-execution: first do a thing, then do another thing, in case of error do something else.

In many languages promises have become the de facto way to orchestrate asynchronous behaviour. Java 8 finally got with the program and introduced CompletableFuture; although seriously, who designed the API? It’s a mess!

The trouble with promises is that the control flow can become anything but simple. As the control flow becomes more complex it becomes virtually impossible to understand (do this, then that, unless it’s Wednesday, in which case do this other thing, if there’s an error go back three spaces, yada yada yada).

The cool kids have moved on to using async…await. C# has it. JavaScript has it. And now… and now, via some of the big brains at EA, Java has it! Yes, Java has a usable async…await construct, without changing the language!

A simple example: we could compose a couple of asynchronous operations using CompletableFuture as follows:

private static void theOldWay() {
    doAThing()
            .thenCompose(Main::doAnotherThing)
            .thenAccept(Main::reportSuccess)
            .exceptionally(Main::reportFailure);
}

This should be pretty simple to follow, although code using futures is often very far from this simple. But with the magic of EA’s async await we can re-write it like this:

// assumes the static imports:
//   import static com.ea.async.Async.await;
//   import static java.util.concurrent.CompletableFuture.completedFuture;
private static CompletableFuture<Void> theNewWay() {
    try {
        String intermediate = await(doAThing());
        String result = await(doAnotherThing(intermediate));
        reportSuccess(result);
    } catch (Throwable t) {
        reportFailure(t);
    }
    return completedFuture(null);
}

It looks like synchronous code. But the calls to Async.await are magic. These calls are re-written (at runtime or build time, as you prefer) to make the calls non-blocking!
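
If you don’t use the build-time instrumentation, my understanding is that the library can rewrite the bytecode at runtime instead, enabled with something like:

public static void main(String[] args) {
    com.ea.async.Async.init();   // install the runtime instrumentation before any awaiting code runs
    theNewWay().join();
}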

The code is much easier to write, much easier to read, a million times easier to debug and most importantly it scales naturally. As code becomes more complex you can use normal refactoring tools to keep it under control. With CompletableFutures you end up passing around all these future objects and, somewhere along the line, one day you’re going to miss a code path and boom! One free bug in production.

So even if you’re stuck using Java, you can still be like the cool kids and use async…await!

What craftsmanship means to me

Over a decade ago now I got my first team lead role. It was a reasonably unexpected promotion when the existing team lead left shortly after I joined. This baptism of fire introduced me to line management, but also made me question my career choice. But it was, in hindsight, the beginning of a new journey: of becoming a software craftsman.

With barely five years’ experience I was certainly no senior developer. And yet here I had been thrust into a team lead role. With so little experience I made many, many mistakes and was probably a pretty rubbish boss for the three other guys on the team. I tried my best, but the whole process was very draining. Worse, I started to see programming at a more abstract level. In charge of a team, I could see that all we were was a factory for turning requirements into working code. The entire process began to feel like turning a handle: feed the team requirements, some praise and a little coffee, and out comes working code.

In the end, a lot of software ends up being very similar: how many CRUD apps does the world really need? Turns out billions of them. And yet, in conception, they’re not massively exciting. Take a piece of data from the user, shovel it back to the database. Take some data out of the database, show it to the user. All very pedestrian. All very repetitive. In this environment it’s easy to become disillusioned with the process of building software. A pointless handle turning exercise.

I moved on from this baptism of fire to my first proper management role. Whereas previously I was still writing code, now I was effectively a full-time manager. I was the team’s meeting and bullshit buffer. It took a lot of buffering. There was a lot of bullshit. I think we even once managed a meeting to discuss why productivity was so poor: maybe the vast number of meetings I was required to attend each day? Or could it have been the 300 emails a day that arrived in my inbox?

If I was disillusioned with the process of writing software before, I now became disillusioned with the entire industry. A large company, little more than a creche for adults, continuing forwards more out of momentum than anything else. Plenty of emails and meetings every day to stop you from having to worry too much about any of that pesky work business.

It was then that I opened my eyes and saw there was a community outside. That programmers across the world were meeting up and discussing what we do. The first thing I saw was the agile community – but even back then it already looked like a vast pyramid scheme. But I was encouraged that there was something larger happening than the dysfunctional companies I kept finding myself working for.

Then Sandro Mancuso and I started talking about software craftsmanship. He introduced me to this movement that seemed to be exactly what I thought was missing in the industry. Not the agile money-go-round, but a movement where the focus is on doing the job right; on life-long learning; on taking pride in your work.

Not long afterwards Sandro and I set up the London Software Craftsmanship Community, which quickly snowballed. It seems we weren’t alone in believing that the job can be done well, that the job should be done well. Soon hundreds of developers joined the community.

The first immediate consequence of my involvement in the software craftsmanship community was discovering a new employer: TIM Group. A company that genuinely has a focus on software built well, with pair programming and TDD. A company where you can take pride in a job done well. The most professional software organisation I’ve worked in. They’re almost certainly still hiring, so if you’re looking, you should definitely talk to them.

Finally I’d found the antidote to my disillusionment with how software is often built: the reason I was frustrated is that it was being built badly. That companies often encourage software to be built slapdash and without care, either implicitly or sometimes even explicitly. If building software feels like just turning a handle it’s because you’re not learning anything. If you’re not learning, it’s because you’re not trying to get better at the job. Don’t tell me you’re already perfect at writing software, I don’t believe it.

Through software craftsmanship I rediscovered my love of programming. My love of a job done well. The fine focus on details that has always interested me. But not just the fine details of the code itself: the fine details of how we build it. The mechanics of TDD done well, of how it should feel. I discovered that as I became more senior not only did I find I had so much more to learn, but now I could also teach others. Not only can I take pride in a job done well, but pride in helping others improve, pride in their job done well.

Friction in Software

Friction can be a very powerful force when building software. The things that are made easier or harder can dramatically influence how we work. I’d like to discuss three areas where I’ve seen friction at work: dependency injection, code reviews and technology selection.

DI Frameworks

A few years ago a colleague and I discussed this and came to the conclusion that the reason most DI frameworks suck (I’m looking in particular at you, Spring) is that they make adding new dependencies so damned easy! There’s absolutely no friction. Maybe a little XML (shudder) or just a tiny little attribute. It’s so easy!

So when we started a new, greenfield project, we decided to put our theory to the test and introduced just a little bit of friction to dependency injection. I’ve written before about the basic scheme we adopted and the AOP endpoint it reached. But the end result was, I believe, very successful. After a couple of years of development we still had only of the order of 10-20 dependencies. The friction we’d introduced was light (add a couple of lines to a single class), but it was sufficient to act as a constant reminder not to add a new dependency just because it was easy.
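
For illustration only (not the actual scheme we used, which I’ve written about before), the sort of light-friction alternative to a DI framework is a single hand-written wiring class, with invented names here:

// All of the application’s dependencies are wired by hand in one place.
// Adding one means adding a field here and a constructor parameter on the
// class that needs it: a small but deliberate cost.
public final class ApplicationWiring {

    private final PriceSource priceSource = new MarketDataPriceSource();
    private final TradeStore tradeStore = new InMemoryTradeStore();
    private final BookingService bookingService =
            new BookingService(priceSource, tradeStore);

    public BookingService bookingService() {
        return bookingService;
    }
}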

Code Reviews

I was reminded of this recently when discussing code reviews. I have mixed feelings about code reviews: I’ve seen them work well, and it is better to have code reviews than not to have them; but it’s better still to pair program. But not all teams, not all developers, like pair programming – so code reviews exist. The trouble with code reviews is they can provide a form of friction.

If you & I are pairing on a piece of work, we will discuss the various trade-offs as we go: do we spend time on this, do we refactor that, etc etc. The constant judgements about what warrants attention and what can be left for another day are verbalised and agreed. In general I find the code written while pairing is high in quality but also remains tightly focused on task. The long rambling refactors I’ve been guilty of in the past disappear and the lazy “quick hacks” we all try and explain away to ourselves, aren’t so easy to gloss over when pairing.

But code reviews exist outside of this dynamic. In the cold light of the following day, someone uninvolved reviews your work and passes judgement on whether they think it’s up to scratch. It’s easy to see why this becomes combative: rather than being collaborative it can be seen as a judgement being passed, on not only the code but the author, too.

When reviewing code it is easy to set a very high bar, higher than you might set for yourself and higher than you might have agreed when pairing. Now, does this mean the comments aren’t valid? Absolutely not! You’re right, there is a test case missing here; although my change is unrelated, I should have added it. And you’re right, this code is a mess; it was a mess before I got here and I’ve only made a simple edit; but you’re right, I should have tidied it up. Everyone should practice code gardening.

These are all perfectly valid comments. But they create a form of friction. When I worked on a team that relied on these code reviews you knew you were going to get comments: so you kept the commit small, so as to minimize the diff. A small diff minimizes the amount of extra tests you could be asked to write. A small diff keeps most of the existing mess out of the review, so you won’t be asked to start refactoring.

Now, this seems dysfunctional: we’re deliberately trying to optimize a smooth passage through the review process, instead of optimizing for code quality. Worse than this though was what never happened: refactoring commits. Looking back I realise that the only code reviews I saw (as both reviewer and reviewee) were for feature changes. There were never any code reviews submitted for purely technical debt reduction. Sure, there’d be some individual commits in amongst the feature changes. But never any dedicated, multi-commit sessions, whose sole aim was to improve the code base. Which was a shame, because like any legacy code base, there was scope for improvement.

Compare this to teams that don’t do code reviews, where I’ve tended to see more effort put into reducing technical debt. Without fearing an endless cycle of review comments, developers are free to embark on refactoring efforts (that may or may not even work out!) – but at least they can try. Instead, code reviews provide a form of friction that might actually hurt code quality in the long run.

Technology Selection

I was talking to another colleague recently who is convinced that Hibernate is still the best way to get data in and out of a relational database. I can’t really work out how to persuade people they’re wrong – surely using Hibernate is enough to persuade you? Especially in a large, legacy code base – the pain that Hibernate causes is obvious. Yet plenty of people still believe in Hibernate. There are even people that still believe in Spring. Whether or not they still believe in the tooth fairy is unclear.

But I think technology selection is another area where friction is important. When contemplating moving away from something well-known and well used in industry like Spring or Hibernate there is a lot of friction. There are new technologies to learn, new approaches to understand and new risks to manage. This all adds friction, so sometimes it’s easiest just to stick with what we know. Sometimes it really is the right choice – the technology you have expertise in is the one you’ll be most productive in immediately. But there are longer term questions too, which are much harder to answer: will the team eventually be more productive using technology X than technology Y?

Friction in software is a powerful process: we’re very lazy creatures, constantly trying to optimise. Anything that slows us down or gets in our way quickly gets side-stepped or worked around. We can use this knowledge as a tool to guide developer behaviour; but sometimes we need to be aware of how friction can change behaviours for the worse as well.

Copy & paste driven development

Software development is rife with copy & paste: all of us resort to copy and paste coding sometimes. We know we probably shouldn’t, but we do it anyway. It’s like the industry’s dirty little secret: we mainly just copy and paste code from the internet or from somewhere else in the code base, then bash it till it works.

But maybe, just maybe, the fact that we all rely on this from time to time should be telling us something?

The good

Sometimes copy & paste coding can be a good thing. A while ago I was pairing with a colleague when we did what I would call “search & replace coding”. I love to code golf: in tools like Eclipse, IntelliJ or Resharper there is always an optimal way to make each change, letting the tools do as much of the typing as possible. So it was with fascination that I watched my colleague demonstrate an interesting bit of code golf using search & replace.

The change was around extending an existing class with a load of new fields. We had a basic class, with a couple of sample fields, and a test that verified something simple about the class – say that it could be serialized to JSON successfully. We needed to add a boatload of new fields to the class (don’t ask). This involved five separate tasks which, at a macro level, had a lot in common:

  • Adding each field
  • Initialising each field in the constructor
  • Modifying the test to setup sample values for each field
  • Modifying the test to pass the sample value for each field to the constructor
  • Writing an assert that each field had successfully serialized and deserialized

My colleague demonstrated that we could write the list of fields once. Then, copy & paste, search & replace – we have a list of parameters for the constructor. By carefully crafting the search term and a suitable replacement term you can do some limited meta-programming. Taking the list of fields and transforming it into the actual lines of code you need in each instance. The list of fields replaced one way gives you a list of parameters to the constructor; with a different replacement you get the field declarations in the test; with another replacement you get assert statements.

I found this a very interesting way to write software. It definitely optimized the number of key presses required to type the code in. The long, boring list of field names only had to be typed in once; after that, a carefully crafted search & replace regex did the lifting for us. But it demonstrates an underlying truth: we had five separate changes to make, each parameterised by the same list of fields. If we could have actually meta-programmed this, we would have passed the list of fields to some meta-programming code which would output the desired lines of code.
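
If we could really meta-program it, the whole job collapses to something like this – a throwaway sketch (field names and generated shapes invented for illustration) where the field list is typed exactly once:

import java.util.List;

public class FieldCodeGenerator {
    public static void main(String[] args) {
        // The field list is typed once...
        List<String> fields = List.of("firstName", "lastName", "accountId");

        // ...and each "search & replace" becomes a real transformation.
        fields.forEach(f -> System.out.printf("private final String %s;%n", f));
        fields.forEach(f -> System.out.printf("this.%s = %s;%n", f, f));
        fields.forEach(f -> System.out.printf(
                "assertEquals(expected.get%1$s(), actual.get%1$s());%n", capitalise(f)));
    }

    private static String capitalise(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}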

While it’s very clever that we could use search & replace meta-programming for this, it feels like the tools are lacking somehow.

The bad

Everyone copies code from stackoverflow from time to time. Hopefully you do it sensibly, using it as a starting point for your own production code. Working through what you’ve just pasted in to understand it, experimenting with it, modifying it, making it fit for your purpose. Rather than just blindly copy & pasting random code from the internet. I mean, who’d just randomly run code from the internet?

Stackoverflow is great. It’s an amazing resource for programmers. While learning WPF I got the sense that as a technology it couldn’t have taken off without something like stackoverflow. The technology is so opaque, so hard to learn. It took me many, many months of copy & pasting xaml from stackoverflow until I started to really understand how it worked. This is a technology that is not trivial to understand. Without being able to just copy & paste code from the internet, the technology would take a lot longer to learn.

But often, we’re lazy, we try it. It works. Woohoo, next problem. I’ve written before about voodoo programming, but the problem is that it’s easy to think you’ve understood what code is doing. If you didn’t actually have to invent those lines, to reason through to them – maybe you don’t understand. Maybe you only think you know what the code’s doing? Maybe there’s some horrific bug you’re not aware of yet?

The ugly

Almost every time I’ve seen TDD done at any scale, when it comes to writing a new test, the first question to ask is: “which existing test is this most like?” Yup, where can I go copy & paste. I’m so lazy, I don’t want to have to invent a whole test setup on my own. I’ll just borrow somebody else’s homework. We’ve all done it. It seems to be an unwritten rule of TDD. By the time you’re on to the third or fourth test in a given file, I guarantee the temptation to just copy & paste to make the fifth test is incredibly strong.

The trouble with this is all sorts of weird test artefacts get copied forwards. You start off with a simple test, with some simple setup. Say an empty bank account. Then to test non-zero balance you need an account with a transaction. Then to test balance summing you need an account with two transactions. Each test so far is building on the last, so you’ve just copy & pasted the previous one as a starting point. The fourth test is inserting a new transaction, so you just copy the third test (with two existing transactions, unnecessary for this test). Test five is that a withdrawal can’t go below zero, so you copy & paste the previous test and set the amount to be a large negative value. Test six is an overdraft test, so you copy & paste the previous test but change the account setup to include an overdraft so the balance can go negative. By the time you copy & paste test seven, your starting point is an account with an overdraft, with two transactions where a third transaction is added. Test seven is about adding a standing order. None of this noise is necessary.
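
To make the rot concrete, here’s roughly what test seven ends up looking like – a sketch with an invented account API:

@Test
public void standingOrderIsAddedToAccount() {
    // Setup inherited from earlier tests – none of it matters here
    Account account = new Account(new Overdraft(500));   // only needed by the overdraft test
    account.add(new Transaction(100));                    // copied forward from earlier tests
    account.add(new Transaction(250));

    account.addStandingOrder(new StandingOrder(25, Frequency.MONTHLY));

    assertEquals(1, account.standingOrders().size());
}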

This might seem a ridiculous example, but I see this time and again in real code. Sometimes it’s not even me that’s done it. This noise accumulates as you read down a test file. The tests at the bottom have all sorts of weird artefacts that were only relevant to one test half way up the file. It means fixing tests and changing behaviours becomes a real problem. If I change, say, how overdrafts are defined in my example above, I might have half a dozen tests to change, only one of which even mentions overdrafts. But they’ve inherited that setup because the tests were just copy & pasted around. Not only does this make tests hard to read and understand, it makes them hard to maintain.

We all do it. It seems a pretty accepted part of how TDD happens in the wild. And yet, it clearly isn’t right. With discipline, we can keep our tests clean. Yes, when we’re all being conscientious developers we start writing our tests from scratch each time. But most of the time we’re busy or lazy or whatever.

Conclusion

What do these three things have in common, besides copy & paste? In each example we’re using copy & paste to save time. To find the most efficient path through the work we’re doing. Nobody is doing it out of malice or stupidity. Laziness? Almost certainly – but the good kind. The kind of laziness that encourages elegant solutions.

But copy & paste isn’t an elegant solution. It’s a crappy solution to a more fundamental problem: our tools are deficient. Really what we’re working around is the fact that our tools don’t let us express what we really want.

What if we could write our tests in a higher level language? “A test with a bank account with two transactions”. Sure, there are internal & external DSLs you can use to do this. But typically the cost in setting up the DSL isn’t worth the hassle for unit tests. It would completely ruin the flow of TDD. Does that just mean we haven’t found a good way of doing it yet? Is there a way we could more fluidly express the intent of the test, filling in the gaps as we go?
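
The kind of thing I mean is an internal DSL along these lines – a sketch, with invented builder names – where the test states only what it cares about and the builder fills in everything else:

// “a bank account with two transactions”, and nothing else
Account account = anAccount()
        .withTransaction(100)
        .withTransaction(250)
        .build();

assertEquals(350, account.balance());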

Instead of copy & pasting code from the internet, could our tools get smarter? Could we take some of the amazing machine learning stuff that’s going on and apply it to software development? I tried playing with an IntelliJ plugin recently that promises just that. Unfortunately it’s pretty buggy at the minute and doesn’t really work. But the idea is incredibly attractive. I like the idea of being able to express intent instead of mindlessly typing in the nuts & bolts.

Finally, instead of doing search & replace coding, wouldn’t it be great if we could actually meta-program? If we could actually write code that would write code for us? Not just code generators, but something that can generate small sections of code. I had a very limited go at this some time ago with rescripter – but it turns out it’s very hard to write a decent meta-programming tool that anyone except the author can understand. But I think the idea still has merit: too often I find cases where I can describe the intent of my change very succinctly, but implementing it will involve far more typing than I’d like.