Friction in Software

Friction can be a very powerful force when building software. The things that are made easier or harder can dramatically influence how we work. I’d like to discuss three areas where I’ve seen friction at work: dependency injection, code reviews and technology selection.

DI Frameworks

A few years ago a colleague and I discussed this and came to the conclusion that the reason most DI frameworks suck (I’m looking in particular at you, Spring) is that they make adding new dependencies so damned easy! There’s absolutely no friction. Maybe a little XML (shudder) or just a tiny little attribute. It’s so easy!

So when we started a new, greenfield project, we decided to put our theory to the test and introduced just a little bit of friction to dependency injection. I’ve written before about the basic scheme we adopted and the AOP endpoint it reached. But the end result was, I believe, very successful. After a couple of years of development we still had of the order of only 10-20 dependencies. The friction we’d introduced was light (add a couple of lines to a single class), but it was sufficient to act as a constant reminder not to just add a new dependency because it was easy.

Code Reviews

I was reminded of this recently when discussing code reviews. I have mixed feelings about code reviews: I’ve seen them work well, and it is better to have code reviews than not to have them; but it’s better still to pair program. But not all teams, not all developers, like pair programming – so code reviews exist. The trouble with code reviews is they can provide a form of friction.

If you & I are pairing on a piece of work, we will discuss the various trade-offs as we go: do we spend time on this, do we refactor that, etc etc. The constant judgements about what warrants attention and what can be left for another day are verbalised and agreed. In general I find the code written while pairing is high in quality but also remains tightly focused on task. The long rambling refactors I’ve been guilty of in the past disappear and the lazy “quick hacks” we all try and explain away to ourselves, aren’t so easy to gloss over when pairing.

But code reviews exist outside of this dynamic. In the cold light of the following day, someone uninvolved reviews your work and passes judgement on whether they think it’s up to scratch. It’s easy to see why this becomes combative: rather than being collaborative it can be seen as a judgement being passed, on not only the code but the author, too.

When reviewing code it is easy to set a very high bar, higher than you might set for yourself and higher than you might have agreed when pairing. Now, does this mean the comments aren’t valid? Absolutely not! You’re right, there is a test case missing here, although my change is unrelated, I should have added the missing test case. And you’re right this code is a mess; it was a mess before I was here and made a simple edit; but you’re right, I should have tidied it up. Everyone should practice code gardening.

These are all perfectly valid comments. But they create a form of friction. When I worked on a team that relied on these code reviews you knew you were going to get comments: so you kept the commit small, so as to minimize the diff. A small diff minimizes the amount of extra tests you could be asked to write. A small diff keeps most of the existing mess out of the review, so you won’t be asked to start refactoring.

Now, this seems dysfunctional: we’re deliberately trying to optimize a smooth passage through the review process, instead of optimizing for code quality. Worse than this though was what never happened: refactoring commits. Looking back I realise that the only code reviews I saw (as both reviewer and reviewee) were for feature changes. There were never any code reviews submitted for purely technical debt reduction. Sure, there’d be some individual commits in amongst the feature changes. But never any dedicated, multi-commit sessions, whose sole aim was to improve the code base. Which was a shame, because like any legacy code base, there was scope for improvement.

Comparing this to teams that don’t do code reviews, where I’ve tended to see more effort on reducing technical debt. Without fearing an endless cycle of review comments, developers are free to embark on refactoring efforts (that may or may not even work out!) – but at least they can try. Instead, code reviews provide a form of friction that might actually hurt code quality in the long run.

Technology Selection

I was talking to another colleague recently who is convinced that Hibernate is still the best way to get data in and out of a relational database. I can’t really work out how to persuade people they’re wrong – surely using Hibernate is enough to persuade you? Especially in a large, legacy code base – the pain that Hibernate causes is obvious. Yet plenty of people still believe in Hibernate. There are even people that still believe in Spring. Whether or not they still believe in the tooth fairy is unclear.

But I think technology selection is another area where friction is important. When contemplating moving away from something well-known and well used in industry like Spring or Hibernate there is a lot of friction. There are new technologies to learn, new approaches to understand and new risks to manage. This all adds friction, so sometimes it’s easiest just to stick with what we know. Sometimes it really is the right choice – the technology you have expertise in is the one you’ll be most productive in immediately. But there are longer term questions too, which are much harder to answer: will the team eventually be more productive using technology X than technology Y?

Friction in software is a powerful process: we’re very lazy creatures, constantly trying to optimise. Anything that slows us down or gets in our way quickly gets side-stepped or worked around. We can use this knowledge as a tool to guide developer behaviour; but sometimes we need to be aware of how friction can change behaviours for the worse as well.

How many builds?

I’m always amazed at the seemingly high pain threshold .net developers have when it comes to tooling. I’ve written before about the poor state of tooling in .net, but just recently I hit another example of poor tooling that infuriates me: I have too many builds, and they don’t agree whether my code compiles.

One of the first things that struck me when starting to develop on .net was that compiling code was still a thing. An actual step that had to be thought about. Incremental compilers in Eclipse and the like have been around for ages – to the point where, generally, Java developers don’t actually have to instruct their IDE to compile code for them. But in Visual Studio? Oh it’s definitely necessary. And time consuming. Oh my god is it slow. Ok, maybe not as slow as Scala. But still unbelievably slow when you’re used to working at the speed of thought.

Another consequence of the closed, Microsoft way of doing things is that tools can’t share the compiler’s work. So ReSharper have implemented their own compiler, effectively. It incrementally parses source code and finds compiler errors. Sometimes it even agrees with the Visual Studio build. But all too often it doesn’t. From the spurious not-actually-an-error that I have to continually instruct ReSharper to ignore; to the warnings-as-errors build failures that ReSharper doesn’t warn me about; to the random why-does-ReSharper-not-know-about-that-NuGet-package-errors.

This can be infuriating when refactoring. E.g. if an automated refactor leaves a variable unused, I will now have a compiler warning; since all my projects run with warnings-as-errors switched on, this will fail the build. But ReSharper doesn’t know that. So I apply the refactoring, code & tests are green: commit. Push. Boom! CI is red. But it was an automated refactor for chrissakes, how’ve I broken the build?!

I also use NCrunch, an automated test runner for Visual Studio (like Infinitest in the Java world). NCrunch is awesome, by the way; better even than the continuous test runner in ReSharper 10. If you’ve never used a continuous test runner and think you’re doing TDD, sort your life out and setup Infinitest or NCrunch. It doesn’t just automate pressing the shortcut key to run all your tests. Well, actually that is exactly what it does – but the impact it has on your workflow is so much more than that. When you can type a few characters, look at the test output and see what happened you get instant feedback. This difference in degree changes the way you write code and makes it so much easier to do TDD.

Anyway I digress – NCrunch, because Microsoft, can’t use the result of the compile that Visual Studio does. So it does its own. It launches MSBuild in the background, continually re-compiling your code. This is not exactly kind on your CPU. It also introduces inconsistencies. Because NCrunch is running a slightly different MSBuild on each project to the build VisualStudio does you get subtly different results sometimes; which is different again from ReSharper with its own compiler that isn’t even using MSBuild. I now have three builds. Three compilers. It is honestly a miracle when they all agree that my code compiles.

An all-too-typical dev cycle becomes:

  • Write test
  • ReSharper is happy
  • NCrunch build is failing, force reload NCrunch project
  • NCrunch builds, test fails
  • Make test pass
  • Try to run app
  • VisualStudio build fails
  • Fix NuGet problems
  • NCrunch build is now failing
  • Force NCrunch to reload at least one project again
  • Force VisualStudio to rebuild the project
  • Then the solution
  • Run app to sanity check change
  • ReSharper now shows error
  • Re-ignore perennial ReSharper non-error
  • All three compilers agree, quick: commit!

Normally then the build fails in CI because I still screwed up the NuGet packages.

Then recently, as if this wasn’t already one of the outer circles of hell. The CI build was failing for a bizarre reason. We have a command line script which applies the same build steps that CI runs, so I thought I’d run that to replicate the problem. Unfortunately the command line build was failing on my machine for a spectacularly spurious reason that was different again than the failure in CI. Great. I now have five builds which don’t all agree on whether my code compiles.

Do you really hate computers? Do you wish you had more reasons to defenestrate every last one of them? Have you considered a career in software development?

Old Age Code

Is your code ready for retirement? Is it suffering from the diseases of old age? Do you have code you can’t even imagine retiring? It’s just too critical? Too pervasive? Too legacy?

Jon & The Widgets

Conveyor Belt - thanks to
Conveyor Belt – thanks to

Jon’s first job out of school was in the local widget factory, WidgetCo. Jon was young, enthusiastic and quickly took to the job of making widgets. The company was pleased with Jon and he grew in experience, learning more about making widgets, taking on ever more responsibility; until eventually Jon was responsible for all widget production.

After a couple of years one of WidgetCo’s biggest customers started asking about a new type of square widget. They had only made circular widgets before, but with Jon’s expertise they thought they could take on this new market. The design team got started, working closely with Jon to design a new type of square widget for Jon to produce. It was a great success, WidgetCo were the first to market with a square widget and sales went through the roof.

Unfortunately the new more complex widget production pipeline was putting a lot of pressure on the packing team. With different shape widgets and different options all needing to be sorted and packed properly mistakes were happening and orders being returned. Management needed a solution and turned to Jon. The team realised that if Jon knew a bit more about how orders were going to be packed he could organise the widget production line better to ensure the right number of each shape widget with the right options were being made at the right time. This made the job much easier for the packers and customer satisfaction jumped.

Sneeze - thanks to
Sneeze – thanks to

A few years down the line and Jon was now a key part of the company. He was involved in all stages of widget production from the design of new widgets and the tools to manufacture them through to the production and packaging. But one day Jon got sick. He came down with a virus and was out for a couple of days: the company stopped dead. Suddenly management realised how critical Jon was to their operations – without Jon they were literally stuck. Before Jon was even back up to speed management were already making plans for the future.

Shortly after, the sales team had a lead that needed a new hexagonal widget. Management knew this was a golden opportunity to try and remove some of their reliance on Jon. While Jon was involved in the initial design, the team commissioned a new production line and hired some new, inexperienced staff to run it. Unfortunately hexagonal widgets were vastly more complex than the square ones and, without Jon’s experience the new widget production line struggled. Quality was too variable, mistakes were being made and production was much too slow. The team were confident they could get better over time but management were unhappy. Meanwhile, Jon was still churning out his regular widgets, same as he always had.

But the packing team were in trouble again – with two, uncoordinated production lines at times they were inundated with widgets and at other times they were idle. Reluctantly, management agreed that the only solution was for Jon to take responsibility for coordinating both production lines. Their experiment to remove their reliance on Jon had failed.

The Deckhand - thanks to
The Deckhand – thanks to

A few years later still and the new production line had settled down; it never quite reached the fluidity of the original production line but sales were ok. But there was a new widget company offering a new type of octagonal widget. WidgetCo desperately needed to catch up. The design team worked with Jon but, with his workload coordinating two production lines, he was always busy – so the designers were always waiting on Jon for feedback. The truth was: Jon was getting old. His eyes weren’t what they used to be and the arthritis in his fingers made working the prototypes for the incredibly complex new widgets difficult. Delays piled up and management got ever more unhappy.

So what should management do?

Human vs Machine

When we cast a human in the central role in this story it sounds ridiculous. We can’t imagine a company being so reliant on one frail human being that it’s brought to its knees by an illness. But read the story as though Jon is a software system and suddenly it seems totally reasonable. Or if not reasonable, totally familiar. Throughout the software world we see vast edifices of legacy software, kept alive way past their best because nobody has the appetite to replace them. They become too ingrained, too critical: too big to fail.

14256058429_f7802658c8_zFor all it’s artificial construct, software is not so different from a living organism. A large code base will be of a level of complexity comparable with an organism – too complex for any single human being to comprehend in complete detail. There will be outcomes and responses that can’t be explained completely, that require detailed research to understand the pathway that leads from stimulus to response.

But yet we treat software like it is immortal. As though once written software will carry on working forever. But the reality is that within a few years software becomes less nimble, harder to change. With the growing weight of various changes of direction and focus software becomes slower and more bloated. Each generation piling on the pounds. Somehow no development team has mastered turning back time and turning a creaking, old age project into the glorious flush of youth where everything is possible and nothing takes any time at all.

It’s time to accept that software needs to be allowed to retire. Look around the code you maintain: what daren’t you retire? That’s where you should start. You should start planning to retire it soon, because if it’s bad now it is only getting worse. As the adage goes: the best time to start fixing this was five years ago; the second best time is now.

Sad dog - thanks to
Sad dog – thanks to

After five years all software is a bit creaky; not so quick to change as it once was. After ten years it’s well into legacy; standard approaches have moved on, tools improved, decade old software just feels dated to work with. After twenty years it really should be allowed to retire already.

Software ages badly, so you need to plan for it from the beginning. From day one start thinking about how you’re going to replace this system. A monolith will always be impossible to replace, so constantly think about breaking out separate components that could be retired independently. As soon as code has got bigger than you can throw away, it’s too late. Like a black hole a monolith sucks in functionality, as soon as you’re into run away growth your monolith will consume everything in it’s path. So keep things small and separate.

What about the legacy you have today? Just start. Somewhere. Anywhere. The code isn’t getting any younger.

Git stash driven development

I’ve found myself using a pattern quite often recently, which I’ve been calling “git stash driven development” – that is, relying heavily on the magic of git stash as part of my development workflow.

Normally I follow what I think of as a fairly typical TDD workflow:

  • Write next test, watch it fail
  • Write code to make it pass
  • Commit
  • Refactor
  • Commit
  • Push

This cycle can repeat very frequently – as often as every couple of minutes. Sometimes this cycle gets slowed down when the next test to write isn’t obvious or the refactoring needs more thought. But generally this is the process I try and follow.

Quite often having written the next test which takes me forwards on my feature I hit a problem: I can’t actually make the test pass (easily). First I need to refactor to make the problem easy. In that situation I can mark the test as ignored, commit and come back to it later. I refactor as required, commit, push; then finally unignore my test and get back to where I was before. This is a fairly neat process.

However there are a couple of times when this process doesn’t work: what if I’m part way through writing my test and I realise I can’t finish without refactoring the test infrastructure? I can’t ignore my test, it probably isn’t even compiling. I certainly don’t want to commit it in its current state. I could just bin my test and re-write it, if I’m following the 15 minute rule I’m not going to lose much work. But, with the magic of git stash, I can stash my changes and come back once I’ve refactored the test code.

The more annoying time this happens is when I’m part way through a refactor step. This happens more commonly when I’m really going through a design-change – this isn’t really refactoring as it often happens outside of the normal TDD loop. I’m trying to evolve the design to somewhere different; sometimes this is driven by tests, sometimes its a non-feature changing refactor. But often there are non-trivial changes happening across numerous source files. At this point it is very easy to get part way through a refactor and realise that something else needed to have happened first. I could bin my change, I only stand to lose 15 minutes work – but why throw it away when I have git stash?

So I git stash my changes, go and make the change I needed to have happened first. Then, all too commonly, I get part way through this second change and realise something else needs to happen first. Well, git stash again! This stack of git stashes can get quite deep, if you’re not careful. But once I’ve bottomed the stack out, once I’ve managed to commit a refactor that frees up the step above I can git stash pop, complete the next refactor, commit, git stash pop; and so on up the stack until I’m done.

Now, arguably, I’m discovering the refactor in reverse order, but this seems to me often how I find it. I could have spent more time analysing the change in detail, of course. Spent time planning out my change on paper before embarking on it in the correct order. However, this is always time consuming and there’s still the risk that I miss something and come at a change “backwards”. I find that using git stash in this way lets me discover the refactor that I need to make one step at a time. Each commit is kept small, I try and stick to the 15 minute rule so that no single commit loses more than 15 minutes. Ultimately the design change is completed in a sequence of small commits, each of which builds logically on the one before. They’ve been discovered by exploration, the commits were just discovered in reverse order.

The danger is always that I find a refactor step I can’t complete the way I’d imagined – now I can’t unwind the stack and potentially all the previous git stashes aren’t committable. Whenever this happens I normally find going one or two levels up the stack will present a different approach, from where I can continue as before.

VW’s rogue software developers

So Michael Horn has thrown a couple of software developers under the proverbial bus by blaming them for the defeat device at the centre of the emissions scandal. Now, it is clearly ridiculous to suggest that a couple of rogue individuals single-handedly saved VW’s clean diesel engine program and nobody else had any idea what was going on. However, I think it is fair to say that a couple of software developers did know what was going on and did nothing.

Unless VW is unlike every other organisation I’ve ever know it is inconceivable that nobody outside the dev team would have known what was going on. It’s a pretty rare organisation that leaves software developers to just bash away at the keyboard and dream up some cool stuff. Almost everywhere programmers are managed, project managed and product managed to make sure they keep churning out the good stuff. Developers aren’t given free reign to just make up emissions test defeating software for fun. What was this, VW’s equivalent of 10% time?

Let’s cut the developers some slack then – they were probably just doing what they had been told to. I’ve worked in large organisations that sailed pretty close to regulatory lines and I can well imagine that this was just one change in amongst hundreds that were in what might generously be called a “grey area”. However, did they know that the software was going to be used to cheat emissions tests? Did they know this would leave their product in breach of the law in some countries? Or were they just ignorant fools?

Maybe they didn’t know what they were doing. Maybe the exact details of the goals of the software were kept secret from them – this is entirely possible. If we assume some people in management were aware of what was being done, and the legal implications of what they were doing: every effort would be made not to commit any details to paper and to limit the number of people who have the full picture. Where possible, the developers would be given a very specific set of requirements which would lead them to implement the right thing, without them necessarily understanding the eventual impact. With an unquestioning workforce an amazing amount can be achieved while only a handful of people understand the full story.

However, this is not to excuse the developers: we are not mindless automatons, we are intelligent creatures. We are capable of questioning why. In fact, as a professional developer, I think it is my duty to ask why. If I don’t understand how a requirement fits into the environment, how can I possibly be sure I’m building it right? I think it is up to each of us to ensure we know how our software will be used. This is not to make us the world’s social conscience – but to make us better developers.

Now if they did know what the software was to be used for: they are complicit in this law-breaking. They understood what they were doing, understood it would be against the law. And yet they did it anyway. It is not sufficient to argue that they were just “following orders”. Many people throughout history were “just following orders” and through their hands great evils were perpetrated. Now a breach of the clean air act is no holocaust, but the individuals involved must bear some of the responsibility for what they have done.

But we take no responsibility in this industry. We happily churn out rubbish code that is full of bugs “because management told us to”. We will happily churn out law-breaking software “because management told us to”. When will we start taking some responsibility for our actions? When will we show some professional standards? This doesn’t mean that we should be held accountable for every single defect in every line of code. But if I’ve not followed best practices and my code has an issue which costs my customer money, am I liable? If I’d done a better job would the code have had the same issue? Maybe, maybe not. Who takes responsibility for standards in software? Is it the customer’s responsibility? Or is it about time we took responsibility? About time we showed some pride in our work. About time we showed some professionalism.