What is software?

What actually is software? It’s obviously not a physical thing you can point at. If I imagine a specific piece of software, where does the software stop and not-software begin?

I recently read Sapiens, a fantastic book on the history of humankind. One of the things the author, Yuval Noah Harari, talks about is the “legend of Peugeot”. When we think of Peugeot the company, what do we mean? It’s not the cars they produce – the company would exist and would keep on making cars even if all the Peugeot cars in existence were scrapped overnight. It’s not the factories and offices and assembly lines, which could be rebuilt if they all suddenly burned down. It’s not the employees either – if all the employees resigned en masse, the company would hire more staff and would carry on making cars. Peugeot is a fiction – a legal fiction we all choose to believe in.

Back to software: what is software? Maybe it’s the compiled binary artefact? An executable or DLL or JAR file. But is that really what software is? Software is a living, growing, changing thing – a single binary is merely a snapshot at a given point in time.

Perhaps then software includes the source code. Without the source code, what we have is dead software – a single binary that can never (easily) be changed. Sure, we could in theory reverse engineer something resembling source code from the binary, but for any reasonably sized piece of software, would making anything beyond a trivial change be feasible, without the original source?

Even with the source code, could I just pick up, say, the source to Chrome or Excel and start hacking away? It seems unlikely – I’d need to spend time familiarising myself with the code and reading documentation. So maybe documentation is part of what makes software.

Even better than reading documentation I’d talk to other developers who are already familiar with the code – they would be able to explain it to me and answer my questions. Perhaps even more importantly, developers would be able to explain why certain things are the way they are – this tells the story of the software, the history of how it got to be the way it is. The decisions taken along the way, the mistakes made and the paths not taken.

So the knowledge developers have of how the software works and how it got there is part of what makes up software. What other knowledge makes up software? How about the process for releasing a new version? Without that knowledge, modified source code is useless. To be real, live software, new versions need to get into the hands of users.

When it comes to interacting with the real world, how much of that context is part of what defines the software? Look at Uber, for example – without physical cars, what use is the software? In some sense, the physical cars and their drivers are part of a software stack.

But is software even a single thing, at a single scale? Is the web front end a single piece of software? Without its backend service it is rendered useless. Does that make it a single piece of software or two?

How about as software evolves? Does it become something different, distinct from the software that went before? The original version is part of the history of the software, not something separate from it. Only if the old version was forked can we end up with a new piece of software – at that point their histories diverge. Over time they will adapt to subtly different contexts, have different dependencies, different decisions and goals: they will become two different pieces of software.

What about if software is re-written? Imagine the team responsible for the backend service decide the only solution to their technical debt problem is a re-write. So they begin re-writing it from scratch. Eventually, they switch over to the new version – the old one is archived, only kept in version control for the curious. Is this a new piece of software? Or logically just a new version? The software stack still performs the same overall purpose, there’s still only one backend service. The original, debt-laden version has become part of the history of the software: we don’t have two systems – we have one. This suggests that the actual source code is not what makes software.

If software isn’t the source code, what is it? It can’t be the team that owns it: team members may come and go, but the software lives on. It isn’t the documentation either – the documentation could be re-written and the software would live on. Software is defined by its context, but it is more than its context and processes.

Software is all these things and none of them. Just like Peugeot, software is a fiction we all believe in. We all pretend we know what we mean when we talk about “software”, but what actually is it?

If software is anything it is a story. It is the history of how the code got to where it is today: the decisions that were taken, the context it sits within, the components it interacts with. Documentation is an attempt to preserve this history; processes an attempt to codify lessons learned.

If software is a story, the team are the medium through which the story is kept alive. If you’ve ever seen what happens when a new team takes over legacy software, you’ll know what happens when the story dies: zombie software, not quite dead but not quite alive; still evolving and changing, but full of risk that any change could bring about disaster.

What is software? Software is a story: a story of how this got to be the way it is, whatever “this” might be.

Friction in Software

Friction can be a very powerful force when building software. The things that are made easier or harder can dramatically influence how we work. I’d like to discuss three areas where I’ve seen friction at work: dependency injection, code reviews and technology selection.

DI Frameworks

A few years ago a colleague and I discussed this and came to the conclusion that the reason most DI frameworks suck (I’m looking in particular at you, Spring) is that they make adding new dependencies so damned easy! There’s absolutely no friction. Maybe a little XML (shudder) or just a tiny little attribute. It’s so easy!
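
To illustrate just how low the barrier is, here’s a hypothetical Spring-style component (the class and collaborator names are invented for the example): pulling in yet another dependency is a single annotated field.

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Component;

    @Component
    public class InvoiceGenerator {
        // Need another collaborator? One annotation, one field – no friction at all.
        @Autowired
        private EmailSender emailSender;

        // ...and another, just as cheaply
        @Autowired
        private AuditLog auditLog;
    }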

So when we started a new, greenfield project, we decided to put our theory to the test and introduced just a little bit of friction to dependency injection. I’ve written before about the basic scheme we adopted and the AOP endpoint it reached. But the end result was, I believe, very successful. After a couple of years of development we still had only on the order of 10-20 dependencies. The friction we’d introduced was light (add a couple of lines to a single class), but it was sufficient to act as a constant reminder not to add a new dependency just because it was easy.
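
That earlier write-up has the details of the actual scheme; what follows is only a rough sketch of the general idea (class names again invented for illustration). The wiring lives in a single hand-written class, so every new dependency costs a couple of visible lines there:

    // A hand-rolled composition root: the one place dependencies get wired together.
    // Adding a dependency means editing this class by hand – a small, deliberate
    // cost that makes you pause and ask whether the dependency is really needed.
    public class ApplicationAssembler {

        public InvoiceGenerator invoiceGenerator() {
            // constructor injection rather than @Autowired fields
            return new InvoiceGenerator(new SmtpEmailSender(), new FileAuditLog());
        }
    }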

Code Reviews

I was reminded of this recently when discussing code reviews. I have mixed feelings about code reviews: I’ve seen them work well, and it is better to have code reviews than not to have them; but it’s better still to pair program. But not all teams, not all developers, like pair programming – so code reviews exist. The trouble with code reviews is they can provide a form of friction.

If you & I are pairing on a piece of work, we will discuss the various trade-offs as we go: do we spend time on this, do we refactor that, etc etc. The constant judgements about what warrants attention and what can be left for another day are verbalised and agreed. In general I find the code written while pairing is high in quality but also remains tightly focused on the task. The long rambling refactors I’ve been guilty of in the past disappear, and the lazy “quick hacks” we all try to explain away to ourselves aren’t so easy to gloss over when pairing.

But code reviews exist outside of this dynamic. In the cold light of the following day, someone uninvolved reviews your work and passes judgement on whether they think it’s up to scratch. It’s easy to see why this becomes combative: rather than being collaborative it can be seen as a judgement being passed, on not only the code but the author, too.

When reviewing code it is easy to set a very high bar, higher than you might set for yourself and higher than you might have agreed when pairing. Now, does this mean the comments aren’t valid? Absolutely not! You’re right, there is a test case missing here; although my change is unrelated, I should have added it. And you’re right, this code is a mess; it was a mess before I came along and made a simple edit, but yes, I should have tidied it up. Everyone should practice code gardening.

These are all perfectly valid comments. But they create a form of friction. When I worked on a team that relied on these code reviews, you knew you were going to get comments: so you kept the commit small, to minimize the diff. A small diff minimizes the number of extra tests you could be asked to write. A small diff keeps most of the existing mess out of the review, so you won’t be asked to start refactoring.

Now, this seems dysfunctional: we’re deliberately optimizing for a smooth passage through the review process, instead of optimizing for code quality. Worse than this, though, was what never happened: refactoring commits. Looking back I realise that the only code reviews I saw (as both reviewer and reviewee) were for feature changes. There were never any code reviews submitted purely for technical debt reduction. Sure, there’d be some individual refactoring commits in amongst the feature changes. But never any dedicated, multi-commit sessions whose sole aim was to improve the code base. Which was a shame, because like any legacy code base, there was scope for improvement.

Compare this to teams that don’t do code reviews, where I’ve tended to see more effort spent on reducing technical debt. Without fearing an endless cycle of review comments, developers are free to embark on refactoring efforts (which may or may not even work out!) – at least they can try. Code reviews, then, provide a form of friction that might actually hurt code quality in the long run.

Technology Selection

I was talking to another colleague recently who is convinced that Hibernate is still the best way to get data in and out of a relational database. I can’t really work out how to persuade people they’re wrong – surely using Hibernate is enough to persuade you? Especially in a large, legacy code base – the pain that Hibernate causes is obvious. Yet plenty of people still believe in Hibernate. There are even people that still believe in Spring. Whether or not they still believe in the tooth fairy is unclear.

But I think technology selection is another area where friction is important. When contemplating moving away from something well-known and well used in industry like Spring or Hibernate there is a lot of friction. There are new technologies to learn, new approaches to understand and new risks to manage. This all adds friction, so sometimes it’s easiest just to stick with what we know. Sometimes it really is the right choice – the technology you have expertise in is the one you’ll be most productive in immediately. But there are longer term questions too, which are much harder to answer: will the team eventually be more productive using technology X than technology Y?

Friction in software is a powerful force: we’re very lazy creatures, constantly trying to optimise. Anything that slows us down or gets in our way quickly gets side-stepped or worked around. We can use this knowledge as a tool to guide developer behaviour; but we also need to be aware of how friction can change behaviours for the worse.

How many builds?

I’m always amazed at the seemingly high pain threshold .net developers have when it comes to tooling. I’ve written before about the poor state of tooling in .net, but just recently I hit another example that infuriates me: I have too many builds, and they don’t agree on whether my code compiles.

One of the first things that struck me when starting to develop on .net was that compiling code was still a thing. An actual step that had to be thought about. Incremental compilers in Eclipse and the like have been around for ages – to the point where, generally, Java developers don’t actually have to instruct their IDE to compile code for them. But in Visual Studio? Oh it’s definitely necessary. And time consuming. Oh my god is it slow. Ok, maybe not as slow as Scala. But still unbelievably slow when you’re used to working at the speed of thought.

Another consequence of the closed, Microsoft way of doing things is that tools can’t share the compiler’s work. So ReSharper have, effectively, implemented their own compiler. It incrementally parses source code and finds compiler errors. Sometimes it even agrees with the Visual Studio build. But all too often it doesn’t. From the spurious not-actually-an-error that I have to continually instruct ReSharper to ignore; to the warnings-as-errors build failures that ReSharper doesn’t warn me about; to the random why-does-ReSharper-not-know-about-that-NuGet-package errors.

This can be infuriating when refactoring. For example, if an automated refactor leaves a variable unused, I will now have a compiler warning; since all my projects run with warnings-as-errors switched on, this will fail the build. But ReSharper doesn’t know that. So I apply the refactoring, code & tests are green: commit. Push. Boom! CI is red. But it was an automated refactor for chrissakes, how’ve I broken the build?!

I also use NCrunch, an automated test runner for Visual Studio (like Infinitest in the Java world). NCrunch is awesome, by the way; better even than the continuous test runner in ReSharper 10. If you’ve never used a continuous test runner and think you’re doing TDD, sort your life out and set up Infinitest or NCrunch. It doesn’t just automate pressing the shortcut key to run all your tests. Well, actually that is exactly what it does – but the impact it has on your workflow is so much more than that. When you can type a few characters, look at the test output and see what happened, you get instant feedback. This difference in degree changes the way you write code and makes it so much easier to do TDD.

Anyway, I digress – NCrunch, because Microsoft, can’t use the result of the compile that Visual Studio does. So it does its own: it launches MSBuild in the background, continually re-compiling your code. This is not exactly kind on your CPU. It also introduces inconsistencies. Because NCrunch runs a slightly different MSBuild on each project from the build Visual Studio does, you sometimes get subtly different results; which are different again from ReSharper, with its own compiler that isn’t even using MSBuild. I now have three builds. Three compilers. It is honestly a miracle when they all agree that my code compiles.

An all-too-typical dev cycle becomes:

  • Write test
  • ReSharper is happy
  • NCrunch build is failing, force reload NCrunch project
  • NCrunch builds, test fails
  • Make test pass
  • Try to run app
  • Visual Studio build fails
  • Fix NuGet problems
  • NCrunch build is now failing
  • Force NCrunch to reload at least one project again
  • Force Visual Studio to rebuild the project
  • Then the solution
  • Run app to sanity check change
  • ReSharper now shows error
  • Re-ignore perennial ReSharper non-error
  • All three compilers agree, quick: commit!

Normally then the build fails in CI because I still screwed up the NuGet packages.

Then recently, as if this wasn’t already one of the outer circles of hell, the CI build started failing for a bizarre reason. We have a command line script which applies the same build steps that CI runs, so I thought I’d run that to replicate the problem. Unfortunately the command line build was failing on my machine for a spectacularly spurious reason, different again from the failure in CI. Great. I now have five builds, which don’t all agree on whether my code compiles.

Do you really hate computers? Do you wish you had more reasons to defenestrate every last one of them? Have you considered a career in software development?

Old Age Code

Is your code ready for retirement? Is it suffering from the diseases of old age? Do you have code you can’t even imagine retiring? It’s just too critical? Too pervasive? Too legacy?

Jon & The Widgets

Conveyor Belt – thanks to https://www.flickr.com/photos/qchristopher

Jon’s first job out of school was in the local widget factory, WidgetCo. Jon was young, enthusiastic and quickly took to the job of making widgets. The company was pleased with Jon and he grew in experience, learning more about making widgets, taking on ever more responsibility; until eventually Jon was responsible for all widget production.

After a couple of years one of WidgetCo’s biggest customers started asking about a new type of square widget. They had only made circular widgets before, but with Jon’s expertise they thought they could take on this new market. The design team got started, working closely with Jon to design a new type of square widget for Jon to produce. It was a great success, WidgetCo were the first to market with a square widget and sales went through the roof.

Unfortunately the new, more complex widget production pipeline was putting a lot of pressure on the packing team. With different shaped widgets and different options all needing to be sorted and packed properly, mistakes were happening and orders were being returned. Management needed a solution and turned to Jon. The team realised that if Jon knew a bit more about how orders were going to be packed he could organise the widget production line better, to ensure the right number of each shape of widget with the right options were being made at the right time. This made the job much easier for the packers and customer satisfaction jumped.

Sneeze – thanks to https://www.flickr.com/photos/foshydog

A few years down the line and Jon was now a key part of the company. He was involved in all stages of widget production from the design of new widgets and the tools to manufacture them through to the production and packaging. But one day Jon got sick. He came down with a virus and was out for a couple of days: the company stopped dead. Suddenly management realised how critical Jon was to their operations – without Jon they were literally stuck. Before Jon was even back up to speed management were already making plans for the future.

Shortly after, the sales team had a lead that needed a new hexagonal widget. Management knew this was a golden opportunity to try to remove some of their reliance on Jon. While Jon was involved in the initial design, the team commissioned a new production line and hired some new, inexperienced staff to run it. Unfortunately hexagonal widgets were vastly more complex than the square ones and, without Jon’s experience, the new widget production line struggled. Quality was too variable, mistakes were being made and production was much too slow. The team were confident they could get better over time, but management were unhappy. Meanwhile, Jon was still churning out his regular widgets, same as he always had.

But the packing team were in trouble again – with two uncoordinated production lines, at times they were inundated with widgets and at other times they were idle. Reluctantly, management agreed that the only solution was for Jon to take responsibility for coordinating both production lines. Their experiment to remove their reliance on Jon had failed.

The Deckhand – thanks to https://www.flickr.com/photos/neilmoralee

A few years later still and the new production line had settled down; it never quite reached the fluidity of the original production line but sales were ok. But there was a new widget company offering a new type of octagonal widget. WidgetCo desperately needed to catch up. The design team worked with Jon but, with his workload coordinating two production lines, he was always busy – so the designers were always waiting on Jon for feedback. The truth was: Jon was getting old. His eyes weren’t what they used to be and the arthritis in his fingers made working the prototypes for the incredibly complex new widgets difficult. Delays piled up and management got ever more unhappy.

So what should management do?

Human vs Machine

When we cast a human in the central role in this story it sounds ridiculous. We can’t imagine a company being so reliant on one frail human being that it’s brought to its knees by an illness. But read the story as though Jon is a software system and suddenly it seems totally reasonable. Or if not reasonable, totally familiar. Throughout the software world we see vast edifices of legacy software, kept alive way past their best because nobody has the appetite to replace them. They become too ingrained, too critical: too big to fail.

For all that it is an artificial construct, software is not so different from a living organism. A large code base will be of a level of complexity comparable with an organism – too complex for any single human being to comprehend in complete detail. There will be outcomes and responses that can’t be explained completely, that require detailed research to understand the pathway that leads from stimulus to response.

Yet we treat software as though it were immortal, as though once written it will carry on working forever. The reality is that within a few years software becomes less nimble, harder to change. With the growing weight of various changes of direction and focus, software becomes slower and more bloated, each generation piling on the pounds. Somehow no development team has mastered turning back time and transforming a creaking, old-age project into the glorious flush of youth where everything is possible and nothing takes any time at all.

It’s time to accept that software needs to be allowed to retire. Look around the code you maintain: what daren’t you retire? That’s where you should start. Start planning to retire it soon, because if it’s bad now it is only getting worse. As the adage goes: the best time to start fixing this was five years ago; the second best time is now.

Sad dog – thanks to https://www.flickr.com/photos/ewwhite

After five years all software is a bit creaky, not so quick to change as it once was. After ten years it’s well into legacy: standard approaches have moved on, tools have improved, and decade-old software just feels dated to work with. After twenty years it really should be allowed to retire already.

Software ages badly, so you need to plan for it from the beginning. From day one, start thinking about how you’re going to replace this system. A monolith will always be impossible to replace, so constantly think about breaking out separate components that could be retired independently. As soon as the code has grown bigger than you can afford to throw away, it’s too late. Like a black hole, a monolith sucks in functionality; as soon as you’re into runaway growth, your monolith will consume everything in its path. So keep things small and separate.

What about the legacy you have today? Just start. Somewhere. Anywhere. The code isn’t getting any younger.

Git stash driven development

I’ve found myself using a pattern quite often recently, which I’ve been calling “git stash driven development” – that is, relying heavily on the magic of git stash as part of my development workflow.

Normally I follow what I think of as a fairly typical TDD workflow:

  • Write next test, watch it fail
  • Write code to make it pass
  • Commit
  • Refactor
  • Commit
  • Push

This cycle can repeat very frequently – as often as every couple of minutes. Sometimes this cycle gets slowed down when the next test to write isn’t obvious or the refactoring needs more thought. But generally this is the process I try and follow.

Quite often, having written the next test that takes me forwards on my feature, I hit a problem: I can’t actually make the test pass (easily). First I need to refactor to make the problem easy. In that situation I can mark the test as ignored, commit and come back to it later. I refactor as required, commit, push; then finally unignore my test and get back to where I was before. This is a fairly neat process.
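
Marking a test as ignored is a one-line change in most test frameworks; a minimal JUnit sketch (the test and class names are invented for illustration) might look like this:

    import static org.junit.Assert.fail;

    import org.junit.Ignore;
    import org.junit.Test;

    public class LoyaltyDiscountTest {

        // Parked until the supporting refactor is done: the build stays green,
        // and the ignored test sits there as a reminder to come back to it.
        @Ignore("needs the pricing rules extracting from OrderService first")
        @Test
        public void appliesLoyaltyDiscountToRepeatCustomers() {
            fail("un-ignore and finish this once the refactor is in");
        }
    }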

However, there are a couple of situations where this process doesn’t work. What if I’m part way through writing my test and I realise I can’t finish without refactoring the test infrastructure? I can’t ignore my test – it probably isn’t even compiling. I certainly don’t want to commit it in its current state. I could just bin my test and re-write it; if I’m following the 15 minute rule I’m not going to lose much work. But, with the magic of git stash, I can stash my changes and come back once I’ve refactored the test code.

The more annoying case is when I’m part way through a refactor step. This happens more commonly when I’m really working through a design change – this isn’t really refactoring, as it often happens outside of the normal TDD loop. I’m trying to evolve the design to somewhere different; sometimes this is driven by tests, sometimes it’s a non-feature-changing refactor. But often there are non-trivial changes happening across numerous source files. At this point it is very easy to get part way through a refactor and realise that something else needed to have happened first. I could bin my change – I only stand to lose 15 minutes’ work – but why throw it away when I have git stash?

So I git stash my changes and go and make the change I needed to have happened first. Then, all too commonly, I get part way through this second change and realise something else needs to happen first. Well, git stash again! This stack of git stashes can get quite deep if you’re not careful. But once I’ve bottomed the stack out – once I’ve managed to commit a refactor that frees up the step above – I can git stash pop, complete the next refactor, commit, git stash pop; and so on up the stack until I’m done.
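
As a concrete sketch of how that unwinding plays out (the commit messages are invented; the commands are just the plain git stash and git stash pop described above), a stack two stashes deep might look like this:

    # part way through refactor A, realise refactor B needs to happen first
    git stash

    # part way through refactor B, realise refactor C needs to happen first
    git stash

    # refactor C turns out to be small enough to just finish and commit
    git commit -am "Refactor C"

    # unwind the stack: pop brings back the most recent stash first (B, then A)
    git stash pop
    git commit -am "Refactor B"
    git stash pop
    git commit -am "Refactor A"
    git push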

Now, arguably, I’m discovering the refactor in reverse order, but that often seems to be how I find it. I could have spent more time analysing the change in detail, of course – planned the change out on paper before embarking on it in the correct order. However, that is always time-consuming and there’s still the risk that I miss something and come at a change “backwards”. I find that using git stash in this way lets me discover the refactor that I need to make one step at a time. Each commit is kept small; I try to stick to the 15 minute rule so that no single commit loses more than 15 minutes. Ultimately the design change is completed in a sequence of small commits, each of which builds logically on the one before. They’ve been discovered by exploration; the commits were just discovered in reverse order.

The danger is always that I find a refactor step I can’t complete the way I’d imagined – now I can’t unwind the stack and potentially all the previous git stashes aren’t committable. Whenever this happens I normally find going one or two levels up the stack will present a different approach, from where I can continue as before.