Longevity of Source Code

Take a look at the code you work in day-to-day. How long has it been there? How old is it? Six months old? A year? Maybe five years old? Ten? Twenty?! How much of the code is old? Less than 10%? Half? Or as much as 90%? Curious to know the answers to these questions I’ve been investigating how long code sticks around.

Software archeology

Work for any company that’s been around for more than a couple of years and there will be source code that’s been there for a while. Writing software in an environment like this is often an exercise in software archeology – digging down into the application is like digging down into an old city, slowly uncovering the past layer by layer.

Once you get past the shiny new containerized micro-services, you start delving into the recent past: perhaps the remnants of the company’s first foray into a Service Oriented Architecture; now a collection of monolithic services with a complex tangle of business logic, all tied together with so much Spring. Dig deeper still and we get back to the EJB era; some long-forgotten beans still clinging on to life, for lack of any developer’s appetite to understand them again. Down here is where the skeletons are.

If it ain’t broke, don’t fix it

What’s so bad about old code? It’s fulfilling some important purpose, no doubt. At least, some of it probably is.

If you look at code you wrote a year ago and can’t see anything to change, you haven’t learnt a thing in the last year

We’re always learning more: a better understanding of the domain, a better understanding of how our solution models the domain, new architectural styles, new tools, new approaches, new standards and new ideas. It is inevitable the code you wrote a year ago could be improved somehow. But how much of it have you gone back and improved recently?

The trouble with old code is that it gets increasingly hard to change. What would happen if a change in business requirements led you all the way down to the Roman-era sewers that are the EJBs? Would you implement the change the way it would have been done a decade ago? Or would you spend time trying to extract the parts that need to change? Perhaps building another shiny, new containerized micro-service along the way? That change isn’t going to be cheap though.

And this is the problem: paying back this “technical debt” is the right thing to do, but it will be slower to change this ancient code than the stuff you wrote last week or last month. The more ancient code you have the slower it will be to make changes, the slower you can develop new features. The worst part of maintaining a long running code base isn’t just paying back the debt from the things we know we did wrong; it’s the debt from things that were done right (at the time), but only now seem wrong.

How old is our code?

I’ve been looking through a variety of source code: some commercial, some open source. Across a variety of languages (Java, C#, Ruby). Generally it seems that most code bases follow a pretty similar pattern:

About 70% of the lines of code you wrote today will still be in head, unchanged, in 12 months time

Perhaps unsurprisingly, code changes most often in the first couple of months after being written. After that, it seems to enter a maintenance mode, where code changes relatively rarely.

image

I found this pretty surprising: after a year around 75% of the code I’ve written is still there. Imagine how much better I understand the problem today. Imagine the since-forgotten design ideas, the changed architectural vision, the new tools and libraries it could be refactored to use today. Imagine how much better every single one of those lines could be. And yet, even in a code base where we’re constantly working to pay back technical debt, we’re barely making a dent in how old the code is.

The how

How did I do this analysis? Thanks to the magic power of git, it is surprisingly easy. I can use git to do a recursive git blame on the entire repo. This lists, for each line currently in head, the commit that introduced it, who made it and when. With a bit of shell-fu we can extract a count by day or month:

git ls-tree -r -rz --name-only HEAD -- | xargs -0 -n1 git blame -f HEAD | sed -e 's/^.* \([0-9]\{4\}-[0-9]\{2\}\)-[0-9]\{2\} .*$/\1/g' | sort | uniq -c

This outputs a nice table of lines of code last touched by month, as of today. But, I can just as easily go back in history, e.g. to go back to the start of 2015:

git checkout `git rev-list -n 1 --before="2015-01-01 00:00:00" master`

I can then re-run the recursive git blame. By comparing the count of lines last checked in each month I can see how much of the code written before 2015 still exists today. With more detailed analysis I can see how the number of lines last touched in a given month changes over time to see how quickly (or not!) code decays.

Conclusion

Code is written to serve some purpose, to deliver some business value. But it quickly becomes a liability. As this code ages and rots it becomes harder and harder to change. From the analysis above it is clear to understand why a code base that has been around for a decade is mostly all archaic: with so little of the old code being changed each year it just continues to hang around, all the while we’re piling up new legacy on top – most of which will still be here next year.

What are we to do about it? It seems to be a natural law of software that over time it only accumulates. From my own experience it seems that even concerted efforts on refactoring make little impact. What are we to do? Do we just accept that changes become progressively harder as source code ages? Or do we need to find a way to encourage software to have a shorter half-life, to be re-written sooner.

Old Age Code

Is your code ready for retirement? Is it suffering from the diseases of old age? Do you have code you can’t even imagine retiring? It’s just too critical? Too pervasive? Too legacy?

Jon & The Widgets

Conveyor Belt - thanks to https://www.flickr.com/photos/qchristopher
Conveyor Belt – thanks to https://www.flickr.com/photos/qchristopher

Jon’s first job out of school was in the local widget factory, WidgetCo. Jon was young, enthusiastic and quickly took to the job of making widgets. The company was pleased with Jon and he grew in experience, learning more about making widgets, taking on ever more responsibility; until eventually Jon was responsible for all widget production.

After a couple of years one of WidgetCo’s biggest customers started asking about a new type of square widget. They had only made circular widgets before, but with Jon’s expertise they thought they could take on this new market. The design team got started, working closely with Jon to design a new type of square widget for Jon to produce. It was a great success, WidgetCo were the first to market with a square widget and sales went through the roof.

Unfortunately the new more complex widget production pipeline was putting a lot of pressure on the packing team. With different shape widgets and different options all needing to be sorted and packed properly mistakes were happening and orders being returned. Management needed a solution and turned to Jon. The team realised that if Jon knew a bit more about how orders were going to be packed he could organise the widget production line better to ensure the right number of each shape widget with the right options were being made at the right time. This made the job much easier for the packers and customer satisfaction jumped.

Sneeze - thanks to https://www.flickr.com/photos/foshydog
Sneeze – thanks to https://www.flickr.com/photos/foshydog

A few years down the line and Jon was now a key part of the company. He was involved in all stages of widget production from the design of new widgets and the tools to manufacture them through to the production and packaging. But one day Jon got sick. He came down with a virus and was out for a couple of days: the company stopped dead. Suddenly management realised how critical Jon was to their operations – without Jon they were literally stuck. Before Jon was even back up to speed management were already making plans for the future.

Shortly after, the sales team had a lead that needed a new hexagonal widget. Management knew this was a golden opportunity to try and remove some of their reliance on Jon. While Jon was involved in the initial design, the team commissioned a new production line and hired some new, inexperienced staff to run it. Unfortunately hexagonal widgets were vastly more complex than the square ones and, without Jon’s experience the new widget production line struggled. Quality was too variable, mistakes were being made and production was much too slow. The team were confident they could get better over time but management were unhappy. Meanwhile, Jon was still churning out his regular widgets, same as he always had.

But the packing team were in trouble again – with two, uncoordinated production lines at times they were inundated with widgets and at other times they were idle. Reluctantly, management agreed that the only solution was for Jon to take responsibility for coordinating both production lines. Their experiment to remove their reliance on Jon had failed.

The Deckhand - thanks to https://www.flickr.com/photos/neilmoralee
The Deckhand – thanks to https://www.flickr.com/photos/neilmoralee

A few years later still and the new production line had settled down; it never quite reached the fluidity of the original production line but sales were ok. But there was a new widget company offering a new type of octagonal widget. WidgetCo desperately needed to catch up. The design team worked with Jon but, with his workload coordinating two production lines, he was always busy – so the designers were always waiting on Jon for feedback. The truth was: Jon was getting old. His eyes weren’t what they used to be and the arthritis in his fingers made working the prototypes for the incredibly complex new widgets difficult. Delays piled up and management got ever more unhappy.

So what should management do?

Human vs Machine

When we cast a human in the central role in this story it sounds ridiculous. We can’t imagine a company being so reliant on one frail human being that it’s brought to its knees by an illness. But read the story as though Jon is a software system and suddenly it seems totally reasonable. Or if not reasonable, totally familiar. Throughout the software world we see vast edifices of legacy software, kept alive way past their best because nobody has the appetite to replace them. They become too ingrained, too critical: too big to fail.

14256058429_f7802658c8_zFor all it’s artificial construct, software is not so different from a living organism. A large code base will be of a level of complexity comparable with an organism – too complex for any single human being to comprehend in complete detail. There will be outcomes and responses that can’t be explained completely, that require detailed research to understand the pathway that leads from stimulus to response.

But yet we treat software like it is immortal. As though once written software will carry on working forever. But the reality is that within a few years software becomes less nimble, harder to change. With the growing weight of various changes of direction and focus software becomes slower and more bloated. Each generation piling on the pounds. Somehow no development team has mastered turning back time and turning a creaking, old age project into the glorious flush of youth where everything is possible and nothing takes any time at all.

It’s time to accept that software needs to be allowed to retire. Look around the code you maintain: what daren’t you retire? That’s where you should start. You should start planning to retire it soon, because if it’s bad now it is only getting worse. As the adage goes: the best time to start fixing this was five years ago; the second best time is now.

Sad dog - thanks to https://www.flickr.com/photos/ewwhite
Sad dog – thanks to https://www.flickr.com/photos/ewwhite

After five years all software is a bit creaky; not so quick to change as it once was. After ten years it’s well into legacy; standard approaches have moved on, tools improved, decade old software just feels dated to work with. After twenty years it really should be allowed to retire already.

Software ages badly, so you need to plan for it from the beginning. From day one start thinking about how you’re going to replace this system. A monolith will always be impossible to replace, so constantly think about breaking out separate components that could be retired independently. As soon as code has got bigger than you can throw away, it’s too late. Like a black hole a monolith sucks in functionality, as soon as you’re into run away growth your monolith will consume everything in it’s path. So keep things small and separate.

What about the legacy you have today? Just start. Somewhere. Anywhere. The code isn’t getting any younger.

Tooling in the .net World

As a refugee Java developer first washing up on the shores of .net land, I was pretty surprised at the poor state of the tools and libraries that seemed to be in common use. I couldn’t believe this shanty town was the state of the art; but with a spring in my step I marched inland. A few years down the road, I can honestly say it really isn’t that great – there are some high points, but there are a lot of low points. I hadn’t realised at the time but as a Java developer I’d been incredibly spoiled by a vivacious open source community producing loads of great software. In the .net world there seem to be two ways of solving any problem:

  1. The Microsoft way
  2. The other way*

*Note: may not work, probably costs money.

IDE

The obvious place to start is the IDE. I’m sorry, but Visual Studio just isn’t good enough. Compared to Eclipse and IntelliJ for day-to-day development Visual Studio is a poor imitation. Worse, until recently, it was expensive as a lone developer. While there were community editions of Visual Studio, these were even more pared down than the unusable professional edition. Frankly, if you can’t run ReSharper, it doesn’t count as an IDE.

I hear there are actually people out there who use Visual Studio without ReSharper. What are these people? Sadists? What do they do? I hope it isn’t writing software. Perhaps this explains the poor state of so much software; with tools this bad – is it any wonder?

But finally Microsoft have seen sense and recently the free version of Visual Studio supports plugins – which means you can use ReSharper. You still have to pay for it, but with their recent licensing changes I don’t feel quite so bad handing over a few quid a month renting ReSharper before I commit to buying an outright license. Obviously the situation for companies is different – it’s much easier for a company to justify spending the money on Visual Studio licenses and ReSharper licenses.

With ReSharper Visual Studio is at least closer to Eclipse or IntelliJ. It still falls short, but there is clearly only so much lipstick that JetBrains can put on that particular pig. I do however have great hope for Project Rider, basically IntelliJ for .net.

Restful Services

A few years back another Java refugee and I started trying to write a RESTful, CQRS-style web service in C#. We’d done the same in a previous company on the Java stack and expected our choices to be similarly varied. But instead of a vast plethora of approaches from basic HTTP listeners to servlet containers to full blown app servers we narrowed the field down to two choices:

  1. Microsoft’s WCF
  2. ServiceStack

My fellow developer played with WCF and decided he couldn’t make it easily fit the RESTful, CQRS style he had in mind. After playing with ServiceStack we found it could. But then begins a long, tortuous process of finding all the things ServiceStack hadn’t quite got right; all the things we didn’t quite agree with. Now, this is not entirely uncommon in any technology selection process. But we’d already quickly narrowed the field to two. We were now committed to a technology that at every turn was causing us new, unanticipated problems (most annoyingly, problems that other dev teams weren’t having with vanilla WCF services!)

We joked (not really joking) that it would be simpler to write our own service layer on top of the basic HTTP support in .net. In hindsight, it probably would have been. But really, we’d paid the price for having the temerity to step off the One True Microsoft Way.

Worse, what had started as an open source project went paid for during the development of our service – which meant that our brand new web service was immediately legacy as the service layer it was built on was no longer officially supported.

Dependency Management

The thing I found most surprising on arrival in .net land was that people were checking all their third party binaries into version control like it was the 1990s! Java developers might complain that Maven is nothing more than a DSL for downloading the internet. But you know what, it’s bloody good at downloading the internet. Instead, I had the internet checked in to version control.

However, salvation was at hand with NuGet. Except, NuGet really is a bit crap. NuGet doesn’t so much manage my dependencies as break them. All the damned time. Should we restrict all versions of a library (say log4net) to one version across the solution? Nah, let’s have a few variations. Oh, but nuget, now I get random runtime errors because of method signature mis-matches. But it doesn’t make the build fail? Brilliant, thank you, nuget.

So at my current place we’ve written a pre-build script to check that nuget hasn’t screwed the dependencies up. This pre-build check fails more often than I would like to believe.

So managing a coherent set of dependencies isn’t the job of the dependency tool, so what does it do? It downloads one file at a time from the internet. Well done. I think wget can do that, too. It’s a damned sight faster than the nuget power shell console, too. Nuget: it might break your builds, but at least it’s slow.

The Microsoft Way

And then I find things that blow me away. At my current place we’ve written some add-ins to Excel so that people who live and die by Excel can interact with our software. This is pretty cool: adding buttons and menus and ribbons into Excel, all integrated to my back-end services.

In my life as a Java developer I can never even imagine attempting this. The hoops to jump through would have been far too numerous. But in Visual Studio? Create a new, specific type of solution, hit F5, Excel opens. Set a breakpoint, Excel stops. Oh my God – this is an integrated development environment.

Obviously our data is all stored in Microsoft SQLServer. This also has brilliant integration with Visual Studio. For example, we experimented with creating a .net assembly to read some of the binary data we’re storing in the DB. This way we could run efficient queries on complex data types directly in the DB. The dev process for this? Similarly awesome: deploy to the DB and run directly from within Visual Studio. Holy integrated dev cycle, batman!

When there is a Microsoft way, this is why it is so compelling. Whatever they do will be brilliantly integrated with everything else they do. It might not have the flexibility you want, it might not have all the features you want. But it will be spectacularly well integrated with everything else you’re already using.

Why?

Why does it have to be this way? C# is really awesome language; well designed with a lot of expressive power. But the open source ecosystem around it is barren. If the Microsoft Way doesn’t fit the bill, you are often completely stuck.

I think it comes down to the history of Visual Studio being so expensive. Even as a C# developer by day, I am not spending the thick end of £1,000 to get a matching dev environment at home, so I can play. Even if I had a solid idea of an open source project to start, I’m not going to invest a thousand quid just to see.

But finally Microsoft seem to be catching on to the open source thing. Free versions of Visual Studio and projects like CoreCLR can only help. But the Java ecosystem has a decade head start and this ultimately creates a network effect: it’s hard to write good open source software for .net because there’s so little good open source tooling for .net.