Risk free software

[tweetmeme source=”activelylazy” only_single=false]

Nobody wants to make mistakes, do they? If you can see something’s gonna go wrong, its only natural to do what you can to prevent it. If you’ve made a mistake once, what kind of idiot wants to repeat it? But what if the cure is worse than the problem? What if the effort of avoiding mistakes is worse than what you’re preventing?

Preventative Measures

So you’ve found a bug in production that really should have been caught during QA; you’ve had a load-related outage in production; you find a security issue in production. What’s the natural thing to do? Once you’ve fixed the immediate problem, you probably put in place a process to stop similar mistakes next time.

Five whys is a great technique to understand the causes and make appropriate changes. But if you find yourself adding more bureaucracy, a sign-off to prevent this happening in future – you’re probably doing it wrong!

Unfortunately this is a natural instinct: in response to finding bugs in production, you introduce a sign-off to confirm that everyone is happy the product is bug-free, whatever that might mean; you introduce a final performance test phase, with a sign-off to confirm production won’t crash under load; you introduce a final security test, with a sign-off to confirm production is secure.

Each step and each reaction to a problem is perfectly logical; each answer is clear, simple and wrong.

Risk Free Software

Let’s be clear: there’s no such thing as risk free software. You can’t do anything without taking some risk. But what’s easy to overlook, is that not doing something is a risk, too.

Not fixing a bug runs the risk that its more serious than you thought; more prevalent than you thought; that it could happen to an important customer, someone in the press, or a highly valued customer – with real revenue risk. You run the risk that it collides with another, as yet unknown bug, potentially multiplying the pain.

Sometimes not releasing feels like the safest thing to do – but you’re releasing software because you know something is wrong. How can not changing it ever be better?

The Alternative

So what you gonna do? No business wants to accept risk, you have to mitigate it somehow. The simple, easy and wrong thing to do is to add more process. The braver decision, the right decision, is to make it easy to undo any mistakes.

Any release process, no matter how retarded, will normally have some kind of rollback. Some way of getting back to how things used to be. At its simplest, this is a way of mitigating the risk of making a mistake: if it really is a pretty shit release, you can roll it back. Its not great, but it gives you a way of recovering when the inevitable happens.

But often people want to avoid this kind of scenario. People want to avoid rolling back; to avoid the risk of a roll back; totally missing the point that the rollback is your way of managing risk. Instead, you’re forced to mitigate the risk up front with bureaucracy.

If you’re using rollback as a way of managing risk (and why wouldn’t you?), then you’d expect to rollback from time to time. If you’re not rolling back, then you’re clearly removing all risk earlier in the process. This means you have a great process for removing risk; but could you have less process and still release product sometime this year?

Get There Quicker

Being able to rollback is about being able to recover from mistakes quickly and reliably. Another way to do that is to just release solutions quickly. Instead of rolling back and scheduling a fix sometime later, why not get the fix coded, tested and deployed as quickly as possible?

Some companies rely on being able to release quickly and easily every day. Continuous deployment might not itself improve quality; but it improves your ability to react to problems. The obvious side-effect of this is that you can fix issues much faster, so you don’t spend time before a release trying to catch absolutely everything. Instead by decreasing the time between revisions, by increasing your velocity, you create a higher quality product: you just fix issues so much faster.

Continuous deployment lets you streamline your process – you don’t need quite so many checks and balances, because if something bad happens you can react to it and fix it. Obviously, you need tests to ensure your builds are sound – but it encourages you to automate your checks, rather than relying on humans and manual sign-offs. Instead of introducing process, why not write code to check you’ve not made the same mistake twice?

Of course, the real irony in all this, is that the thing that often stops you doing continuous deployment is a long and tortuous release process. The release process encapsulates the lessons from all your previous mistakes. But with a lightweight process, you could react so much faster, by patching within minutes not days, that you wouldn’t need the tortuous process.

Your process has become its own enemy!

Doing agile the traditional way

[tweetmeme source=”activelylazy” only_single=false]

Doing agile is easy. If you’re working on a greenfield project, with no history and no existing standards and processes to follow. The rest of us get to work on brownfield projects, with company standards that are the antithesis of agile, and people and processes wedded to a past long out of date. Oh you can do agile in this environment, but its hard – because you have to overcome the constraints that meant you weren’t agile in the first place.

Any organisation hoping to “go agile” needs to overcome its constraints. The scrum team hits issues stopping them being properly agile – normally artefacts of the old way of doing things – they raise these to the scrum master and management hoping to remove them and reach the agile nirvana on the other side. Management diligently discuss these constraints, the easiest are quickly removed – simple things like better tools, whiteboards everywhere; but some take a bit longer to remove.

Before too long, you run into company culture: these are the constraints that just won’t go away. Not all constraints are created equal, though – the hardest to remove are often the most vital; the things that stop the company being properly agile. So let’s look at some of the activities and typical constraints a team can encounter.

User Story Workshops

If you’re doing agile, you’ve gotta have user stories. If you have user stories, you’re bound to have something like a user story workshop – where all the stakeholders (or just the dev team, if you’re kidding yourself) get together to agree the basic details of the work that needs to be done.

The easiest trap in the world to fall into is to try and perfect the user stories. Before you know it you’re stuck in analysis paralysis, reviewing and re-reviewing the user stories until everyone is totally happy with them. Every time you discuss them, someone thinks of a new edge case, a new detail to think about – so you add more acceptance criteria, more user stories, more spikes.

Eventually you’ll get moving again, but now you’ve generated a mountain of collateral with the illusion of accuracy. User stories should be a place holder for a discussion, if you waste time generating that detail up front you might skip the conversation later thinking you’ve captured everything already and miss the really critical details.


The worst thing is when it comes to producing estimates. The start of a new project or a team is a difficult time: you’ve got no history to base your estimates on, but management need to know how long you’ll take. With the best of intentions, scrum masters and team leads are often given incentives – a bonus, for example – to produce accurate estimates; there’s only two ways to play this: 1. pad estimates mercilessly and know you’ll fill that “spare” time easily; 2. spend more time analysing the problem, as though exhaustive analysis can predict the future. Neither of these outcomes are what the business really wants and certainly can’t be described as “agile”.


There’s a great conflict between the need for overall architecture and design; and the agile mentality to JFDI – to not spend time producing artefacts that don’t themselves add value. However, you can’t coordinate large development activities without some guiding architecture, some vague notion of where you’re heading.

But the logical conclusion of this train of thought is that you need some architectural design – so lets write a document describing our ideas and get together to discuss it. Luckily this provides a great forum for all the various stakeholders across the business that don’t grok code to have their say: “we need an EDA“; “I know yours is a small change, but first you must solve some arbitrarily vast problem overnight”; “I’m a potential user of this app and I think it should be green“.

This process is great at producing perfect design documents. Unfortunately in the real world, where requirements change faster than designs can be written, its a completely wasted effort. But it gives everyone a forum for airing their views and since they don’t trust the development team to meet their requirements any other way, any change to the process is resisted.

The scrum team naturally raise the design review process as a constraint, but once its clear it can’t be removed, they start to adapt to it. Because the design can fundamentally change on the basis of one person’s view right up to the last minute (hey! Cassandra sounds cool, let’s use that!); the dev team change their process: “sprints can’t start until the design is signed off”. And because the design can fundamentally change the amount of work to be done, the user stories are never “final” until the design is signed off and the business can’t get an estimate until the design has been agreed.


If you’re lucky, the one part of the project that feels vaguely scrum-y is the development sprint. The team are relatively free of dependencies on others; they can self-organise around getting the work done; things are tested as they go; the Definition of Done is completed before the next story is started. The team are rightly proud of “being agile”.

Manual Testing

And then the constraints emerge again. If you’re working on a legacy code base, without a decent suite of automated tests, you probably rely on manual testers clicking through your product to some written test script. A grotesque misuse of human beings if ever there was one, but the cost of investing in automation “this time” is outweighed by how quickly the testers can whip through a script now they’ve done it a million times.

Then worst of all possible worlds, because regression testing the application isn’t a case of clicking a button (but clicking thousands of buttons, one by one in a very specific order) you have to have a “final QA run”. A chance, once development has stopped, for QA to assure the product meets the required quality goals. If your reliance on manual QA is large, this can be a time-consuming exercise. And then, what happens if QA find a bug? Its fixed, and you start the whole process all over again… like some gruesome nightmare you can’t wake up from!

The Result

With these constraints at every step of the process, we’re not working how we’d like to. Now either we can view them as a hindrances to working the way we want to; or we can view them as forcing a new process on us, that we’re not in control of. What we’ve managed to do is invent a new development methodology:

  • First we carefully gather requirements and discuss ad nauseum
  • Then we carefully design a solution that meets these (annoyingly changing) requirements
  • Then we write code to meet the design (and ignore that the design changes along the way)
  • Finally we test that the code we wrote actually does what we wanted it to (even though that’s changed since we started)

What have we invented? Waterfall. We’ve rediscovered waterfall. Well done! For all your effort and initiatives, you’ve managed to invent a software development methodology discredited decades ago.

But we’ve tarted it up with stand ups, user stories, a burn down chart and some poor schmuck that gets called the scum master – but really, all we’ve done is put some window dressing on our old process and given it a fancy dan name.

If the ultimate goal of introducing agile to your company is to make the business more efficient but you’re still drowning in constraints – then that means the goal of introducing agile has failed: you’ve still got the same constraints, and you’re still doing waterfall – no matter what you call it: a rose by any other name still smells of shit.