Measuring Code

How good is your code? If you're like the other 80% of above-average developers, then I bet your code is pretty awesome. But are you sure? How can you tell? Or perhaps you're working on a legacy code base – just how bad is the code? And is it getting better? Code metrics provide a way of measuring your code – some people love 'em, some hate 'em.

The Good

Personally I’ve always found metrics very useful. For example – code coverage tools like Emma can give you a great insight into where you do and don’t have test coverage. Before embarking on an epic refactor of a particular package, just how much coverage is there? Maybe I should increase the test coverage before I start tearing the code apart.

Another interesting metric can be lines of code. While working in a legacy code base (and who isn't?), if you can keep velocity consistent (so you're still delivering features) while keeping the volume of code – your inventory – the same or smaller, then you're making the code less crappy while still delivering value. Any idiot can implement a feature by writing bucket loads of new code, but it takes real craftsmanship to deliver new features and reduce the size of the code base.

The Bad

The problem with any metric is who consumes it. The last thing you want is for an over-eager manager to start monitoring it.

You can’t control what you can’t measure
— Tom DeMarco

Before you know it, there’s a bonus attached to the number of defects raised. Or there’s a code coverage target everyone is “encouraged” to meet.

As soon as there's management pressure on a metric, smart people will game the system. I've lost count of the number of times I've seen people gaming code coverage metrics. In an effort to please a well-meaning but fundamentally misguided manager, developers end up writing tests with no assertions. Sure, the code ran and didn't blow up. But did it do the right thing? Who knows! And if you introduce bugs, will your tests catch them? Hell, no! So your coverage is useless.
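To make that concrete, here's a minimal JUnit 4 sketch – PriceCalculator is a made-up class for illustration. Both tests produce identical coverage numbers, but only the second can ever fail:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical class under test, made up for illustration.
class PriceCalculator {
    double applyDiscount(double price, double discount) {
        return price * (1 - discount);
    }
}

public class PriceCalculatorTest {

    // Games the coverage metric: every line of applyDiscount is executed,
    // so the coverage tool reports it as covered – but with no assertion,
    // no bug in applyDiscount can ever make this test fail.
    @Test
    public void coverageWithoutAssertions() {
        new PriceCalculator().applyDiscount(100.0, 0.25);
    }

    // Covers exactly the same lines, but a regression now breaks the build.
    @Test
    public void discountIsActuallyApplied() {
        assertEquals(75.0, new PriceCalculator().applyDiscount(100.0, 0.25), 0.001);
    }
}
```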

The target was met but the underlying goal – improving software quality – has not only been missed, it’s now harder to meet in future.

The Ugly

The goal of any metric is to measure something useful about the code base. Take code coverage, for example – really what we’re interested in is defect coverage. That is, out of the universe of all possible defects in our code, how many would cause a failure in at least one test? That’s what we want to know – how protected are we against regressions in the code base.

The trouble is, how can I measure “the universe of all possible defects” in a system? It's basically unknowable. Instead, we use code coverage as an approximation. Given that tests assert the code did the right thing, the percentage of code that has been executed is a good estimate of the likelihood of bugs being caught by them. If my tests execute 50% of the code, at best I can catch bugs in 50% of the code. If there are bugs in the other 50%, there's zero chance my tests will find them. Code coverage is an upper bound on defect coverage. But if your tests are shoddy, actual defect coverage can be much lower – to the point where tests with no assertions are basically useless.
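One rough way to formalise the argument (my own sketch, assuming defects are spread evenly through the code):

$$ \text{defect coverage} \;\le\; q \cdot \text{code coverage} \;\le\; \text{code coverage}, \qquad 0 \le q \le 1 $$

where $q$ is the quality of your assertions – the chance that an executed defect actually fails a test. With no assertions, $q \approx 0$, and even 100% code coverage protects you from nothing.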

And this is the difficulty with metrics: measuring what really matters – the quality of our software – is hard, if not impossible. So instead we have to measure what we can, but it isn’t always clear how that relates to our underlying goal.

But what does it mean?

There are some excellent tools out there like Sonar that give you a great overview of your code using a variety of common metrics. The trouble often is that developers don’t know (or care) what they mean. Is a complexity of 17.0 / class good or bad? I’m 5.6% tangled – but maybe there’s a good reason for that. What’s a reasonable target for this code base? And is LCOM4 a good thing or a bad thing? It sounds like a cancer treatment, to be honest.

Sure, if I’m motivated enough I can dig in and figure out what each metric means and we can try and agree reasonable targets and blah blah blah. C’mon, I’m busy delivering business value. I don’t have time for that crap. It’s all just too subtle so it gets ignored. Except by management.

A Better Way

Surely there’s got to be a better way to measure “code quality”?

1. Agree

Whatever you measure, it's important the team agree on it and understand what it means. If there's a measure half the team don't agree with, then it's unlikely to get better. Some people will work towards improving it; others won't, and will let it get worse. The net effect is likely to be heartache and grief all round.

2. Measure What’s Important

You don't have to measure the “standard” things – like code coverage or cyclomatic complexity. As long as the team agree it's a useful thing to measure, that it needs improving, and can commit to improving it, then it's a useful measure.

A colleague of mine at youDevise spent his 10% time building a tool to track and graph various measures of our code base. But, rather unusually, these weren’t the usual metrics that the big static analysis tools gather – these were much more tightly focused, much more specific to the issues we face. So what kind of things can you measure easily yourself?

  • If you have a god class, why not count the number of lines in the file? Less is better.
  • If you have a 3rd party library you're trying to get rid of, why not count the number of references to it?
  • If you have a class you're trying to eliminate, why not count the number of times it's imported?

These simple measures represent real technical debt we want to remove – and by removing technical debt we improve the quality of our code base. They can also be incredibly easy to gather: the most naive approach needs only grep and wc, or a few lines of code like the sketch below.
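As an illustration, here's roughly what such a counter might look like in Java – the path and class name are made up, and a grep one-liner would do the same job:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ImportCounter {
    public static void main(String[] args) throws IOException {
        // Hypothetical: the class we're trying to eliminate,
        // and where our source lives.
        Path sourceRoot = Paths.get("src/main/java");
        String unwantedImport = "import com.example.legacy.GodClass;";

        try (Stream<Path> files = Files.walk(sourceRoot)) {
            long count = files
                    .filter(path -> path.toString().endsWith(".java"))
                    .filter(path -> contains(path, unwantedImport))
                    .count();
            // One number, tracked over time: less is better.
            System.out.println(count + " files still import GodClass");
        }
    }

    private static boolean contains(Path file, String needle) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.anyMatch(line -> line.contains(needle));
        } catch (IOException e) {
            return false; // unreadable files just don't count
        }
    }
}
```

Run something like this from the build, append the number to a file each day, and you have a graph for free.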

It doesn't matter what you measure: as long as the team believe it should be improved, it gives you an insight into the quality of your code base, using a measure you care about.

3. Make It Visible

Finally, put this on a screen somewhere – next to your build status is good. That way everyone can see how you’re doing and gets a constant reminder that quality is important. This feedback is vital – you can see when things are getting better and, just as importantly, when things start to slip and the graph veers ominously upwards.

Keep It Simple, Stupid

Code quality is such an abstract concept it's impossible to measure directly. Instead, focus on specific things you can measure easily. The simpler the metric is to understand, the easier it is to improve. If you have to explain what a metric means, you're doing it wrong. Try and focus on just a few things at any one time – if you're tracking 100 different metrics it's going to be sheer luck if, on average, they're all getting better. If I instead focus on half a dozen, I can remember them – at the very least I won't let them get worse; and if I can, they'll be clear in my mind so I can improve them.

Do you use metrics? If so, what do you measure? If not, do you think there’s something you could measure?


If I had more time I would have written less code

In a blatant rip-off of the quote usually attributed to Blaise Pascal – “if I had more time, I would have written a shorter letter” – I had a thought the other day, perhaps:

If I had more time, I would have written less code

It seems to me the more time I spend on a problem, the less code I usually end up with; I'll refactor and usually find a more elegant design – resulting in less code and a better solution. This is at the heart of the programmer's instinct that good code takes longer to write than crap. But as Uncle Bob tells us:

The only way to go fast is to go well

How then do we square this contradiction? The way to go fast is to go well; yet if I sacrifice quality, it seems I can go faster – I'm able to rush crap out quickly.

What makes us rush?

First off – why do developers try to rush through work? It’s the same old story that I’m sure we’ve all heard a million times:

  • There’s an immovable deadline: advertising has already been bought; or some external event – could be Y2K or a big sports event
  • We need to recognise the revenue for this change in this quarter so we make our numbers
  • Somebody, somewhere has promised this and doesn't want to lose face by admitting they were wrong – so the dev team gets to bend so those who over-committed don't look foolish

What happens when we rush?

The most common form of rushing is to skip refactoring. The diabolical manager looks at the Red, Green, Refactor cycle and can see a 33% improvement if those pesky developers will just stop their meddlesome refactoring.

Of course, it doesn’t take a diabolical manager – developers will sometimes skip refactoring in the name of getting something functional out the door quicker.

It’s ok though, we’ll come back and fix this in phase 2

How many times have we heard this shit? Why do we still believe it? Seriously, I think my epitaph should be “Now working on phase 2”.

Another common mistake is to dive straight into writing code. When you're growing your software guided by tests, this probably isn't the end of the world, as a good design should emerge. But sometimes five minutes' thought before you dive in can save you hours of headache later.

And finally, the most egregious rush I've ever seen is attempting to start development from “high level requirements”. As the “inconsequential little details” are added in, the requirements look completely different. And all that work spent getting a “head start” is now completely wasted. You either bin it and start over, or waste more precious time re-working something you should never have written in the first place.

Does rushing work?

This is the heart of the contradiction – it feels like it does. That's the reason we do it, isn't it? When compelled by the customer to finish sooner, we rush – we cut corners to try and get things done quicker.

It feels like I'm going quicker when I cut corners. If only it wasn't for all those unexpected delays that wasted my time, I would have been finished sooner. It's not my fault those delays cropped up – I couldn't have predicted that.

It’s the product manager’s fault for not making the requirements clear

It’s the architect’s fault for not telling me about the new guidelines

It's just unlucky that a couple of bugs QA found were really hard to fix

When you cut corners, you pay for it later. Time not spent discussing the requirements in detail leads to misunderstandings; if you don't spot them, you might have to wait until the show and tell for the customer to spot them – now you've wasted loads of time. Time not spent thinking about the design can lead you down an architectural blind alley that causes loads of rework. Finally, time not spent refactoring leaves you with a pile of crap that will be harder to change when you start work on the next story, or even when you have to make changes for this one. Of course, you couldn't have seen these problems coming – you did your best; these things just crop up in software, don't they?

But what if you hadn’t rushed? What if rather than diving in, you’d spent another 15 minutes discussing the requirements in detail with the customer? You might have realised earlier that you had it all wrong and completely misunderstood what she wanted. What if you spent 30 minutes round a whiteboard with a colleague discussing the design? Maybe he would have pointed out the flaws in your ideas before you started coding. Finally, what if you’d spent a bit more time refactoring? That spaghetti mess you’re going to swear about next week and spend days trying to unravel will still be fresh in your mind and easier to untangle. For the sake of an hour’s work you could save yourself days.

How much is enough?

Of course, it’s all very well to say we will not write crap any more; but it’s not a binary distinction, is it? There’s a sliding scale from highly polished only-really-suitable-for-academics perfection; to sheer, unmitigated, what-were-you-thinking-you-brain-dead-fuck crapitude.

If spending more time discussing requirements could save time later, why not spend years going through a detailed requirements gathering exercise? If spending more time thinking before coding could save time later, why not design the whole thing down to the finest detail before any coding is done? Finally if refactoring mercilessly will save time, why not spend time refactoring everything to be as simple as possible? Actually, wait, this last one probably is a good idea.

Professional software development is always a balancing act. It’s a judgement call to know how much time is enough.

How long does it take?

When working through a development task I choose to expend a certain amount of extra effort over and above the time it takes me to type in the code and test it; this extra time is spent discussing requirements, arguing about the design, refactoring the code and so on. Ideally I want to expend just enough effort to get the job done as quickly as possible (the lazy and impatient developer). If I expend too little effort I risk being delayed later by unnecessary rework; if I expend too much effort I've wasted my time.

However, I don't know in advance what that optimum amount of effort is – it's a guess based on experience. But I expect the amount of effort I put in to be within some range of the optimum – the lowest point on the graph:

All other things being equal, I’d expect the amount of time it actually takes to complete the task to be within some margin based on how close I got to the optimum amount of effort. Sometimes I do too little and waste time with rework. Other times I spend too long – e.g. too much detail in the design that didn’t add any value – so the extra time I spent is wasted.

This is by no means all the uncertainty in an estimate; but it is the difference between not doing quite enough and having to pay for it later (refactoring, debugging or plain rework), versus doing just enough at every stage so that no effort is wasted and there's no rework I could have avoided more cheaply. In reality the time taken is likely to be somewhere in the middle: not too great, but not too shabby.
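A rough way to picture this (my own sketch, not a formula from any study): write the total time for a task as

$$ T(e) = t_{\min} + e + r(e) $$

where $t_{\min}$ is the irreducible time to type in and test the code, $e$ is the extra effort spent on requirements, design and refactoring, and $r(e)$ is the rework that insufficient effort causes later. Since $r(e)$ falls steeply as $e$ grows, $T(e)$ has a minimum at some optimum $e^*$ – and undershooting $e^*$ costs far more than overshooting it, because below the optimum the rework term explodes, while above it you only lose the extra effort itself.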

There’s an interesting exercise here to consider the impact experience has on this. When I was first starting out I’d jump straight into coding; my effort range would be to the left of the graph: I’d spend most of my time re-writing what I’d done so the actual time taken would be huge. Similarly, as I became more experienced I’d spend ages clarifying requirements, writing detailed designs and refactoring until the code was perfect; now my effort range would be to the right of the graph – I’d expend considerable upfront effort, much of which was unnecessary. Now, I’d like to think, I’m a bit more balanced and try to do just enough. I have no idea how you could measure this, though.

Reducing effort to save time

What happens when we rush? I think when we try to finish tasks quicker, we cut down on the extra effort – we’re more likely to accept requirements as-is than challenge them; we’re more likely to settle for the first design idea we think of; we’re more likely to leave the code poorly refactored. This pulls the effort range on the graph to the left.

To try and get done more quickly, I’ve reduced the amount of effort by “shooting low”: I cut corners and expend less effort than I would have done otherwise.

The trouble is this doesn’t make the best-case any better – I might still get the amount of effort bang on and spend the minimum amount of time possible on this task. However because I’m shooting low, there’s a danger now I spend far too little extra effort – the result is I spend even longer: I have to revisit requirements late in the day; make sweeping design changes to existing code; or waste time debugging or refactoring poorly written code.

This is a classic symptom of a team rushing through work: simple mistakes are made that, in hindsight, are obvious – but because we skipped discussing requirements or skipped discussing the design we never noticed them.

When I reduce the amount of extra effort I put in, rather than getting things done quicker, rushing may actually increase the time taken. This is the counter-intuitive world we live in – where aggressive deadlines may actually make things go more slowly. Perhaps instead I should have called this article:

If I had more time I would have been done quicker

Ability or methodology?

There’s been a lot of chatter recently on the intertubes about whether some developers are 10x more productive than others (e.g. here, here and here). I’m not going to argue whether this or that study is valid or not; I Am Not A Scientist and I don’t play one on TV, so I’m not going to get into that argument.

However, I do think these kinds of studies are exactly what we need more of. The biggest challenges in software development are people – individual ability and how we work together – not computer science or technology. Software development has more in common with psychology and sociology than engineering or maths. We should be studying software development as a social science.

Recently I got to wondering: where are the studies that prove that, say, TDD works, or that pair programming works? Where are the studies that conclusively prove Scrum increases project success or customer satisfaction? Ok, there are some studies – especially around TDD and some around Scrum (hyper-performing teams, anyone?) – but a lazy google turns up very little. I would assume that if there were credible studies into these things they'd be widely known, because they would provide a great argument for introducing these practices. Of course, it's possible that I'm an ignorant arse and these studies do exist… if so, I'm happy to be educated 🙂

But before I get too distracted, Steve's post got me thinking: if the variation between individuals really can be 10x, no methodology is going to suddenly introduce an across-the-board 20x difference. This means that individual variation will always significantly dwarf the difference due to methodology.

Perhaps this is why there are so few studies that conclusively show productivity improvements? Controlling for individual variation is hard, and by the time you have, it makes a mockery of any methodological improvement. If “hire better developers” is 5x more effective than your shiny new methodology, why bother developing and proving the methodology? Ok, consultants would bother – they have books to sell, conferences to speak at and gullible customers to bill – but why would the non-crooked ones?

Methodologies and practices in software development are like fashion. The cool kid down the hall is doing XP. He gets his friends hooked. Before you know it, all the kids are doing XP. Eventually, everyone is doing XP, even the old fogies who say they were doing XP before you were born. Then the kids are talking about Scrum or Software Craftsmanship. And before you know it, the fashion has changed. But really, nothing fundamentally changed – just window dressing. Bright developers will always figure out the best, fastest way to build software. They’ll use whatever fads make sense and ignore those that don’t (DDD, I’m looking at you).

The real challenge, then, is the people. If simply having the right people on the team is a better predictor of productivity than choice of methodology, then surely recruitment and retention should be our focus, rather than worrying about Scrum or XP, or trying to enforce code reviews or pair programming. Perhaps instead we should ensure we've got the best people on the team, that we can keep them, and that any new hires are of the same high calibre.

And yet… recruitment is a horrible process. Anyone that’s ever been involved in interviewing candidates will have horror stories about the morons they’ve had to interview or piles of inappropriate CVs to wade through. Candidates don’t get an easier time either: dealing with recruiters who don’t understand technology and trying to decide if you really want to spend 8 hours a day in a team you know little about. It almost universally becomes a soul destroying exercise.

But how many companies bring candidates in for half a day's pairing? How else are candidate and employer supposed to figure out if they want to work together? Once you've solved the gnarly problem of getting great developers and great companies together, we'll probably discover the sad truth of the industry: there aren't enough great developers to go round.

So rather than worrying about this technology or that, about Scrum or XP, perhaps we should study why some developers are 10x more productive than others. Are great developers born or made? If they're made, why aren't we making more of them? University is obviously poor preparation for commercial software development, so should there be more vocational education – a system of turning enthusiastic hackers into great developers? You could even call it apprenticeship.

That way there’d be enough great developers to go round and maybe we can finally start having a grown up conversation about methodologies instead of slavishly following fashion.

Software craftsmanship roundtable

Last night was the second London Software Craftsmanship round table. This was a great session with lots of lively discussion. We seemed to cover a wide variety of topics and I came away with loads of things I need to follow up on.

It’s amazing, as soon as you start discussing things with other developers you pick up on lots of different ideas, tools and technologies you’d not previously explored. These sessions are proving to be a great way to find out some of the cool stuff people are doing. So what did we discuss?

*phew* I think that covers it. If I missed anything, let me know below.

So, are you coming to the next one? What should we discuss?


Book Chain Relaunch

The book swapping site I run – www.BookChain.co.uk – has relaunched!

What Is It?

Book Chain is a free book swapping site. Send books you’ve read to others; get sent books you want in return.

How Does It Work?

Once you’ve signed up you can start adding books you’ve read to your library. This lets other users know the books you’re willing to share. As soon as someone asks for one of your books, we’ll let you know. Each book you send earns credit; the more credit you earn, the more books you can receive.

Once you’ve earned credit add books you want to your wanted list. Then others can send you books you want to read!

What’s New?

The biggest difference is moving to new servers (more on that later). This has made the whole site much faster, as well as helping to expose some horrible performance bugs.

Book Chain also integrates more closely with Facebook. You can now “like” books on Book Chain to let your Facebook friends know which books you're willing to share.

To celebrate the relaunch we’re running a competition. One lucky person that posts a book between now and the end of the year will win £50 in Amazon vouchers! So sign up today and start swapping books!

New Hardware

After a long time running on a shared server, over the last couple of weeks I’ve moved Book Chain into the cloud with Elastic Hosts. Amazingly, for roughly the same price I was paying for my own Tomcat instance on a shared server, I now get my own virtual Ubuntu box, with way more memory and CPU than I had before! And not only do I get more bang for buck, with Elastic Hosts I can scale my hardware up and down as I need to.

How Much Better Is It?

Well, as you can see from the red line in the performance graph below, from the 7th of November (after a bit of a shaky start while I ironed out some issues) mean response times have plummeted from anywhere between 2 and 4 seconds down to consistently under 100ms. Some of this was me fixing some absolutely horrific bugs (more on that in a future post!) – but much of the improvement came simply from moving to better hardware.

Book Chain performance over the last month