Friction in Software

Friction can be a very powerful force when building software. The things that are made easier or harder can dramatically influence how we work. I’d like to discuss three areas where I’ve seen friction at work: dependency injection, code reviews and technology selection.

DI Frameworks

A few years ago a colleague and I discussed this and came to the conclusion that the reason most DI frameworks suck (I’m looking in particular at you, Spring) is that they make adding new dependencies so damned easy! There’s absolutely no friction. Maybe a little XML (shudder) or just a tiny little attribute. It’s so easy!

So when we started a new, greenfield project, we decided to put our theory to the test and introduced just a little bit of friction to dependency injection. I’ve written before about the basic scheme we adopted and the AOP endpoint it reached. But the end result was, I believe, very successful. After a couple of years of development we still had of the order of only 10-20 dependencies. The friction we’d introduced was light (add a couple of lines to a single class), but it was sufficient to act as a constant reminder not to just add a new dependency because it was easy.
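As a rough illustration of what that light friction might look like, here is a minimal sketch. It is not the scheme from the earlier post, and every class name in it is hypothetical. The idea is a single hand-written wiring class: adding a dependency means editing that class by hand, one line to build it and one to pass it on.

```python
class Clock:
    """Hypothetical dependency: supplies the current time."""

    def now(self) -> str:
        return "2020-01-01T00:00:00"


class OrderRepository:
    """Hypothetical dependency: stores orders in memory."""

    def __init__(self) -> None:
        self.orders = []

    def save(self, item: str, placed_at: str) -> None:
        self.orders.append((item, placed_at))


class OrderService:
    """Production code receives its dependencies through the constructor."""

    def __init__(self, clock: Clock, repository: OrderRepository) -> None:
        self._clock = clock
        self._repository = repository

    def place_order(self, item: str) -> None:
        self._repository.save(item, self._clock.now())


class Wiring:
    """The single class that knows how everything is built.

    Every new dependency costs a couple of hand-written lines here.
    That small cost is the friction.
    """

    def __init__(self) -> None:
        self.clock = Clock()
        self.order_repository = OrderRepository()
        self.order_service = OrderService(self.clock, self.order_repository)


wiring = Wiring()
wiring.order_service.place_order("book")
```

Nothing stops you adding a twenty-first dependency, but you have to open the wiring class and look at the other twenty while you do it.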

Code Reviews

I was reminded of this recently when discussing code reviews. I have mixed feelings about code reviews: I’ve seen them work well, and it is better to have code reviews than not to have them; but it’s better still to pair program. But not all teams, not all developers, like pair programming – so code reviews exist. The trouble with code reviews is they can provide a form of friction.

If you and I are pairing on a piece of work, we will discuss the various trade-offs as we go: do we spend time on this, do we refactor that, etc. The constant judgements about what warrants attention and what can be left for another day are verbalised and agreed. In general I find the code written while pairing is high in quality but also remains tightly focused on the task. The long rambling refactors I’ve been guilty of in the past disappear, and the lazy “quick hacks” we all try and explain away to ourselves aren’t so easy to gloss over when pairing.

But code reviews exist outside of this dynamic. In the cold light of the following day, someone uninvolved reviews your work and passes judgement on whether they think it’s up to scratch. It’s easy to see why this becomes combative: rather than being collaborative it can be seen as a judgement being passed, on not only the code but the author, too.

When reviewing code it is easy to set a very high bar, higher than you might set for yourself and higher than you might have agreed when pairing. Now, does this mean the comments aren’t valid? Absolutely not! You’re right, there is a test case missing here; although my change is unrelated, I should have added the missing test case. And you’re right, this code is a mess; it was a mess before I got here and made a simple edit; but you’re right, I should have tidied it up. Everyone should practise code gardening.

These are all perfectly valid comments. But they create a form of friction. When I worked on a team that relied on these code reviews you knew you were going to get comments, so you kept the commit small to minimize the diff. A small diff minimizes the number of extra tests you could be asked to write. A small diff keeps most of the existing mess out of the review, so you won’t be asked to start refactoring.

Now, this seems dysfunctional: we’re deliberately trying to optimize for a smooth passage through the review process, instead of optimizing for code quality. Worse than this though was what never happened: refactoring commits. Looking back I realise that the only code reviews I saw (as both reviewer and reviewee) were for feature changes. There were never any code reviews submitted for purely technical debt reduction. Sure, there’d be some individual commits in amongst the feature changes. But never any dedicated, multi-commit sessions, whose sole aim was to improve the code base. Which was a shame, because like any legacy code base, there was scope for improvement.

Compare this to teams that don’t do code reviews, where I’ve tended to see more effort put into reducing technical debt. Without fearing an endless cycle of review comments, developers are free to embark on refactoring efforts (that may or may not even work out!) – but at least they can try. By contrast, code reviews provide a form of friction that might actually hurt code quality in the long run.

Technology Selection

I was talking to another colleague recently who is convinced that Hibernate is still the best way to get data in and out of a relational database. I can’t really work out how to persuade people they’re wrong – surely using Hibernate is enough to persuade you? Especially in a large, legacy code base – the pain that Hibernate causes is obvious. Yet plenty of people still believe in Hibernate. There are even people that still believe in Spring. Whether or not they still believe in the tooth fairy is unclear.

But I think technology selection is another area where friction is important. When contemplating moving away from something well-known and well used in industry like Spring or Hibernate there is a lot of friction. There are new technologies to learn, new approaches to understand and new risks to manage. This all adds friction, so sometimes it’s easiest just to stick with what we know. Sometimes it really is the right choice – the technology you have expertise in is the one you’ll be most productive in immediately. But there are longer term questions too, which are much harder to answer: will the team eventually be more productive using technology X than technology Y?

Friction in software is a powerful force: we’re very lazy creatures, constantly trying to optimise. Anything that slows us down or gets in our way quickly gets side-stepped or worked around. We can use this knowledge as a tool to guide developer behaviour; but sometimes we need to be aware of how friction can change behaviours for the worse as well.

Never trust a passing test

One of the lessons when practising TDD is to never trust a passing test. If you haven’t seen the test fail, are you sure it can fail?

Red Green Refactor

Getting used to the red-green-refactor cycle can be difficult. It’s very natural for a developer new to TDD to immediately jump into writing the production code. Even if you’ve written the test first, the natural instinct is to keep typing until the production code is finished, too. But running the test is vital: if you don’t see the test fail, how do you know the test is valid? If you only see it pass, is it passing because of your changes or for some other reason?

For example, maybe the test itself is not correct. A mistake in the test setup could mean we’re actually exercising a different branch, one that has already been implemented. In this case, the test would already pass without writing new code. Only by running the test and seeing it unexpectedly pass can we know the test itself is wrong.

Or alternatively there could be an error in the assertions. Ever written assertTrue() instead of assertFalse() by mistake? These kinds of logic errors in tests are very easy to make, and the easiest way to defend against them is to ensure the test fails before you try and make it pass.
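Here is a minimal sketch of that slip, using Python’s unittest assert methods as a stand-in for whichever framework you use. The function `is_expired` and both checks are hypothetical; plain functions are used rather than a full test class to keep the sketch self-contained.

```python
import unittest

# A bare TestCase instance gives access to the familiar assert methods.
check = unittest.TestCase()


def is_expired(days_left: int) -> bool:
    """Production code not written yet: a stub that always returns False."""
    return False


def mistyped_check() -> None:
    # Meant to verify that -1 days left IS expired, but assertFalse was
    # typed instead of assertTrue.
    check.assertFalse(is_expired(-1))


def intended_check() -> None:
    # What we meant to write.
    check.assertTrue(is_expired(-1))


def passes(test) -> bool:
    """Run a check and report whether it passed."""
    try:
        test()
    except AssertionError:
        return False
    return True


print(passes(mistyped_check))  # True: green before any production code exists
print(passes(intended_check))  # False: red, as a brand new test should be
```

The mistyped check is green against a stub that does nothing, which is the warning sign: had we run it before writing any production code, the unexpected pass would have exposed the mistake.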

Failing for the Right Reason

It’s not enough to see a test fail. This is another common beginner mistake with TDD: run the test, see a red bar, jump into writing production code. But is the test failing for the right reason? Or is the test failing because there’s an error in the test setup? For example, a NullReferenceException may not be a valid failure – it may suggest that you need to enhance the test setup, maybe there’s a missing collaborator. However, if you currently have a function returning null and your intention with this increment is to not return null, then maybe a NullReferenceException is a perfectly valid failure.
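The difference can be sketched like this. The example is in Python, where calling a method on None raises AttributeError, the nearest equivalent of a NullReferenceException. `Basket`, `PriceList` and the bulk discount are all hypothetical, and the increment we intend to make is a discount that brings ten apples down from 50 to 45.

```python
class PriceList:
    """Hypothetical collaborator: every item costs 5."""

    def price_of(self, item: str) -> int:
        return 5


class Basket:
    """Hypothetical production code; the bulk discount is not implemented yet."""

    def __init__(self, price_list=None) -> None:
        self._price_list = price_list

    def total(self, item: str, quantity: int) -> int:
        return self._price_list.price_of(item) * quantity


def check_with_incomplete_setup() -> None:
    # The collaborator was forgotten, so this blows up with an
    # AttributeError before it ever reaches the assertion.
    basket = Basket()
    assert basket.total("apple", 10) == 45


def check_with_complete_setup() -> None:
    # With the collaborator in place the assertion itself fails (50 != 45),
    # which is exactly the behaviour we are about to implement.
    basket = Basket(PriceList())
    assert basket.total("apple", 10) == 45


def outcome(test) -> str:
    """Classify one run as 'pass', 'failure' (assertion) or 'error' (anything else)."""
    try:
        test()
    except AssertionError:
        return "failure"
    except Exception:
        return "error"
    return "pass"


print(outcome(check_with_incomplete_setup))  # error
print(outcome(check_with_complete_setup))    # failure
```

Both runs show a red bar, but only the second is red because of the change we intend to make. As the paragraph above says, that judgement depends on the increment: if the goal had been to stop returning None, the first kind of red could have been the right one.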

This is why determining whether a test is failing for the right reason can be hard: it depends on the production code change you’re intending to make. It takes not only knowledge of the code but also enough experience of TDD to have an instinct for the type of change you intend to make with each cycle.

When Good Tests Go Bad

A tragically common occurrence is that we see the test fail, we write the production code, the test still fails. We’re pretty sure the production code is right. But we were pretty sure the test was right, too. Eventually we realise the test was wrong. What to do now? The obvious thing is to go fix the test. Woohoo! A green bar. Commit. Push.

But wait, did we just trust a passing test? After changing the test, we never actually saw the test fail. At this point, it’s vital to undo your production code changes and re-run the test. Either git stash them or comment them out. Make sure you run the modified test against the unmodified production code: that way you know the test can fail. If the test still passes, your test is still wrong.
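The rule can be written down as a tiny check. This is only a sketch of the idea, with hypothetical names throughout: `old_shipping_cost` stands for the production code you get back after stashing your changes, and `new_shipping_cost` for the code with your changes applied.

```python
def old_shipping_cost(weight_kg: float) -> float:
    """Production code before our change (what git stash would restore)."""
    return 5.0


def new_shipping_cost(weight_kg: float) -> float:
    """Production code after our change: heavy parcels cost more."""
    return 5.0 if weight_kg <= 2 else 9.0


def corrected_check(shipping_cost) -> bool:
    """The test after we fixed it; returns True when it passes."""
    return shipping_cost(10) == 9.0


def still_wrong_check(shipping_cost) -> bool:
    """A 'fixed' test that accidentally exercises the old behaviour."""
    return shipping_cost(1) == 5.0


def can_be_trusted(test, old_code, new_code) -> bool:
    """A test earns trust only if it fails on the old code and passes on the new."""
    return (not test(old_code)) and test(new_code)


print(can_be_trusted(corrected_check, old_shipping_cost, new_shipping_cost))    # True
print(can_be_trusted(still_wrong_check, old_shipping_cost, new_shipping_cost))  # False
```

The second check is green against both versions of the code, so a green bar from it tells you nothing about your change.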

TDD done well is a highly disciplined process. This can be hard for developers just learning it to appreciate. You’ll only internalise these steps once you’ve seen why they are vital (and not just read about it on the internets). And only by regularly practising TDD will this discipline become second nature.

Project vs product teams

One of the hardest things for companies trying to be agile is how to structure teams. Back in the bad old days, teams would form around a project. Then six months later, everyone would dissipate and go on to new teams. By the time a team had formed and become effective it was ripped apart again. You get no sense of ownership, no continuity.

Nowadays everyone knows that projects are bad, you need scrum teams instead. So a scrum team is formed with a product owner to prioritise the work. But what often happens is that what gets prioritised onto the backlog is a project in bite-size pieces. For example, I saw one team that ran out of work to do. The backlog was empty because, except for bugs, none of the outstanding projects had been signed off. There’s that word again. Project.

Behind the scenes a scrum team often becomes a slightly better way of delivering projects. You get the benefits of team consistency and continuity and the added benefit that the business can carry on thinking of the work in terms of projects. The downside of this approach is the scrum team can lack clear focus: there’s no overarching goal for the team. From sprint to sprint the focus might change as the relative importance of different projects changes. This makes it hard for the team to feel committed to a big idea, to some greater purpose. It ends up an endless procession through the backlog.

Why does this happen? I think it comes down to money. Somebody, somewhere is watching the money. Somebody wants to know “if I spend £x here, how much am I going to make back and by when?” The idea of the project is very easy to fit into this model. The team costs £x per day. The project is estimated to take n days. It’s expected to deliver £y profit. From this we can calculate the expected return on our investment. The trouble is, most of these numbers are entirely made up. If not fundamentally unknowable.
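Written out, the sum is tiny. This sketch treats £y as the money the project brings in before the team’s cost is subtracted, which is my assumption, and the figures in the example are invented purely for illustration:

```latex
\[
\text{cost} = x \cdot n, \qquad \text{ROI} = \frac{y - x \cdot n}{x \cdot n}
\]
\[
x = 2{,}000,\quad n = 100,\quad y = 300{,}000
\;\Rightarrow\;
\text{ROI} = \frac{300{,}000 - 200{,}000}{200{,}000} = 0.5
\]
\[
n = 150
\;\Rightarrow\;
\text{ROI} = \frac{300{,}000 - 300{,}000}{300{,}000} = 0
\]
```

With these made-up figures, a 50% overrun on the estimate of n is enough to wipe out the entire projected return, and that is before anyone questions y.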

Let’s start with the obvious one: how long is the project going to take? Really, we still actually ask this question? Have we learnt nothing from agile? It seems not: many, many people still think about the world in terms of delivery dates and certainties. When will we learn that the best way is always to deliver a little, inspect the results, then decide whether to keep on the same path or deliver something different? You can’t have an end date with this approach – it’s not even meaningful. Keep on delivering one thing until there’s something better you could be doing, then go do that. Rinse, repeat.

What about the other question: how much profit will this project make? Well, let’s assume for now that the entire project, as originally conceived, will actually be delivered (as if this ever actually happens in software). Can you tell how much money it’s made you? Really? Independent of every other change that the organisation has made at the same time? From software to operations to marketing?

Now sometimes you can come up with a good estimate of expected returns, but often it’s just a pipe dream. But, if you’re vigorously disagreeing with me: I assume you’re religiously tracking actual costs and returns and feeding them back into future project planning? I have seen very, very few companies actually do this. If you’re not actually measuring how much you made from a project, how do you know your original estimates were any good?

So we have two made-up numbers, both almost certainly unachievable in practice – but we use them to dictate the team’s priority order. I once saw a project signed off and jump to the top of the priority order because it predicted something like a 10% uplift in revenue for the company. This was a very large number for a single project and clearly ridiculous to everyone involved, but it was signed off and duly implemented. Revenue projections later that year were re-estimated downwards and downwards due to difficult market conditions. And some blatant over-estimation. And yet, this non-science is what passes for return on investment planning in all-too-many organisations.

What’s the alternative? The best teams I’ve seen have been structured around products. Give the team complete ownership of one or more products. Any and all changes to those products go via the product team. A product owner guides product direction. As an area expert they are entrusted to decide which things are most important to work on. They can discuss long-term direction with the team and have a consistent, coherent vision for where the product is heading. While, inevitably, some changes are large and sufficiently inter-dependent that they become a project (if one part is delivered then it all must be), the team understands the business benefit of the solution and can evolve the implementation to meet the underlying business need, instead of trying to satisfy some arbitrary internal project deadline. This gives teams the complete freedom to inspect and adapt each iteration. With an understanding of the business priorities for their products they can make sensible trade-offs as each iteration surfaces more information.

What about the money? It’s hard, but let’s be honest about it: return on investment is not clear with the project model of software delivery, so accept that it isn’t clear. The hard thing is working out which products are making you money and which could make more money if more was invested. The trouble is I’ve worked in teams where, honestly, the product was so profitable with so little scope for uplift that the most cost-effective thing to do would have been to fire the dev team and just keep milking the cash cow.

So how can we decide where to spend our money? I think the empirical model of agile could fit here perfectly well. Let’s assume for a minute that the amount of money you have for the delivery team as a whole is fixed – your only choice is where to put it. How much to spend on product A vs how much on product B. Can you estimate how much money each product is making for the business? How is it changing over time?

If one product is making more profit each month – if it’s a growing product – then invest more resources there, to accelerate the growth. If a product is slowing down, with smaller increases in profit each month, or even with profit decreasing – then stop spending so much money on it. This naturally means that your money goes where it seems to be delivering the biggest return. Put your money where it seems to be delivering results.
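As a sketch of that rule of thumb, and nothing more: the function names, the 10% step and the figures below are all assumptions made for illustration. It takes a fixed overall budget and shifts a slice of it from the product whose profit is growing slowest to the one growing fastest.

```python
def monthly_growth(profits: list) -> float:
    """Average month-on-month change in profit (needs at least two months)."""
    changes = [later - earlier for earlier, later in zip(profits, profits[1:])]
    return sum(changes) / len(changes)


def reallocate(budget: dict, history: dict, step: float = 0.1) -> dict:
    """Move a fraction `step` of the slowest product's budget to the fastest.

    The total budget stays fixed; only its split changes.
    """
    growth = {name: monthly_growth(profits) for name, profits in history.items()}
    fastest = max(growth, key=growth.get)
    slowest = min(growth, key=growth.get)
    moved = budget[slowest] * step
    new_budget = dict(budget)  # leave the caller's budget untouched
    new_budget[slowest] -= moved
    new_budget[fastest] += moved
    return new_budget


budget = {"product_a": 100.0, "product_b": 100.0}
history = {
    "product_a": [10.0, 12.0, 15.0],  # profit growing each month
    "product_b": [20.0, 20.0, 19.0],  # profit flat, then falling
}
print(reallocate(budget, history))
```

Run monthly, a small step like this moves money gradually towards whatever seems to be working, without betting everything on one month’s numbers.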

The hardest thing with this is that it takes time to get the feedback: changing resource allocation could take months to show up on the bottom line. But at least we’re being honest about the impact our decisions have. Instead of trying to micro-manage delivery via projects, manage where resources are put and let the product owner manage the priority order.

Cross-functional teams

Cross-functional teams aren’t a new idea. And yet, somehow, we still don’t seem to have got the memo.

I was listening to Scott Hanselman’s excellent podcast “Hanselminutes” last week; he had Angie Jones on to talk about automation. Among all the great advice around ensuring that automation is a first-class citizen in your development process, one thing stood out for me:

you need to get your automation engineers into your scrum team

I don’t know why, but it surprised me. Are there really companies out there up to speed enough to be doing test automation; yet so behind the times with agile that they think it’s a good idea to have a dedicated team of automation engineers, removed from the rest of the dev team?

Cross-functional teams are a pretty central idea to agile – breaking down silos and ensuring that everyone that is required to produce an increment of working software is aligned and working together. It’s certainly not a new idea, but it’s clearly an idea that we’re still struggling to absorb.

But then, looking back, I remember working for a certain large company that decided they needed to “do more test automation”. So they hired a room full of automation engineers, who sat two floors away from the dev team, in a room hidden away (we honestly didn’t know they were even there for weeks, maybe even months). This team were responsible for creating an automated test pack for the application (rather than use the one the test automation engineers in the scrum teams had been working on for the last few years). But… they weren’t even talking to the scrum teams. So they were constantly chasing to keep up. As you can imagine, hilarity ensued. I say hilarity. Arguments, really. Then anger. Eventually laughter as we realised all this effort was wasted because the scrum teams wouldn’t – in fact couldn’t – support this new automation code.

Clearly not getting the idea of cross-functional teams is an age-old problem. Compare this to a more recent client of mine – one that had a genuinely more cross-functional team. Not only did the scrum team have its own automation engineer, the developers (actual developers, not re-branded testers) were encouraged to work on the test automation tools – to everyone’s benefit. Test tooling written to the same standards as production code, with the insight and experience of the test automation specialist. This is moving beyond cross-functional teams into cross-skilled teams. Not only is every skill set you need within the same scrum team, but each individual can have multiple skills, taking on multiple roles.

Sure, you still need specialists. But with generalizing specialists you get the best of both worlds: the experience of specialists in their area, with the flexibility and breadth of ideas that come from the whole team being able to work on whatever is required. When the entire team can swarm on any area you have a very flexible team: if we need a big push for test automation, the entire team can focus on it. Similarly, with plenty of pairing and rotation everyone on the team will see every area and every role, allowing everyone’s unique perspective to improve the product and the process.

But then, a counter-example: the same client suffered from another age-old silo, operations. I thought devops had killed this silo, but it seems not. If the scrum team can’t release an increment of software to actual users then it isn’t a fully self-contained, self-sufficient, cross-functional team. A scrum team working with a separate test automation team seems like a crazy idea; and yet, somehow, a scrum team working with a separate operations team is much more normal, much more accepted. But it’s the exact same problem: if you don’t have everyone you need in the same scrum team then you’re going to get bottlenecks. You’re going to get communication problems. You’re going to get a “them-vs-us” attitude.

Every time I’ve come up against this the typical argument against operations staff being embedded within scrum teams is that they’re not working on “your stuff” all the time, so the rest of the time they’d be busy doing other stuff, unrelated to “your team”, or they’d be bored. Well, maybe if we freed up that extra capacity we could release more often? Maybe they could be working on making it quicker, easier, safer to release more often? Maybe they could be more deeply involved with development when we’re making decisions which affect what they’re going to release and how it’s going to wake them up in the middle of the night? Maybe they could even help with other, non-production environments? The neglected, little siblings of production that every company seems to struggle to pay enough attention to.

Maybe, even, over time the team evolves from having the operations specialist to having team members cross-skilled into operations. Under the watchful eye of the specialist could we, shock horror, let testers touch production? Could the BA manage a release? In some industries this is completely impossible for regulatory reasons, but in all the others it’s “impossible” for merely arbitrary reasons.

Breaking down silos is never easy – but I think it’s an interesting reflection of how far we’ve come that some silos seem frankly ridiculous now, while others just seem old-fashioned. I still hold out for the distant dream of genuinely cross-functional teams. Whenever I’ve seen this actually happen the lack of bottlenecks and mis-communication makes everything so much faster, so much easier. In the end a cross-functional team is better than silos. But a cross-skilled team is better still, if you can manage it.

Effectiveness of Teams

Agile places an emphasis on the importance of the team. The team make the decisions: what do we work on today, how do we tackle our constraints, even who should be in the group. And yet some research seems to suggest that individuals are more effective than teams.

For example, in “59 Seconds” Richard Wiseman questions the effectiveness of brainstorming – groups tend to focus on mundane, easily agreed upon suggestions; or be swayed by uncreative, charismatic team members.

How do we reconcile this conflict? If groups tend to lack creativity and flexibility in their thinking, why do agile teams appear to be more creative, more flexible and above all more effective? Is it an illusion, or does agile actually help teams achieve more?

Just one developer?

The trouble with software is it’s rarely a lone sport anymore. There aren’t many fields where one developer on their own can make a significant contribution. But where one developer can make meaningful progress you will get the best bang for buck. As soon as you add a second team member you need much more communication (ok, I might sit and talk to myself sometimes, but I talk much more when there’s another human being there). By the time you’re adding a third, fourth or fifth developer, you’re spending loads of time just talking and drawing on whiteboards and standing around having meetings.

If I’m the only person to have touched the code, when it crashes – I know exactly whose fault it is. As soon as there are more developers, we get to play blamestorming. “Well it works fine on my PC”, “It worked last time I ran it”, “You checked in last – it must be your bug”. You start to get the diffusion of responsibility that Wiseman talks about. People don’t feel personally responsible for the output, so they don’t feel compelled to make it better: half-assed is good enough, it’s not my problem.

The truth is most activities of any size nowadays require a team of people to work on, which immediately raises the question of who works on what, when.

Fluidity

Maybe agile helps teams be more effective by letting the team be more fluid. Rather than the smartest people getting stuck on one problem or in one area, the fluidity and constant reassessment of agile allows the smart people to automatically refocus to where they need to be. But critically, it doesn’t need someone to micromanage the situation and tell them to work on the most important things – people will “self-organise” and naturally gravitate to where they can help most.

At the daily standup Harry says:

Jim – are you doing ok with the checkout flow? You’ve never done anything like that before so would it help if I came and paired with you today? The order history page can wait until next week so we can hit our target for Friday.

Magic: a “self-organising team”. Imagine some asshat manager had said that! Jim would feel like an idiot, Harry gets to feel awkward so tries not to ride roughshod over Jim’s work – both get dragged down and demotivated; the end result is slow, sloppy work and a miserable team. Instead, because the team came up with the idea, everyone’s happy about it and the work gets done as quickly as possible.

Because different people are always offering help – either because they’re nosy and want to know how something works, or because they’re some smartass know-it-all that’s good at everything – the fact that the smartest people are quickly rotating round the group’s biggest problems isn’t always plain to see. Everyone is moving around; but most of the movement is noise: the important thing is that the brightest, most capable people are moving to where they are needed most.

Maybe the fluidity simply creates a socially acceptable way for the smart people on the team to leap from problem to problem without the rest of the team feeling stupid.

Who’s the rockstar?

I hate the term, but if agile teams are more effective because the “rockstar” developers are working on all the important stuff – that suggests everyone else is working on the unimportant stuff. Now, if your company has time to pay idiots to work on stuff that nobody wants – maybe I can offer you some overpriced consultancy?

But that doesn’t happen, does it? Perhaps because the “rockstar” on the team is probably only good at playing guitar (stretching the tortured analogy). I’ve heard him play a drum solo: it’s shit. But the drummer? Yeah, he’s not too bad at that. Everyone on a team has different strengths, and will do best at certain tasks. As a manager, it’s almost impossible to try and assign people to tasks to get the best out of everyone and deliver the most value possible. You’re basically trying to allocate resources centrally, which turns out to be pretty hard.

Instead by delegating resource allocation to the team, the team decide who would be best on each activity; the team take responsibility for delivering as much value as quickly as possible. Even if that sometimes means people are working on tasks they’re not suited for – those who are better at it might be working on something more valuable. Sometimes you need a drummer, even if they’re not the best drummer in the band.

Costs

Regular task switching and lots of pairing are great for creating an environment where developers can move from task to task easily. But this comes with a cost – I can’t immediately pick up where James left off: I need to talk to him to find out what he was doing and where he got to, and I need to learn and understand the code he wrote yesterday before I can write more.

What about the diffusion of responsibility? If six different people all work on the same feature, won’t we find that nobody really cares whether it works, because everyone blames the other guys? Well, assuming we’re all professional developers, I’m sure we wouldn’t sink to such childish behaviour. But it’s surprising how easily you can become detached from the goal you’re aiming for – the overall benefit you’re trying to deliver for the customer. You know what’s left to do, so you do your little bit. You don’t think about the overall goal and what the customer actually wanted. You take your eye off quality for a split second and bang! You screwed up.

I suspect diffusion of responsibility is a genuine problem in agile teams – which is why shared ownership is emphasised. We all own this code – so treat it as your own. In another light, it’s the craftsmanship ethic – to leave the code a little better than you found it. Don’t just assume the other guy knew what he was doing: fix it, properly. Without this, the diffusion of responsibility would lead to chaos.

To tolerate all these costs (task switching, diffused responsibility, communication and coordination overhead) there simply must be a massive benefit. The upside of having the right people on the right task at the right time must outweigh all those downsides.

But does it always? Does it on your team?