Software development is rife with copy & paste: all of us resort to copy and paste coding sometimes. We know we probably shouldn’t, but we do it anyway. It’s like the industry’s dirty little secret: we mainly just copy and paste code from the internet or from somewhere else in the code base, then bash it till it works.
But maybe, just maybe, the fact that we all rely on this from time to time should be telling us something?
Sometimes copy & paste coding can be a good thing. A while ago I was pairing with someone and we did what I would call “search and replace coding”. I love code golf. In tools like Eclipse, IntelliJ or ReSharper there is always an optimal way to make each change, letting the tools do as much of the typing as possible. So I was fascinated when a colleague showed me an interesting piece of code golf using search & replace.
The change was around extending an existing class with a load of new fields. We had a basic class, with a couple of sample fields, and a test that verified something simple about the class – say that it could be serialized to JSON successfully. We needed to add a boatload of new fields to the class (don’t ask). This involved five separate tasks which, at a macro level, had a lot in common:
- Adding each field
- Initialising each field in the constructor
- Modifying the test to set up sample values for each field
- Modifying the test to pass the sample value for each field to the constructor
- Writing an assert that each field had successfully serialized and deserialized
My colleague demonstrated that we could write the list of fields once. Then, with copy & paste and search & replace, we have a list of parameters for the constructor. By carefully crafting the search term and a suitable replacement term you can do some limited meta-programming, taking the list of fields and transforming it into the actual lines of code you need in each instance. The list of fields replaced one way gives you a list of parameters to the constructor; with a different replacement you get the field declarations in the test; with another replacement you get assert statements.
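To make the trick concrete, here is a rough sketch of the same idea using Python's `re` module rather than an IDE's search & replace dialog. The field names and the Java-flavoured output lines are invented for illustration; they are not the real class we were working on.

```python
import re

# Hypothetical field list, typed once: one "type name" pair per line.
FIELDS = """\
String firstName
String lastName
int age"""

# One search term, reused with a different replacement for each task.
PAIR = re.compile(r"^(\w+) (\w+)$", flags=re.MULTILINE)

# Each substitution below stands in for one search & replace pass.
declarations = PAIR.sub(r"private final \1 \2;", FIELDS)
assignments = PAIR.sub(r"this.\2 = \2;", FIELDS)
parameters = ", ".join(FIELDS.splitlines())
asserts = PAIR.sub(r"assertEquals(\2, roundTripped.\2);", FIELDS)

if __name__ == "__main__":
    for generated in (declarations, assignments, parameters, asserts):
        print(generated)
        print()
```

The search term never changes; only the replacement does, which is exactly what makes it feel like a very small templating language.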
I found this a very interesting way to write software. It definitely minimised the number of key presses required to type the code in. The long, boring list of field names only had to be typed in once; after that it was merely a matter of carefully crafting a search & replace regex to do the heavy lifting for us. But it demonstrates an underlying truth: we had five separate changes to make, each parameterised by the same list of fields. If we could have actually meta-programmed this, we would have passed in the list of fields to some meta-programming code which would output the desired lines of code.
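As a sketch of what that might look like, assuming a tiny template-per-change scheme that I have made up for this post (the `Field` type, the template names and the generated Java-style lines are all hypothetical):

```python
from typing import NamedTuple


class Field(NamedTuple):
    type: str
    name: str


# Hypothetical templates, one per kind of change from the list of five tasks.
TEMPLATES = {
    "declaration": "private final {type} {name};",
    "constructor_assignment": "this.{name} = {name};",
    "test_sample": "{type} {name} = sample{Name}();",
    "constructor_argument": "{name},",
    "assertion": "assertEquals({name}, roundTripped.get{Name}());",
}


def generate(fields, templates=TEMPLATES):
    """Return a dict mapping each change name to its generated lines."""
    return {
        change: [
            template.format(
                type=f.type,
                name=f.name,
                Name=f.name[0].upper() + f.name[1:],  # firstName -> FirstName
            )
            for f in fields
        ]
        for change, template in templates.items()
    }


if __name__ == "__main__":
    fields = [Field("String", "firstName"), Field("int", "age")]
    for change, lines in generate(fields).items():
        print(f"# {change}")
        print("\n".join(lines))
```

The list of fields is the single parameter, and each of the five changes is just a different template applied to it.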
While it’s very clever that we could use search & replace meta-programming for this, it feels like the tools are lacking somehow.
Everyone copies code from Stack Overflow from time to time. Hopefully you do it sensibly, using it as a starting point for your own production code: working through what you’ve just pasted in to understand it, experimenting with it, modifying it, making it fit for your purpose, rather than just blindly copy & pasting random code from the internet. I mean, who’d just randomly run code from the internet?
Stack Overflow is great. It’s an amazing resource for programmers. While learning WPF I got the sense that as a technology it couldn’t have taken off without something like Stack Overflow. The technology is so opaque, so hard to learn. It took me many, many months of copy & pasting XAML from Stack Overflow until I started to really understand how it worked. This is a technology that is not trivial to understand. Without being able to just copy & paste code from the internet, the technology would take a lot longer to learn.
But often we’re lazy. We try it. It works. Woohoo, next problem. I’ve written before about voodoo programming: the problem is that it’s easy to think you’ve understood what code is doing. If you didn’t actually have to invent those lines, to reason your way to them, maybe you don’t understand. Maybe you only think you know what the code’s doing? Maybe there’s some horrific bug you’re not aware of yet?
Almost every time I’ve seen TDD done at any scale, when it comes to writing a new test, the first question to ask is: “which existing test is this most like?” Yup, where can I go copy & paste. I’m so lazy, I don’t want to have to invent a whole test setup on my own. I’ll just borrow somebody else’s homework. We’ve all done it. It seems to be an unwritten rule of TDD. By the time you’re on to the third or fourth test in a given file, I guarantee the temptation to just copy & paste to make the fifth test is incredibly strong.
The trouble with this is all sorts of weird test artefacts get copied forwards. You start off with a simple test, with some simple setup. Say an empty bank account. Then to test non-zero balance you need an account with a transaction. Then to test balance summing you need an account with two transactions. Each test so far is building on the last, so you’ve just copy & pasted the previous one as a starting point. The fourth test is inserting a new transaction, so you just copy the third test (with two existing transactions, unnecessary for this test). Test five is that a withdrawal can’t go below zero, so you copy & paste the previous test and set the amount to be a large negative value. Test six is an overdraft test, so you copy & paste the previous test but change the account setup to include an overdraft so the balance can go negative. By the time you copy & paste test seven, your starting point is an account with an overdraft, with two transactions where a third transaction is added. Test seven is about adding a standing order. None of this noise is necessary.
This might seem a ridiculous example, but I see this time and again in real code. Sometimes it’s not even me that’s done it. This noise accumulates as you read down a test file. The tests at the bottom have all sorts of weird artefacts that were only relevant to one test halfway up the file. It means fixing tests and changing behaviours becomes a real problem. If I change, say, how overdrafts are defined in my example above, I might have half a dozen tests to change, only one of which even mentions overdrafts. But they’ve inherited that setup because the tests were just copy & pasted around. Not only does this make tests hard to read and understand, it also makes them hard to maintain.
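For contrast, here is a minimal sketch of the bank account tests where each one sets up only what it needs. The `Account` class is a hypothetical stand-in I have invented to make the example runnable; the point is the setup lines, not the implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account, just enough behaviour to illustrate the tests."""

    overdraft_limit: int = 0
    transactions: list = field(default_factory=list)

    @property
    def balance(self):
        return sum(self.transactions)

    def add(self, amount):
        # Reject anything that would take the balance past the overdraft limit.
        if self.balance + amount < -self.overdraft_limit:
            raise ValueError("insufficient funds")
        self.transactions.append(amount)


def test_balance_sums_transactions():
    account = Account(transactions=[10, 20])  # two transactions matter here
    assert account.balance == 30


def test_withdrawal_cannot_go_below_zero():
    account = Account()  # no transactions, no overdraft: neither is relevant
    try:
        account.add(-100)
    except ValueError:
        return
    raise AssertionError("expected the withdrawal to be rejected")


def test_overdraft_allows_negative_balance():
    account = Account(overdraft_limit=500)  # the only test mentioning overdrafts
    account.add(-100)
    assert account.balance == -100
```

If the definition of an overdraft changes, only the last test needs to change, because it is the only one that mentions one.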
We all do it. It seems a pretty accepted part of how TDD happens in the wild. And yet, it clearly isn’t right. With discipline, we can keep our tests clean. Yes, when we’re all being conscientious developers we start writing our tests from scratch each time. But most of the time we’re busy or lazy or whatever.
What do these three things have in common, besides copy & paste? In each example we’re using copy & paste to save time. To find the most efficient path through the work we’re doing. Nobody is doing it out of malice or stupidity. Laziness? Almost certainly – but the good kind. The kind of laziness that encourages elegant solutions.
But copy & paste isn’t an elegant solution. It’s a crappy solution to a more fundamental problem: our tools are deficient. Really what we’re working around is the fact that our tools don’t let us express what we really want.
What if we could write our tests in a higher level language? “A test with a bank account with two transactions”. Sure, there are internal & external DSLs you can use to do this. But typically the cost in setting up the DSL isn’t worth the hassle for unit tests. It would completely ruin the flow of TDD. Does that just mean we haven’t found a good way of doing it yet? Is there a way we could more fluidly express the intent of the test, filling in the gaps as we go?
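For what it's worth, the internal DSL version of “a bank account with two transactions” might look something like the sketch below. Everything here (`an_account`, `with_transactions`, `with_overdraft`) is a name invented for illustration, and the account is just a dictionary to keep the example short.

```python
class AccountBuilder:
    """Hypothetical fluent builder; every method name is made up for this sketch."""

    def __init__(self):
        self._transactions = []
        self._overdraft = 0

    def with_transactions(self, *amounts):
        self._transactions.extend(amounts)
        return self  # returning self is what lets the calls chain

    def with_overdraft(self, limit):
        self._overdraft = limit
        return self

    def build(self):
        return {
            "transactions": list(self._transactions),
            "overdraft": self._overdraft,
        }


def an_account():
    """Entry point so that a test's setup line reads like a sentence."""
    return AccountBuilder()


def test_balance_sums_transactions():
    account = an_account().with_transactions(10, 20).build()
    assert sum(account["transactions"]) == 30
```

Even this small builder is around twenty lines that somebody has to write and maintain before the first test can use it, which is the cost I mean.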
Instead of copy & pasting code from the internet, could our tools get smarter? Could we take some of the amazing machine learning stuff that’s going on and apply it to software development? I tried playing with an IntelliJ plugin recently that promises just that. Unfortunately it’s pretty buggy at the minute and doesn’t really work. But the idea is incredibly attractive. I like the idea of being able to express intent instead of mindlessly typing in the nuts & bolts.
Finally, instead of doing search & replace coding, wouldn’t it be great if we could actually meta-program? If we could actually write code that would write code for us? Not just code generators, but something that can generate small sections of code. I had a very limited go at this some time ago with rescripter – but it turns out it’s very hard to write a decent meta-programming tool that anyone except the author can understand. But I think the idea still has merit: too often I find cases where I can describe the intent of my change very succinctly, but implementing it will involve far more typing than I’d like.