Posts Tagged ‘refactoring’

Ever since the “Gang of Four” book, everyone and their uncle is an expert in patterns. Software is all about patterns – the trouble is, it seems very little of note has happened in the intervening 20 years.

Why patterns are good

Patterns help because they let us talk about code using a consistent language. Patterns make it easier to read code. Ultimately, reading code is all about being able to reason about it.

If I change this, what’s going to break?

This bug got reported in production, how the hell did it happen?

If I can see the patterns the code implements, it lets me reason about the code more easily, without having to carefully analyse exactly what it does. Without them, I have to carefully understand every single line of code and try and reverse engineer the patterns.

The trouble is, the patterns often get mixed up with the implementation. This makes it hard to discern what the pattern is, and whether it’s actually being followed. Just because a class is called TradeManagerVisitorFactory doesn’t mean I actually know what it does. And when you open up the rat’s nest, you realise it’s going to take a long time to know what on earth it’s doing.

Patterns Exist

Patterns exist in our code, whether we explicitly call them out or not – there are hundreds, probably thousands of patterns that we’re all familiar with. Wouldn’t it be great if these patterns, at least within a single code base, were consistent? Wouldn’t it be great if we could help keep developers on the straight and narrow – so that when someone instantiates a second instance of my “singleton”, we can break the build, because that change might break all sorts of assumptions?
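
One crude way to enforce that today – purely a sketch, with an invented ConfigCache class – is to have the “singleton” itself count its instantiations, so any test that creates a second instance fails and takes the build with it:

import java.util.concurrent.atomic.AtomicInteger;

public class ConfigCache {
    private static final AtomicInteger INSTANCES = new AtomicInteger();

    public ConfigCache() {
        // a second instantiation fails fast, so any test that triggers it
        // breaks the build
        if (INSTANCES.incrementAndGet() > 1) {
            throw new IllegalStateException("ConfigCache is supposed to be a singleton");
        }
    }
}

But this buries the pattern in hand-written boilerplate; the point of identifying patterns explicitly is that checks like this could be generated or enforced mechanically.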

If we could start identifying the patterns in our code, and (somehow) separate that from how they’re implemented, we might actually be able to change how certain things are implemented, consistently, mechanically, across an entire enterprise-scale code base. Let me guess, an example would help?

The Builder Pattern

Ok, it’s a really simple pattern – but it makes a nice example. I have a class that I need to be able to create instances of, so I use the builder pattern. Really, though, a builder is just one kind of object construction pattern. If I start thinking about different ways objects can be instantiated I can quickly come up with a handful of alternatives (one of which is sketched in code below):

  • A constructor, with a long parameter list
  • A builder, with a lot of setXxx methods
  • A builder, with a lot of withXxx methods, that each return the builder for method chaining
  • A builder with a list of properties for object initialisation (C#)
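
To make the stylistic choice concrete, here’s a minimal sketch of the third alternative – a chaining builder for a hypothetical Price class (all names here are illustrative, not from any real codebase):

public class PriceBuilder {
    private String currency;
    private String amount;

    public PriceBuilder withCurrency(String currency) {
        this.currency = currency;
        return this; // returning the builder is what enables method chaining
    }

    public PriceBuilder withAmount(String amount) {
        this.amount = amount;
        return this;
    }

    public Price build() {
        // functionally equivalent to calling a long-parameter-list constructor
        return new Price(currency, amount);
    }
}

Client code then reads fluently: new PriceBuilder().withCurrency("GBP").withAmount("1.45").build().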

Now, this means I have four ways of implementing “the builder pattern”. And you know, it makes precisely zero difference, functionality-wise, which I choose. It’s an entirely stylistic choice. However, I probably want it to be consistent across my codebase.

When it boils down to it, my pattern has three elements:

  • The actual pattern – e.g. a builder with method chaining
  • The logic & configuration for the pattern – the list of fields, the types, any type conversions or parsing etc
  • The actual rendering of the first two – this is all we have today: source code

I don’t want to get rid of the last (code generation is great, until you have to, you know, actually use the generated code). This should all work from existing, real code. But maybe we could mark up the code with annotations or attributes to describe what pattern we’re following and how it’s configured. Some of the logic and configuration would have to be inferred – but that’s ok, once we know it’s a builder pattern we can take entire method bodies as “logic”.
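
As a sketch of what that markup might look like – the @Pattern annotation here is hypothetical, it doesn’t exist in any library I know of:

// A hypothetical marker annotation describing the pattern being followed
public @interface Pattern {
    String name();
    String style();
}

// The builder from the sketch above, marked up; a tool could now check the
// class really follows the declared style, and infer the "logic &
// configuration" (field list, types, conversions) from the method bodies
@Pattern(name = "builder", style = "method-chaining")
public class PriceBuilder {
    // fields and withXxx methods as before
}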

Pattern Migration

But you know what would be really awesome? If I could change how the pattern is implemented. Maybe today I’m happy with all my objects being constructed from a long parameter list. But maybe tomorrow I decide to change my mind and use builders and withXxx method names. Wouldn’t it be awesome if I could simply change the global config and have all my code automagically refactored to match? If all my builders are annotated and identified, and all enforced to be written the same way – I can extract the configuration & logic from the implementation and re-render the pattern differently. I could programmatically replace a constructor call with a builder or vice versa.

Ok, the builder is a trivial example that probably isn’t of much use. But how many more patterns are there? Say today I have classes with lots of static methods. Well, really, they’re singletons in disguise. But none of the cool kids create singletons nowadays, they use IoC frameworks like spring to create singleton beans. Wouldn’t it be nice if I could mass-migrate a legacy codebase from static methods, to singletons, to spring beans (and back again when I realise that’s made things worse!)?
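
As a rough sketch of the three stages – three snapshots of the same invented class over time, not any particular codebase:

// Stage 1: a class of static methods – a singleton in disguise
public class PriceFormatter {
    public static String format(java.math.BigDecimal price) {
        return "£" + price.toPlainString();
    }
}

// Stage 2: an explicit singleton
public class PriceFormatter {
    private static final PriceFormatter INSTANCE = new PriceFormatter();
    private PriceFormatter() {}
    public static PriceFormatter getInstance() { return INSTANCE; }

    public String format(java.math.BigDecimal price) {
        return "£" + price.toPlainString();
    }
}

// Stage 3: a spring-managed singleton bean – no static state at all
@org.springframework.stereotype.Service
public class PriceFormatter {
    public String format(java.math.BigDecimal price) {
        return "£" + price.toPlainString();
    }
}

The behaviour is identical at every stage; the mechanical transformation between them is exactly the kind of pattern migration being described.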

Controllers

What about a more complex example – let’s compare spring-mvc and struts. Two frameworks to accomplish basically the same thing – but both with a very different philosophy. The answer isn’t to build a framework to capture the common parts of both – trust me, been there, done that: you get the worst of both worlds.

But could you describe a pattern (probably several) that struts actions and spring-mvc controllers follow? Ultimately, both spring-mvc and struts let you respond to web requests with some code. Both give you access to the HTTP request and the session. Both give you a way of rendering a view. I wonder if you could describe how to extract the pattern-specific parts of a struts action and a spring-mvc controller? The config comes from different places, how views are rendered is different, how the view is selected is different, even the number of actions-per-class can be different. But could you extract all that functionally irrelevant detail and separate the pattern configuration from the underlying source code?
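
To make the comparison concrete, here’s roughly what the same trivial handler looks like in each framework (class names invented, imports omitted – a struts 1 Action and an annotated spring-mvc controller):

// struts: one action per class, view selected via external XML config
public class ShowTradeAction extends Action {
    public ActionForward execute(ActionMapping mapping, ActionForm form,
            HttpServletRequest request, HttpServletResponse response) throws Exception {
        request.setAttribute("trade", "example");
        return mapping.findForward("showTrade");
    }
}

// spring-mvc: several handler methods per class, view selected by returned name
@Controller
public class TradeController {
    @RequestMapping("/trades/show")
    public String showTrade(HttpServletRequest request) {
        request.setAttribute("trade", "example");
        return "showTrade";
    }
}

The pattern-relevant configuration – which URL maps to which method, which view gets rendered – is the same information in both; only where it lives differs.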

If you could, maybe then you could describe a transformation between the two (sets of) patterns. This would give you a purely mechanical way of migrating a struts application to spring-mvc or vice versa.

Pattern Database

Now, the cost for me to describe, in great detail, all of the patterns for spring-mvc and struts in order to automatically migrate from one to the other is probably more than the effort to just get on and do it by hand. I could probably tackle a bunch with regexes and shell scripts, then hand finish the rest. However, what if we could define these patterns and share them? If there was some kind of open source pattern database? Then, the cost for the community is spread out – the cost for me is minimal.

Could we get to a point where our patterns are documented and codified? Where we can migrate between frameworks and even architectures with a manageable cost? An impossible dream? Or a better way of crafting software?

Read Full Post »

I always aspire to write well-crafted code. During my day job, where all production code is paired on, I think our quality is pretty high. But it’s amazing how easily you forgive yourself and slip into bad habits while coding alone. Is shame the driving force behind quality while pairing?

We have a number of archaic unit tests written using EasyMock; all our more recent unit tests use JMock. This little piece of technical debt means that if you’re changing code where the only coverage is EasyMock tests, you first have to decide: do you fix up the tests, or can you hold your nose and live with / tweak the existing tests for your purposes? This is not only distracting, but it means doing the right thing can be substantially slower.

Changing our EasyMock tests to JMock is, in principle, a relatively simple task. EasyMock declares mocks in a simple way:

private PricesService prices = createMock(PricesService.class);

These can easily be converted into JMock-style:

private Mockery context = new Mockery();
...
private final PricesService prices = context.mock(PricesService.class);

EasyMock has a slightly different way of declaring expectations:

prices.prefetchFor(asset);
expect(prices.pricesFor(asset)).andReturn(
    Lists.newArrayList("1.45", "34.74"));

These need to be translated to JMock expectations:

context.checking(new Expectations() {{
    allowing(prices).prefetchFor(asset);
    allowing(prices).pricesFor(asset);
    will(returnValue(Lists.newArrayList("1.45", "34.74")));
}});

This process is pretty mechanical, so as part of 10% time I started using my scripted refactoring tool – Rescripter – to mechanically translate our EasyMock tests into JMock. Rescripter lets you run code that modifies your Java source. But this is more than just simple search & replace or regular expressions: by using Eclipse’s powerful syntax-tree parsing, you have access to a fully parsed representation of your source file – meaning you can find references to methods, locate method calls, names, parameter lists etc. This is exactly what you need given the nature of the translation from one library to another.

This was inevitably fairly exploratory coding. I wasn’t really sure what would be possible or how complex the translation process would eventually become. I started out with some simple examples, like those above; but over time the complexity grew, as the many differences between the libraries made me work harder and harder to complete the translation.

After a couple of 10% days on this I’d managed to cobble together something awesome: I’d translated a couple of hundred unit tests. But this had been done by 700 lines of the most grotesque code you’ve ever had the misfortune to lay your eyes upon!

And then… and then last week, I got a pair partner for the day. He had to share this horror. After spending 10 minutes explaining the problem to him, and then 15 minutes explaining why it was throwaway, one-use code that didn’t have any unit tests, I was embarrassed.

We started trying to make some small changes; but without a test framework, it was difficult to be sure what we were doing would work. To make matters worse, we needed to change core functions used in numerous places. This made me nervous, because there was no test coverage – so we couldn’t be certain we wouldn’t break what was already there.

Frankly, this was an absolute nightmare. I’m so used to having test coverage and writing tests – the thought of writing code without unit tests brings me out in cold sweats. But, here I was, with a mess of untested code entirely of my own creation. Why? Because I’d forgiven myself for not “doing it right”. After all, it’s only throwaway code, isn’t it? It’s exploratory, more of a spike than production code. Anyway, once it’s done and the tests migrated, this code will be useless – so why make it pretty? I’ll just carry on hacking away…

It’s amazing how reasonable it all sounds. Until you realise you’re being a total and utter fucktard. Even if it’s one-use code, even if it has a relatively short shelf-life:

the only way to go fast, is to go well

So I did what any reasonable human being would do. I spent my lunch hour fixing this state of affairs. The end result? I could now write unit tests in Jasmine to verify the refactoring I was writing.

Not only could I now properly test-drive new code, I could write tests to cover my existing legacy code, so I could refactor it properly. Amazing. And all of a sudden, the pace of progress jumped. Instead of long debug cycles and trying to manually find and trigger test scenarios, I had an easy-to-run, repeatable, automated test suite that gave me confidence in what I was doing.

None of this is new to me: it’s what I do day-in day-out. And yet… and yet… somehow I’d forgiven myself while coding alone. The only conclusion I can draw is that we can’t be trusted to write code of any value alone. The shame of letting another human being see your sorry excuse for code is what drives up quality when pairing: if you’re not pair programming, the code you’re writing must be shameful.

Read Full Post »

Are you a Java developer? Do you use Eclipse? Ever find yourself making the same mindless change over and over again, wishing you could automate it? ME TOO. So I wrote an Eclipse plugin that lets you write scripts that refactor your source code.

It does what now?

Some changes are easy to describe, but laborious to do. Perhaps some examples would help:

  • Renaming a method and renaming every call to that method (ok, Eclipse has built-in support for this)
  • Replacing a call to a constructor with a static factory method (ok, IntelliJ has built-in support for this, but Eclipse doesn’t)
  • Moving a method and updating callers to get a reference to the target class
  • Replacing use of one library with a similar one with a different API

Basically anything that involves the same, small change made multiple times across your source code.

How does it work?

After installing the Rescripter Eclipse plugin, you write some JavaScript to describe your change; when you run it, the script can make changes to your source code using Eclipse’s built-in refactoring and source-code modification support.

Why JavaScript? Because it’s a well-understood language which, via Rhino, integrates excellently with Java.

An example would help about now

Ok, so imagine I have my own number class.

public class MyNumber {
	private int value;

	public MyNumber(String value) {
		this.value = Integer.parseInt(value);
	}

	public static MyNumber valueOf(String value) {
		return new MyNumber(value);
	}
}

My source code becomes littered with calls to the constructor:

MyNumber myNumber = new MyNumber("42");

As part of some refactoring, if I want to change how these numbers are created or implemented I may want to replace all calls to the constructor with calls to the static factory method. For example, this would let me introduce a class hierarchy that the factory method knows about, or any number of other changes that cannot be done if client code calls the constructor directly.
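
As a sketch of the kind of change this enables (MyZero is an invented subclass, purely for illustration):

public static MyNumber valueOf(String value) {
    // callers never see the concrete class, so the factory is free
    // to hand back a specialised implementation
    if ("0".equals(value)) {
        return new MyZero();
    }
    return new MyNumber(value);
}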

Unfortunately, Eclipse doesn’t provide a way to do this refactoring – however, Rescripter lets you do it. We can describe what we want to do quite simply:

Find all references to the constructor, then for each reference:

  • Add a static import to MyNumber.valueOf
  • Replace the method call (new MyNumber) with valueOf

So what does this look like?

var matches = Search.forReferencesToMethod("com.example.MyNumber(String)");

var edit = new MultiSourceChange();
foreach(filter(matches, Search.onlySourceMatches),
    function(match) {
        edit.changeFile(match.getElement().getCompilationUnit())
            .addImport("static com.example.MyNumber.valueOf")
            .addEdit(Refactor.createReplaceMethodCallEdit(
                 match.getElement().getCompilationUnit(),
                 match.getOffset(), match.getLength(),
                 "valueOf"));
    });
edit.apply();

The first line finds references to the MyNumber(String) constructor.

Line 4 introduces a loop over each reference, filtering to only matches in source files (we can’t change .class files, only .java files).

Line 7 adds our static import.

Lines 8-11 replace the constructor call with the text “valueOf”.

Our original call now looks like:

MyNumber myNumber = valueOf("42");

Now, having replaced all uses of the constructor, I can make it private and make whatever changes I need to the static factory method.
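
The end state would look something like this:

public class MyNumber {
	private int value;

	private MyNumber(String value) { // now private: clients must go through valueOf
		this.value = Integer.parseInt(value);
	}

	public static MyNumber valueOf(String value) {
		return new MyNumber(value);
	}
}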

That’s great, but can it…?

Yes, probably. But maybe not yet. Rescripter is very basic at the minute, although you can still run some pretty powerful scripts making broad changes to your source code. If you’re having problems, have a feature suggestion or bug report – either post a comment or drop me a mail.

Read Full Post »

In a blatant rip-off of the quote usually attributed to Blaise Pascal – “if I had more time, I would have written a shorter letter” – I had a thought the other day, perhaps:

If I had more time, I would have written less code

It seems to me the more time I spend on a problem, the less code I usually end up with; I’ll refactor and usually find a more elegant design – resulting in less code and a better solution. This is at the heart of the programmer’s instinct that good code takes longer to write than crap. But as Uncle Bob tells us:

The only way to go fast is to go well

How then do we square this contradiction? The way to go fast is to go well; but if I sacrifice quality I can go faster – so I’m able to rush crap out quickly.

What makes us rush?

First off – why do developers try to rush through work? It’s the same old story that I’m sure we’ve all heard a million times:

  • There’s an immovable deadline: advertising has already been bought; or some external event – could be Y2K or a big sports event
  • We need to recognise the revenue for this change in this quarter so we make our numbers
Somebody, somewhere has promised this and doesn’t want to lose face by admitting they were wrong – so the dev team has to bend so that those who over-committed don’t look foolish

What happens when we rush?

The most common form of rushing is to skip refactoring. The diabolical manager looks at the Red, Green, Refactor cycle and can see a 33% improvement if those pesky developers will just stop their meddlesome refactoring.

Of course, it doesn’t take a diabolical manager – developers will sometimes skip refactoring in the name of getting something functional out the door quicker.

It’s ok though, we’ll come back and fix this in phase 2

How many times have we heard this shit? Why do we still believe it? Seriously, I think my epitaph should be “Now working on phase 2”.

Another common mistake is to dive straight into writing code. When you’re growing your software guided by tests, this probably isn’t the end of the world, as a good design should emerge. But sometimes 5 minutes’ thought before you dive in can save you hours of headache later.

And finally, the most egregious rush I’ve ever seen is attempting to start development from “high level requirements”. As the “inconsequential little details” are added in, the requirements look completely different. And all that work spent getting a “head start” is now completely wasted. You either bin it and start over, or waste more precious time re-working something you should never have written in the first place.

Does rushing work?

This is the heart of the contradiction – it feels like it does. That’s the reason we do it, isn’t it? When compelled by the customer to finish quicker, we rush – we cut corners to try and get things done quicker.

It feels like I’m going quicker when I cut corners. If only it weren’t for all those unexpected delays that wasted my time, I would have been finished sooner. It’s not my fault those delays cropped up – I couldn’t have predicted that.

It’s the product manager’s fault for not making the requirements clear

It’s the architect’s fault for not telling me about the new guidelines

It’s just unlucky that a couple of bugs QA found were really hard to fix

When you cut corners, you pay for it later. Time not spent discussing the requirements in detail leads to misunderstandings; if you don’t spot them early, you might have to wait until the show and tell for the customer to spot them – now you’ve wasted loads of time. Time not spent thinking about the design can lead to you going down an architectural blind alley that causes loads of rework. Finally, time not spent refactoring leaves you with a pile of crap that will be harder to change when you start work on the next story, or even have to make changes for this one. Of course, you couldn’t have seen these problems coming – you did your best; these things crop up in software, don’t they?

But what if you hadn’t rushed? What if rather than diving in, you’d spent another 15 minutes discussing the requirements in detail with the customer? You might have realised earlier that you had it all wrong and completely misunderstood what she wanted. What if you spent 30 minutes round a whiteboard with a colleague discussing the design? Maybe he would have pointed out the flaws in your ideas before you started coding. Finally, what if you’d spent a bit more time refactoring? That spaghetti mess you’re going to swear about next week and spend days trying to unravel will still be fresh in your mind and easier to untangle. For the sake of an hour’s work you could save yourself days.

How much is enough?

Of course, it’s all very well to say we will not write crap any more; but it’s not a binary distinction, is it? There’s a sliding scale from highly polished only-really-suitable-for-academics perfection; to sheer, unmitigated, what-were-you-thinking-you-brain-dead-fuck crapitude.

If spending more time discussing requirements could save time later, why not spend years going through a detailed requirements gathering exercise? If spending more time thinking before coding could save time later, why not design the whole thing down to the finest detail before any coding is done? Finally if refactoring mercilessly will save time, why not spend time refactoring everything to be as simple as possible? Actually, wait, this last one probably is a good idea.

Professional software development is always a balancing act. It’s a judgement call to know how much time is enough.

How long does it take?

When working through a development task I choose to expend a certain amount of extra effort over and above how long it takes me to type in the code and test it; this extra time is spent discussing requirements, arguing about the design, refactoring the code, etc. Ideally I want to expend just enough effort to get the job done as quickly as possible (the lazy and impatient developer). If I expend too little effort I risk being delayed later by unnecessary rework; if I expend too much effort, I’ve wasted my time.

However, I don’t know in advance what that optimum amount of effort is – it’s a guess based on experience. But I expect the amount of effort I put in to be within some range of the optimum – the lowest point on a graph of total time taken against extra effort expended.

All other things being equal, I’d expect the amount of time it actually takes to complete the task to be within some margin based on how close I got to the optimum amount of effort. Sometimes I do too little and waste time with rework. Other times I spend too long – e.g. too much detail in the design that didn’t add any value – so the extra time I spent is wasted.

This is by no means all the uncertainty in an estimate; but this is the difference between me not doing quite enough and having to pay for it later (refactoring, debugging or plain rework), versus doing just enough at every stage so that no effort is wasted and there’s no rework I could have avoided more cheaply. In reality the time taken is likely to be somewhere in the middle: not too great, but not too shabby.

There’s an interesting exercise here to consider the impact experience has on this. When I was first starting out I’d jump straight into coding; my effort range would be to the left of the graph: I’d spend most of my time re-writing what I’d done, so the actual time taken would be huge. Then, as I became more experienced, I’d spend ages clarifying requirements, writing detailed designs and refactoring until the code was perfect; now my effort range would be to the right of the graph – I’d expend considerable upfront effort, much of which was unnecessary. Now, I’d like to think, I’m a bit more balanced and try to do just enough. I have no idea how you could measure this, though.

Reducing effort to save time

What happens when we rush? I think when we try to finish tasks quicker, we cut down on the extra effort – we’re more likely to accept requirements as-is than challenge them; we’re more likely to settle for the first design idea we think of; we’re more likely to leave the code poorly refactored. This pulls the effort range on the graph to the left.

To try and get done more quickly, I’ve reduced the amount of effort by “shooting low”: I cut corners and expend less effort than I would have done otherwise.

The trouble is this doesn’t make the best-case any better – I might still get the amount of effort bang on and spend the minimum amount of time possible on this task. However because I’m shooting low, there’s a danger now I spend far too little extra effort – the result is I spend even longer: I have to revisit requirements late in the day; make sweeping design changes to existing code; or waste time debugging or refactoring poorly written code.

This is a classic symptom of a team rushing through work: simple mistakes are made that, in hindsight, are obvious – but because we skipped discussing requirements or skipped discussing the design we never noticed them.

When I reduce the amount of extra effort I put in, rather than getting things done quicker, rushing may actually increase the time taken. This is the counter-intuitive world we live in – where aggressive deadlines may actually make things go more slowly. Perhaps instead I should have called this article:

If I had more time I would have been done quicker

Read Full Post »

Whether you like to think of it as technical debt or an unhedged call option, we’re all surrounded by bad code, bad decisions and their lasting impact on our day-to-day lives. But what is the long-term impact of these decisions? Are we really making prudent choices? Martin Fowler talks about the four classes of technical debt – from reckless and deliberate to inadvertent and prudent.

Deliberate reckless debt

Deliberate reckless technical debt is just that: developers (or their managers) allowing decisions to be made that offer no upside and only downside – e.g. abandoning TDD or not doing any design. Whichever way you look at it, this is just plain unprofessional. If the developers aren’t capable of making sensible choices, then management should have stepped in to bring in people that could. Michael Norton labels this “cruft not technical debt”. If we’re getting no benefit and simply giving ourselves an excuse to write crappy code then it’s not technical debt, it’s just crap.

Inadvertent debt

Inadvertent technical debt is tricky. If we didn’t know any better, how could we have done any differently? Perhaps industry standards or best practices have moved on. I’m sure once upon a time EJBs were seen as a good idea; now they look like pure technical debt. Today’s best practice so easily becomes tomorrow’s code smell.

Or perhaps we’d never worked in this domain before; if we’d had the domain knowledge when we started, maybe the design would have turned out differently. Sometimes technical debt is inevitable and unavoidable.

Prudent deliberate debt

Then there’s prudent, deliberate debt, where we make a conscious choice to add technical debt. This is a pretty common decision:

We need to recognise the revenue this quarter, no matter what

We’ve got marketing initiatives lined up so we’ve got to hit that date

We’ve committed to a schedule so don’t have time for rework

Sometimes, we have to make compromises: by doing a sub-standard job now, we get the benefit of finishing faster but we pay the price later.

Unlike the other types of technical debt, this is specifically a technical compromise. We’ve made a conscious decision to leave the code in a worse state than we should. We know this will slow us down later; we know we need to come back and fix it in “phase 2”; but to hit the date, we accept compromise.

Is it always the right decision?

1. Compromise is always faster in the short-term and slower in the long-term

Given a choice between what we need now and some unspecified “debt” to deal with later – the obvious choice is always to accept compromise.

2. Each compromise is minor, but they compound

Unlike how most people experience “debt”, technical debt compounds. Each compromise we accept increases the cost of all the existing debt as well. This means the cost of each individual compromise may be small, but together they can have a massive impact.

First of all I decide not to refactor some code. Next time round, because it’s not well factored, it’s harder to test, so I skip some unit tests. Third time round, my test coverage is poor, so I don’t have the confidence to refactor extensively. Now it’s getting really hard to test and really hard to change. Little by little, but faster and faster, my code is deteriorating. Each compromise piles on the ones that went before and amplifies them. Each decision is minor, but they add up to a big headache.
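
To put purely illustrative numbers on it: if each compromise slowed all subsequent work by just 5%, twenty compromises wouldn’t cost 20 × 5% = 100% – they’d compound to 1.05^20, making everything roughly 2.65 times slower.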

3. It’s hard to quantify the long term cost

When we agree not to rework a section of code, when we agree to leave something half-finished, when we agree to rush something through with minimal tests: it’s very difficult to estimate the long term cost. Sure, we can estimate what it would take to do it right – we know what the principal is. But how can we estimate the interest payments? Especially when they compound.

How can I estimate the time wasted figuring out the complexity next time? The time lost because a bug is harder to track down. The extra time it takes because I can’t refactor the code so easily next time. The extra time it takes because I daren’t refactor the code next time; and the extra debt this forces me to take on. How can I possibly quantify this?

At IMVU they found they:

underestimated the long-term costs [of technical debt] by at least an order of magnitude

Is it always wrong?

There are obviously cases where it makes sense to add technical debt; or at least, to do something half-assed. If you want to get a new feature in front of customers to judge whether it’s valuable, it doesn’t need to be perfect, it just needs to be out there quickly. If the feature isn’t useful for users, you remove it. You’ve saved the cost of building it “perfectly” and quickly gained the knowledge you needed. This is clearly a cost-effective, lean way to manage development.

But what if the feature is successful? Then you need a plan for cleaning it up; for refactoring it; making sure it’s well tested and documented sufficiently. This is the debt. As long as you have a plan for removing it, whether the feature is useful or not – then it’s a perfectly valid approach. What isn’t an option is leaving the half-assed feature in the code base for years to come.

The difference is having a plan to remove the debt. How often do we accept compromise with a vague promise to “come back and fix it later”, or my personal favourite, “we’ll fix that in phase 2”? My epitaph should be “now working on phase 2”.

Leaving debt in the code with no plan to remove it is like the guy who pays the interest on one credit card with another. You’re letting the debt mount up, not dealing with it. Without a plan to repay the debt, you will eventually go bankrupt.

The Risk of Technical Debt

Perhaps the biggest danger of technical debt is the risk that it represents. Technical debt makes our code more brittle, less easy to change. As @bertvanbrakel said when we discussed this recently:

Technical debt is a measure of code inflexibility

The harder it is to change, the more debt-laden our code. With this inflexibility comes the biggest risk of all: that we cannot change the code fast enough. What if the competitive or regulatory environment suddenly changes? If a new competitor launches that completely changes our industry, how long does it take us to catch up? If we have an inflexible code base, what will be left of our business in 2, 3 or 4 years when we finally catch up?

While this is clearly a worst-case scenario, this lack of flexibility – this lack of innovation – hurts companies little by little. Once-revolutionary companies become staid, unable to react, releasing only derivative products. As companies find themselves unable to innovate and keep up with the ever-changing landscape, they risk becoming irrelevant. This is the true cost of technical debt.

Without a plan to repay the technical debt; with no way to reliably estimate the long term cost of not repaying it, are we really making prudent, deliberate choices? Isn’t the decision to add technical debt simply reckless?

Read Full Post »
