Archive for the ‘Software development’ Category

Whether or not you write integration tests can be a religious argument: either you believe in them or you don’t. What we even mean by integration tests can lead to an endless semantic argument.

What do you mean?

Unit tests are easy to define: they test a single unit – a single class, a single method – and make a single assertion about the behaviour of that method. You probably need mocks (again, depending on your religious views on mocking).

Integration tests, as far as I’m concerned, test a deployed (or at least deployable) version of your code, outside in, as close to what your “user” will do as possible. If you’re building a website, use Selenium WebDriver. If you’re writing a web service, write a test client and make requests to a running instance of your service. Get as far outside your code as you reasonably can to mimic what your user will do, and do that. Test that your code, when integrated, actually works.
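
For a web service, a minimal sketch of that outside-in style might look like this (assuming xUnit and an instance already running locally; the address and endpoint are made up):

    using System;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Xunit;

    public class OrdersServiceIntegrationTests
    {
        // Talks to a running, deployed instance - the address is a stand-in.
        private static readonly HttpClient Client =
            new HttpClient { BaseAddress = new Uri("http://localhost:8080") };

        [Fact]
        public async Task GetOrders_ReturnsOk()
        {
            // Exercises routing, serialisation, handlers and the database -
            // everything a real user's request would touch.
            HttpResponseMessage response = await Client.GetAsync("/api/orders");

            Assert.Equal(HttpStatusCode.OK, response.StatusCode);
        }
    }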

In between these two extremes exist varying degrees of mess, which some people call integration testing. For example, testing a web service by instantiating your request handler class and passing a request to it programmatically, letting it run through to the database. This is definitely not unit testing, as it hits the database. But it’s not a complete integration test either, as it misses a layer: if HTTP requests to your service never get routed to your handler, how would you know?
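
That middle ground might look something like this (OrdersHandler and GetOrderRequest are hypothetical stand-ins for your own types; the connection string is invented):

    using System.Threading.Tasks;
    using Xunit;

    public class OrdersHandlerTests
    {
        [Fact]
        public async Task Handle_ExistingOrder_ReturnsIt()
        {
            // Instantiate the handler directly - no HTTP stack involved.
            var handler = new OrdersHandler("server=testdb;database=orders");

            // Runs through to a real database, so it's not a unit test...
            var response = await handler.Handle(new GetOrderRequest(orderId: 42));

            // ...but if "/api/orders/{id}" is never routed to this handler,
            // no test here will ever fail.
            Assert.NotNull(response);
        }
    }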

What’s the problem then?

Integration tests are slow. By definition, you’re interacting with a running application which you have to spin up, set up, interact with, tear down and clean up afterwards. You’re never going to get the speed you do with unit tests. I’ve just started playing with NCrunch, a background test runner for Visual Studio – which is great, but you can’t have it continually running your slow, expensive integration tests. If your unit tests take 30 seconds to run, I’ll bet you run them before every checkin. If your integration tests take 20 minutes to run, I bet you don’t.
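
One mitigation is to tag the slow tests so your quick feedback loop skips them – for instance with NUnit’s Category attribute (the category name and test bodies here are illustrative):

    using NUnit.Framework;

    [TestFixture]
    public class CheckoutTests
    {
        [Test]
        public void Total_SumsLineItems()
        {
            // Fast unit test: cheap enough to run before every checkin.
        }

        [Test]
        [Category("Integration")]
        public void Checkout_AgainstRunningService_Succeeds()
        {
            // Slow test against a deployed instance: run it on the build
            // server, and exclude it locally with e.g.
            //   nunit3-console Tests.dll --where "cat != Integration"
        }
    }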

You can end up duplicating lower level tests. If you’re following a typical two level approach of writing a failing integration test, then writing unit tests that fail then pass until eventually your integration test passes – there is an inevitable overlap between the integration test and what the unit tests cover. This is expected and by design, but can seem like repetition. When your functionality changes, you’ll have at least two tests to change.

They aren’t always easy to write. If you have a specific case to test, you’ll need to setup the environment exactly right. If your application interacts with other services / systems you’ll have to stub them so you can provide canned data. This may be non-trivial. The biggest cost, in most environments I’ve worked in, with setting up good integration tests is doing all the necessary evil of setting up test infrastructure: faking out web services, third parties, messaging systems, databases blah blah. It all takes time and maintenance and slows down your development process.

Finally, integration tests can end up covering uninteresting parts of the application repeatedly, meaning some changes are spectacularly expensive in terms of updating the tests. For example, if your application has a central menu system and you change it, how many test cases need to change? If your website has a login form and you massively change the process, how many test cases require a logged-in user?

Using patterns like the page object pattern you can code your tests to minimize this, but it’s not always easy to avoid this class of failure entirely. I’ve worked in too many companies where, even with the best of intentions, the integration tests end up locking in a certain way of working that you either stick with or declare bankruptcy and just delete the failing tests.
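
For the WebDriver case, a page object might look roughly like this (the element IDs and the HomePage class are invented for the sketch):

    using OpenQA.Selenium;

    // Page object: tests depend on this class, not on the page's markup.
    // When the login form changes, only this class has to change - not
    // every test case that needs a logged-in user.
    public class LoginPage
    {
        private readonly IWebDriver driver;

        public LoginPage(IWebDriver driver)
        {
            this.driver = driver;
        }

        public HomePage LogInAs(string username, string password)
        {
            driver.FindElement(By.Id("username")).SendKeys(username);
            driver.FindElement(By.Id("password")).SendKeys(password);
            driver.FindElement(By.Id("login")).Click();
            return new HomePage(driver);
        }
    }

    public class HomePage
    {
        public HomePage(IWebDriver driver) { }
    }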

What are the advantages then?

Integration tests give you confidence that your application actually works from your user’s perspective. I’d never recommend covering every possible edge case with integration tests – but a happy-path test and a failure-case test for each piece of functionality give you good confidence that the most basic aspects of the feature work. The complex edge cases you can unit test; an overall integration test helps you ensure the feature is basically integrated and you haven’t missed something obvious that unit tests wouldn’t cover.

Your integration tests can be pretty close to acceptance tests. If you’re using a BDD type approach, you should end up with quite readable test definitions that sufficiently technical users could understand. This helps you validate that the basic functionality is as the user expects, not just that it works to what you expected.
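
A test in that style might read like this (an xUnit fragment; all the Given/When/Then helpers and the ShouldBe assertion are hypothetical, written for your own domain):

    [Fact]
    public void Transferring_between_own_accounts_moves_the_money()
    {
        GivenAnAccount("current", balance: 100m);
        GivenAnAccount("savings", balance: 0m);

        WhenITransfer(40m, from: "current", to: "savings");

        // Readable enough that a sufficiently technical user can check
        // the behaviour matches what they actually expect.
        ThenTheBalanceOf("current").ShouldBe(60m);
        ThenTheBalanceOf("savings").ShouldBe(40m);
    }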

What goes wrong?

The trouble is if integration tests are hard to write you won’t write them. You’ll find another piece of test infrastructure you need to invest in, decide it isn’t worth it this time and skip it. If your approach relies on integration tests to get decent coverage of parts of your application – especially true for the UI layer – then skipping them means you can end up with a lot less coverage than you’d like.

Some time ago I was working on a WPF desktop application – I wanted to write integration tests for it. The different libraries for testing WPF applications are basically all crap. Each one of them failed to meet my needs in some annoying, critical way. What I wanted was WebDriver for WPF. So I started writing one. The trouble is, the vagaries of the Windows UI eventing system mean this is hard. After a lot of time spent investing in test infrastructure instead of writing integration tests, I still had a barely usable testing framework that left all sorts of common cases untestable.

Because I couldn’t write integration tests and unit testing WPF UI code can be hard, I’d only unit test the most core internal functionality – this left vast sections of the WPF UI layer untested. Eventually, it became clear this wasn’t acceptable and we returned to the old-school approach of writing unit tests (and unit tests alone) to get as close to 100% coverage as is practical when some of your source code is written in XML.

This brings us back full circle: we have good unit test coverage for a feature, but no integration tests to verify that all the different units hang together correctly and work in a deployed application. But where the trade-off is between little test coverage and decent test coverage with systematic blind spots, which is the better alternative?

Conclusion

Should you write integration tests? If you can, easily: yes! A web service is much easier to write integration tests for than almost any other type of application. If you’re writing a relatively traditional, not-too-JavaScript-heavy website, WebDriver is awesome (and the only practical way to get decent cross-browser confidence). If you’re writing very complex UI code (WPF or JavaScript), it might be very hard to write decent integration tests.

This is where your test approach blurs into architecture: as much as possible, your architecture needs to make testing easy. Subtle changes to how you structure your application can make it easier to get decent test coverage: design the application so different elements can be tested in isolation (e.g. separate the UI layer from a business logic service). You don’t get fully integrated tests, but you minimize the opportunity for bugs to slip through the cracks.

Whether or not you write integration tests is fundamentally a question of what tests your architectural choices require you to write to get confidence in your code.


A colleague asked me recently:

Why aren’t developers writing comments any more?

He’d been looking through some code his team had written, and couldn’t understand it – he was looking for comments to make sense of the mess, but there were none. Before he challenged the team, he asked my opinion: should developers be writing comments?

Excessive Comments

When I started programming some years ago, I would comment everything, and I mean everything.

    // Add four to x
    x += 4;

My logic at the time was that it was impossible to tell the difference between uncommented clear code and gnarly code you just hadn't spotted the gnarliness in yet. So I would comment everything, where the absence of a comment meant I'd forgotten - not that it was so trivial as to not warrant mentioning.

No Comments

Eventually I started working with peers who knocked some sense into me, and I immediately halved the number of lines of code I produced. At first it was a shock, but soon I realised that clear code is easier to read when there’s less noise. And that is all (most) comments become: noise. Sometimes they’re accurate, sometimes they’re not, but you can’t rely on them; you always have to assume they might be wrong. So if the code and the comment seem at odds, you assume it’s the comment that’s wrong and not your understanding of the code (naturally, you’re infallible!).

Clean Code

Uncle Bob and the notion of clean code have taken “no comments” to almost fanatical extremes. But every time I get into an argument with someone about how maybe, this time, a comment might be justified: Ctrl-Alt-M, enter the comment as the method name, and the code becomes more obvious. Every. Damned. Time.
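
A made-up before-and-after to show the mechanics (the Order type and the shipping rules are invented):

    // Before: a comment props up opaque code.
    // check the order qualifies for free shipping
    if (order.Total > 50m && !order.International && order.Items.All(i => i.InStock))
    {
        Ship(order);
    }

    // After: extract method - the comment becomes the method name.
    if (QualifiesForFreeShipping(order))
    {
        Ship(order);
    }

    private static bool QualifiesForFreeShipping(Order order)
    {
        return order.Total > 50m
            && !order.International
            && order.Items.All(i => i.InStock);
    }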

However, the trouble with a zealous argument like this is it gets taken up by asshats and lazy people. It's too easy to say "if you'd read Clean Coder you'd know you don't need comments. Quit living in the past, grandpa!". Uh huh. But your code is still a muddled pile of indecipherable crap. At least with comments there'd be some signposts while I wade through your steaming mess.

Some Comments

The truth is: sometimes comments do help (squeal, clean code weenies, squeal!). There are cases where extracting a method name isn’t sufficient. Sometimes having 20 one-line methods in my class does not make it easier to read. The end goal is to produce understandable code. Generally, naming things properly helps; adding comments that go stale does not. But that doesn’t mean that writing crap code and not commenting is the answer. Don’t use “no comments” as an excuse to leave your code indecipherable by human beings.

For example, sometimes you need to document why you didn’t do something. Maybe there’s a standard way of converting between currencies in your application – which this one time you’ve deliberately not used. A comment here might stop future people refactoring away your deliberate choice (even better is baking the decision into the design – some class structure that makes it really obvious). Sometimes a method name really doesn’t do a line of code justice; the line is better seen in the context of the lines before and after it, but it still needs some explanation of what you’re doing. This is particularly true when dealing with complex algorithms or mathematical formulae.
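
For instance, something like this (the types, rates and regulatory detail are all invented for illustration):

    public decimal ValueForRegulatoryReport(Trade trade)
    {
        // Deliberately NOT using the standard CurrencyConverter here:
        // regulatory reports must use the official month-end fixing rate,
        // not the live spot rate the rest of the application uses.
        // Please don't "fix" this by refactoring it back.
        return trade.Notional * monthEndFixingRates[trade.Currency];
    }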

Getting the Balance Right

How do you get the balance right? Well, your goal is to make code that other people can understand. The best way to get feedback on that is to ask someone. Either have explicit code reviews or pair program. If another human being has read your code and they could understand it - there's a better than average chance that most other capable people will be able to read it too.


Newcomers to TDD ask some interesting questions, here’s one I was asked recently: testing private methods is bad, but why?

How did we get here?

If you’re trying to test private methods, you’re doing something wrong. You can’t get to TDD nirvana from here; you’re gonna have to go back.

It all started with an innocuous little class with an innocuous little method. It did one little job, had a nice little unit test to verify it did its thing correctly. All was right with the world. Then, I had to add an extra little piece of logic. I wrote a test for it, changed the class until the test passed. Happy place. Then I started refactoring. I realised my little method, with its handful of test cases was getting quite complicated, so I used the extract method refactoring and boom! I have a private method.

While simple when I extracted it, another couple of corner cases and this private method evolves into quite a complicated piece of code – which now I’m testing one level removed: I’m testing the overall functionality of my outer method, which indirectly tests the behaviour of the private method. At some point I’m going to hit a corner case that’s quite awkward to test from the outside, it’d be so much easier if I could just test the private method directly.

What not to do

Don’t use a test framework that lets you test private methods. Good God, no. For the love of all that is right with the world, step away from the keyboard.

What to do instead

This is a prime example of your tests speaking to you. They’re basically shouting at you. But what are they saying?

Your design stinks!

If you need to test a private method, what you’re doing wrong is design. Almost certainly, what you’ve identified in your private method is a whole new responsibility. If you think about it carefully, it probably has little to do with what your original class is for – although your original class might need renaming to make that obvious. That’s ok, too. That’s incremental design at work.

An example would help about now

Say I started off with a Portfolio class – it has a bunch of Assets in it, each of which has a Value. So I can implement Portfolio.GetValue() to tell me how much it’s all worth. But then I start dealing with weird corner cases like opening or closing prices. And what do I mean by value: what I could sell it for, right now? Or perhaps there’s some foreign currency conversion to do, or penalty clauses for early exit – how does all that get factored in?

Before too long, GetValue() has a fairly large piece of logic, which I extract into GetSpotSalePrice(Asset). This method is then hairy enough to need testing, so it’s pretty clear that my design stinks. The deafening din of my tests finally persuades me to extract GetSpotSalePrice(Asset) into another class, but here’s the million dollar question: which one?
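
A sketch of roughly what this stage looks like (all the pricing details here are invented):

    using System.Collections.Generic;
    using System.Linq;

    public class Portfolio
    {
        private readonly List<Asset> assets = new List<Asset>();

        public decimal GetValue()
        {
            return assets.Sum(asset => GetSpotSalePrice(asset));
        }

        // Private, increasingly hairy, and awkward to test from outside.
        // Note: it takes an Asset and touches nothing else on Portfolio.
        private decimal GetSpotSalePrice(Asset asset)
        {
            var grossValue = asset.ClosingPrice * asset.Quantity;
            return grossValue - asset.EarlyExitPenalty;
        }
    }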

What not to do – part 2

For the love of SOLID, don’t put it in an AssetSalePriceCalculator, or a SalePriceManager. This is the number one easy mistake to make – you can follow TDD, ignore decent OO design, and still end up with a steaming turd pile of horribleness.

A NounVerber class is always a design smell. Just stop doing it. Now. I mean it. I’m watching you. I will hunt you down and feed you to the ogre of AbstractSingletonProxyFactoryBean.

What should I do then?

The answer might seem obvious, but to too many people starting out doing TDD and half-decent design, it isn’t at all obvious. The method needs to move to a class where that responsibility makes sense. In our example, it’s crushingly obvious this is really a method on Asset – it even takes one as a parameter. If your method has one class parameter and uses a bunch of data from that class, you can bet your bottom dollar you’re suffering feature envy. Sort your life out: apply the move method refactoring. Go live happily ever after.
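
After the move method refactoring, the earlier sketch becomes (same invented details):

    using System.Collections.Generic;
    using System.Linq;

    public class Asset
    {
        public decimal ClosingPrice { get; set; }
        public decimal Quantity { get; set; }
        public decimal EarlyExitPenalty { get; set; }

        // Moved here: the behaviour now lives with the data it uses,
        // it's public, and it can be unit tested directly.
        public decimal GetSpotSalePrice()
        {
            return ClosingPrice * Quantity - EarlyExitPenalty;
        }
    }

    public class Portfolio
    {
        private readonly List<Asset> assets = new List<Asset>();

        public decimal GetValue()
        {
            return assets.Sum(asset => asset.GetSpotSalePrice());
        }
    }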

Summary

Why shouldn’t you test private methods? Because the fact you’re asking the question means the method shouldn’t be private – it’s a piece of independent behaviour that warrants testing. The hard choice, the design decision, is where you stick it.

What makes you a “senior developer”? Everyone and their dog calls themselves a senior developer these days. From fresh graduates to the CTO, everyone is a senior developer. But what the hell does it even mean?

Technologists

Some developers are avid technologists. They really got into programming because they like tinkering: if it hadn’t been Seven Languages in Seven Weeks, it would have been a box of Meccano, or they’d be in their shed busy inventing the battery-operated self-tying tie. These people are great to have on your team – they’ll be the ones continually bombarding you with the latest and greatest shiny. If you ever want to know whether there’s an off-the-shelf solution to your problem, they’ll know the options, have tried two of them, and currently have a modified version of a third running on their Raspberry Pi.

The trouble with technologists is that more technology is always the answer. Why have an HTTP listener when you can have a full stack application server? Why use plain old TCP when you can introduce an asynchronous messaging backbone? Why bother trying to deliver software when there are all these toys to play with!

Toolsmiths

Some developers naturally gravitate towards providing tools for the rest of the developers on the team. Not for them the humdrum world of building some boring commercial website, instead they’ll build a massively flexible website creation framework that through the magic of code generation will immediately fill source control with a gazillion lines of unmaintainable garbage. Of course, that’s assuming it works, or that they even finish it – which is never guaranteed.

There’s a certain kudos to being the tools guy on the team: you don’t want the most junior member of the team creating tools that everyone else uses – if he screws up, his screw-up is amplified by the size of the team. Instead one of the smart developers sees a problem and starts sharpening his tools; the trouble is you can spend an awfully long time creating really sharp shears and somehow never get round to shaving the yak.

Backend Boys (and Girls)

Another common pull for a lot of developers is to get further down the stack, away from those messy, annoying users and nearer to the data. Here you can make problems more pure, really express your true artistry as a developer and an architect. It’s true: as you move down the stack you tend to find the real architectural meat of a system, where you want the developers who can see the big picture of how everything interacts. The seasoned professionals that understand scalability, availability and job-security.

It’s pretty easy to put off outsiders (project managers, customers, sniveling little front end developers) – you start drawing diagrams with lots of boxes and talk of enterprise grade messaging middleware and HATEOAS service infrastructure, before you know it their eyes have glazed over and they’ve forgotten what they were going to ask you: like perhaps why this has taken six months to build instead of six days?

GTD

Some developers just Get Things Done. Sure, their methods might be a little… slapdash. But when you’re in a crunch (when aren’t you?) and you need something done yesterday, these are the people you want on your team. They won’t waste time designing a big complex architecture; they won’t even waste time writing automated tests. They’ll just hammer out some code and boom! Problem solved.

Sometimes they can come across as heroes: they love nothing better than wading into a tough battle to show how fast they can turn things around. Of course, that also lets them quickly move from battlefield to battlefield, leaving others to clean up the dead and wounded they’ve left behind.

Front End Developers

For some reason front end developers never seem to be considered the most senior, as though hacking WPF or HTML/CSS were somehow less worthy. In fact, I think the front end is the most important part – it’s where all your wonderful n-tier architecture and multiple redundant geegaws finally meet users. And without users, everything else is just intellectual masturbation.

The front end developers are responsible for the user experience. If they make a mess, the product looks like crap, works like crap: it’s crap. But if the front end developers create a compelling, easy to use application – it’s the great, scalable architecture underneath that made it all possible. Obviously.

Team Lead

Your team lead probably isn’t a senior developer. Sorry, bro: if you’re not coding, you can’t call yourself any kind of developer. Go easy on your team lead though: the poor sod probably wrote code once. He probably enjoyed it, too. Then some suit decided that because he was good at one job, he should stop doing it and instead spend his life in meetings, explaining to people in suits why the product he’s not writing code for is late.

Architect

Your architect probably isn’t a senior developer either – unless he’s actually writing code, in which case why does he need the label “architect”? Architecture is a team responsibility. Sure, the most senior guy on the team is likely to have loads of experience and opinions to share – but that doesn’t mean his pronouncements should be followed like scripture. And if, instead of writing code, you draw me one more pretty picture of your scalable messaging middleware, I will shove it up your enterprise service bus.

Conclusion

There are lots of different types of senior developer. That’s probably why the term has become so devalued. Once you’ve been in the industry for a few years, you’ll have found yourself in at least one of these roles and can immediately call yourself senior. The truth is you spend your whole life learning; only in an industry this young and naive could someone with 3 years’ experience be called “senior”. I’ve been programming professionally for 13 years and I’m only just starting to think I’m getting my head around it. I’m sure next year I’ll realise I’m an idiot and there’s a whole new level to learn.

So, go ahead, call yourself senior developer. Just make sure you keep on learning. Change jobs, wear a different hat. Be the tools guy. Meet like-minded developers. Play with different technologies. Become a middle tier developer. Then switch to work on user experience.

Senior developer: it’s just a job title, after all.


Where I work we have an unusual split: the developers use C#, but we also work with engineers who use Matlab. This allows the engineers to work in a simple, mathematics-friendly environment where handling volumes of data in matrices is normal, while the developers work with real computer code in a proper language. We get to look after all the data lifting, creating a compelling user interface and all the requisite plumbing. Overall it’s a good split: algorithms are better developed in a language that makes it easy to prototype and express complex mathematics, while dealing with databases, networks and users is best done in a proper object oriented language with a compile-time checked type system.

As part of a recent project, for the first time we had a lot of data flowing through Matlab, so we had to verify that our Matlab code, and our integration with it from C#, would perform in a reasonable time. Once we had a basic working integrated system, we began performance testing. The initial results weren’t great: we were feeding the system with data at something approximating a realistic rate (call it 100 blocks/second), but the code wasn’t executing fast enough – we were falling behind at a rate of about 3 blocks/second. This was all getting buffered in memory, so it was manageable, but it creates a growing memory impact over time.

So we began scaling back the load: we tried putting 10 blocks/second through. But the strangest thing: now we were only averaging 9.5 blocks/second being processed. WTF? So we scaled back further, to 1 block/second. Now we were only processing 0.9 blocks/second. What in the hell was going on?

[Figure: MatlabPerf – elapsed processing time for each batch size, measured (blue) vs. linear scaling (orange)]

Let’s plot the time to process batches of each of the three sizes we tried (blue line) against a “typical” estimate based on linear scaling (orange line) – i.e. the normal expectation that twice the data takes twice as long to process. But what we were seeing was nearly constant-time processing: as we increased the amount of data, the time to process it didn’t seem to change?!

We checked and double checked. This was a hand-rolled performance test framework (always a bad idea, but due to some technical limitations it was a necessary compromise). We unpicked everything. It had to be a problem in our code. It had to be a problem in our measurements. There was no way this could be realistic – you can’t put 1% of the load through a piece of code and have it perform at, basically, 1% of the speed. What were we missing?

Then the engineers spotted a performance tweak to optimise how memory is allocated in Matlab, which massively sped things up. Suddenly it all started to become clear: what we were seeing was genuine – the more data we passed into Matlab, the faster it processed each item. The elapsed time to process one block of data was nearly the same as the elapsed time to process 100 blocks.

Partly this is because the transition from C# into Matlab isn’t cheap, and these “fixed costs” don’t change whether you have a batch size of 1 or 100 – we reckon that accounted for no more than 300ms of every 1 second of processing. But what about the rest? It seems that, because Matlab is designed to work on matrices of data, the elapsed time to process one matrix is roughly the same regardless of its size. No doubt this is due to Matlab’s ability to use multiple cores to process in parallel and other such jiggery-pokery. But the conclusion, for me, was incredibly counter-intuitive: the runtime performance of Matlab code does not appreciably vary with the size of the data set!

Then came the really counter-intuitive conclusion: if we can process 100 blocks of data in 500ms (twice as fast as we expect them to arrive), we could instead buffer up and process 1000 blocks of data in 550ms. To maximize throughput, it actually makes sense to buffer more data so that each call into Matlab processes a bigger batch. I’ve never before worked with code where you could increase throughput by waiting.
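
A sketch of that buffering idea on the C# side (IMatlabEngine and DataBlock are stand-ins for however you call into Matlab; the batch size is illustrative, not measured):

    using System.Collections.Generic;

    public class BufferingMatlabProcessor
    {
        private const int BatchSize = 1000;
        private readonly List<DataBlock> buffer = new List<DataBlock>();
        private readonly IMatlabEngine matlab;

        public BufferingMatlabProcessor(IMatlabEngine matlab)
        {
            this.matlab = matlab;
        }

        public void OnBlockArrived(DataBlock block)
        {
            buffer.Add(block);
            if (buffer.Count >= BatchSize)
            {
                // One big call into Matlab: the fixed transition cost and the
                // near-constant processing time are amortised over 1000 blocks
                // instead of being paid on every single block.
                matlab.ProcessBatch(buffer);
                buffer.Clear();
            }
        }
    }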

Matlab: it’s weird stuff.

