Are integration tests worth the hassle?

Whether or not you write integration tests can be a religious argument: either you believe in them or you don’t. What we even mean by integration tests can lead to an endless semantic argument.

What do you mean?

Unit tests are easy to define: they test a single unit – a single class, a single method – and make a single assertion on the behaviour of that method. You probably need mocks (again, depending on your religious views on mocking).

Integration tests, as far as I’m concerned, test a deployed (or at least deployable) version of your code, outside in, as close to what your “user” will do as possible. If you’re building a website, use Selenium WebDriver. If you’re writing a web service, write a test client and make requests to a running instance of your service. Get as far outside your code as you reasonably can to mimic what your user will do, and do that. Test that your code, when integrated, actually works.

In between these two extremes exist varying degrees of mess, which some people call integration testing. E.g. testing a web service by instantiating your request handler class and passing a request to it programmatically, letting it run through to the database. This is definitely not unit testing, as it’s hitting the database. But, it’s not a complete integration test, as it misses a layer: what if HTTP requests to your service never get routed to your handler, how would you know?
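To make that concrete, here’s a minimal sketch of such an in-between test – the handler class and all names are hypothetical, and the “database” is just an in-memory map to keep the example self-contained:

```java
import java.util.Map;

// Hypothetical mid-level test: we instantiate the handler class directly
// and pass it a request, skipping the HTTP routing layer entirely.
class GreetingHandler {
    // Stands in for real persistence; a real handler would hit the database.
    private final Map<String, String> users;

    GreetingHandler(Map<String, String> users) { this.users = users; }

    String handle(String userId) {
        String name = users.get(userId);
        return name == null ? "404" : "Hello, " + name;
    }
}

class HandlerLevelTest {
    public static void main(String[] args) {
        GreetingHandler handler = new GreetingHandler(Map.of("u1", "Alice"));

        // Exercises the handler's logic end to end...
        System.out.println(handler.handle("u1")); // prints "Hello, Alice"

        // ...but if HTTP requests were never routed to GreetingHandler,
        // this test would still pass. That's the missing layer.
    }
}
```

It exercises real logic against a real(ish) data store, but a broken route table would sail straight past it.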

What’s the problem then?

Integration tests are slow. By definition, you’re interacting with a running application which you have to spin up, set up, interact with, tear down and clean up afterwards. You’re never going to get the speed you do with unit tests. I’ve just started playing with NCrunch, a background test runner for Visual Studio – which is great, but you can’t have it running your slow, expensive integration tests all the time. If your unit tests take 30 seconds to run, I’ll bet you run them before every check-in. If your integration tests take 20 minutes to run, I bet you don’t run them.

You can end up duplicating lower level tests. If you’re following a typical two level approach of writing a failing integration test, then writing unit tests that fail then pass until eventually your integration test passes – there is an inevitable overlap between the integration test and what the unit tests cover. This is expected and by design, but can seem like repetition. When your functionality changes, you’ll have at least two tests to change.

They aren’t always easy to write. If you have a specific case to test, you’ll need to set up the environment exactly right. If your application interacts with other services or systems you’ll have to stub them so you can provide canned data, which may be non-trivial. The biggest cost, in most environments I’ve worked in, is the necessary evil of setting up test infrastructure: faking out web services, third parties, messaging systems, databases and so on. It all takes time and maintenance and slows down your development process.

Finally integration tests can end up covering uninteresting parts of the application repeatedly, meaning some changes are spectacularly expensive in terms of updating the tests. For example, if your application has a central menu system and you change it, how many test cases need to change? If your website has a login form and you massively change the process, how many test cases require a logged in user?

Using patterns like the page object pattern you can code your tests to minimize this, but it’s not always easy to avoid this class of failure entirely. I’ve worked in too many companies where, even with the best of intentions, the integration tests end up locking in a certain way of working that you either stick with or declare bankruptcy and just delete the failing tests.

What are the advantages then?

Integration tests give you confidence that your application actually works from your user’s perspective. I’d never recommend covering every possible edge case with integration tests – but a happy-path test for a piece of functionality and a failure-case gives you good confidence that the most basic aspects of any given feature work. The complex edge cases you can unit test, but an overall integration test helps you ensure that the feature is basically integrated and you haven’t missed something obvious that unit tests wouldn’t cover.

Your integration tests can be pretty close to acceptance tests. If you’re using a BDD type approach, you should end up with quite readable test definitions that sufficiently technical users could understand. This helps you validate that the basic functionality is as the user expects, not just that it works to what you expected.

What goes wrong?

The trouble is if integration tests are hard to write you won’t write them. You’ll find another piece of test infrastructure you need to invest in, decide it isn’t worth it this time and skip it. If your approach relies on integration tests to get decent coverage of parts of your application – especially true for the UI layer – then skipping them means you can end up with a lot less coverage than you’d like.

Some time ago I was working on a WPF desktop application – I wanted to write integration tests for it. The different libraries for testing WPF applications are basically all crap. Each one of them failed to meet my needs in some annoying, critical way. What I wanted was WebDriver for WPF. So I started writing one. The trouble is, the vagaries of the Windows UI eventing system mean this is hard. After a lot of time spent investing in test infrastructure instead of writing integration tests, I still had a barely usable testing framework that left all sorts of common cases untestable.

Because I couldn’t write integration tests and unit testing WPF UI code can be hard, I’d only unit test the most core internal functionality – this left vast sections of the WPF UI layer untested. Eventually, it became clear this wasn’t acceptable and we returned to the old-school approach of writing unit tests (and unit tests alone) to get as close to 100% coverage as is practical when some of your source code is written in XML.

This brings us back full circle: we have good unit test coverage for a feature, but no integration tests to verify that all the different units hang together correctly and work in a deployed application. But where the trade-off is between little test coverage and decent test coverage with systematic blind spots, what’s the best alternative?


Should you write integration tests? If you can, easily: yes! A web service is much easier to write integration tests for than almost any other type of application. If you’re writing a relatively traditional, not-too-javascript-heavy website, WebDriver is awesome (and the only practical way to get some decent cross-browser confidence). If you’re writing very complex UI code (WPF or JavaScript) it might be very hard to write decent integration tests.

This is where your test approach blurs with architecture: as much as possible, your architecture needs to make testing easy. Subtle changes to how you structure your application might make it easier to get decent test coverage: you can design the application to make it easy to test different elements in isolation (e.g. separate UI layer from a business logic service); you don’t get quite fully integrated tests, but you minimize the opportunity for bugs to slip through the cracks.

Whether or not you write integration tests is fundamentally a question of what tests your architectural choices require you to write to get confidence in your code.

Matlab Performance Testing

Where I work we have an unusual split: although the developers use C#, we also work with engineers who use Matlab. This allows the engineers to work in a simple, mathematics friendly environment where working with volumes of data in matrices is normal. Whereas the developers work with real computer code in a proper language. We get to look after all the data lifting, creating a compelling user interface and all the requisite plumbing. Overall it creates a good split. Algorithms are better developed in a language that makes it easy to prototype and express complex algorithms; while dealing with databases, networks and users is best done in a proper object oriented language with support for a compile time checked type system.

As part of a recent project, for the first time we had a lot of data flowing through Matlab. This meant we had to verify that our Matlab code, and our integration with it from C#, would perform in a reasonable time. Once we had a basic working integrated system, we began performance testing. The initial results weren’t great: we were feeding the system with data at something approximating a realistic rate (call it 100 blocks/second), but the code wasn’t executing fast enough – we were falling behind at a rate of about 3 blocks/second. The backlog was all buffered in memory, so it was manageable, but the memory impact grows over time.

So we began scaling back the load: we tried putting 10 blocks/second through. But the strangest thing: now we were only averaging 9.5 blocks/second being processed. WTF? So we scaled back further, 1 block/second. Now we were only processing 0.9 blocks / second. What in the hell was going on?


Let’s plot the time to process batches of each of the three sizes we tried (blue line), against a “typical” estimate based on linear scaling (orange line) – i.e. normally you’d expect twice the data to take twice as long to process. But what we were seeing was nearly constant-time processing: as we increased the amount of data, the time to process it didn’t seem to change?!

We checked and double checked. This was a hand-rolled performance test framework (always a bad idea, but due to some technical limitations it was a necessary compromise). We unpicked everything. It had to be a problem in our code. It had to be a problem in our measurements. There was no way this could be real – you can’t put 1% of the load through a piece of code and have it perform at, basically, 1% of the speed. What were we missing?

Then the engineers spotted a performance tweak to optimise how memory is allocated in Matlab, which massively speeded things up. Suddenly it all started to become clear: what we were seeing was genuine, the more data we passed into Matlab, the faster it processed each item. The elapsed time to process one block of data was nearly the same as the elapsed time to process 100 blocks of data.

Partly this is because the cost of the transition into Matlab from C# isn’t cheap; these “fixed costs” don’t change whether you have a batch size of 1 or 100. We reckon that accounted for no more than 300ms of every 1 second of processing. But what about the rest? It seems that because Matlab is designed to work on matrices of data, the elapsed time to process one matrix is roughly the same regardless of its size. No doubt this is due to Matlab’s ability to use multiple cores to process in parallel and other such jiggery pokery. But the conclusion, for me, was incredibly counter-intuitive: the runtime performance of Matlab code does not appreciably vary with the size of the data set!

Then the really counter intuitive conclusion: if we can process 100 blocks of data in 500ms (twice as fast as we expect them to arrive), we could instead buffer and wait and process 1000 blocks of data in 550ms. To maximize our throughput, it actually makes sense to buffer more data so that each call processes more data. I’ve never worked with code where you could increase throughput by waiting.
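To see why buffering wins, here’s a sketch of the arithmetic (in Java rather than Matlab, and with an assumed linear cost model – the constants are picked only to roughly match the 500ms/550ms figures above):

```java
class BatchThroughput {
    // Assumed cost model: a big fixed cost per call into Matlab plus a
    // tiny per-block cost. Constants chosen so that a 100-block batch
    // takes ~500ms and a 1000-block batch takes ~550ms, as in the post.
    static double callTimeMs(int blocks) {
        return 494.4 + 0.0556 * blocks;
    }

    static double blocksPerSecond(int batchSize) {
        return batchSize / (callTimeMs(batchSize) / 1000.0);
    }

    public static void main(String[] args) {
        // Bigger batches amortise the fixed cost, so throughput climbs
        // steeply as we buffer more data per call.
        System.out.printf("batch of 100:  %.0f blocks/s%n", blocksPerSecond(100));
        System.out.printf("batch of 1000: %.0f blocks/s%n", blocksPerSecond(1000));
    }
}
```

With a fixed cost that dwarfs the per-block cost, a tenfold bigger batch buys nearly tenfold throughput.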

Matlab: it’s weird stuff.

Testing asynchronous applications with WebDriverWait


If you’re testing a web application, you can’t go far wrong with Selenium WebDriver. But in this web 2.0 world of ajax-y goodness, it can be a pain dealing with the asynchronous nature of modern sites. Back when all we had was web 1.0 you clicked a button and eventually you got a new page, or if you were unlucky: an error message. But now when you click links all sorts of funky things happen – some of which happen faster than others. From the user’s perspective this creates a great UI. But if you’re trying to automate testing this you can get all sorts of horrible race conditions.


The naive approach is to write your tests the same way you did before: you click buttons and assert that what you expected to happen actually happened. For the most part, this works. Sites are normally fast enough, even in a continuous integration environment, that by the time the test harness looks for a change it’s already happened.

But then… things slow down a little and you start getting flickers – tests that sometimes pass and sometimes fail. So you add a little delay. Just 500 milliseconds should do it, while you wait for the server to respond and update the page. Then a month later it’s flickering again, so you make it 1 second. Then two… then twenty.

The trouble is, each test has to run at the pace of its slowest run. If login normally takes 0.1 seconds, but sometimes takes 10 seconds when the environment’s overloaded – the test has to wait for 10 seconds so as not to flicker. This means that even though the app often runs faster, the test has to wait just in case.

Before you know it, your tests are crawling and take hours to run – you’ve lost your fast feedback loop and developers no longer trust the tests.

An Example

Thankfully WebDriver has a solution to this. It allows you to wait for some condition to pass, so you can use it to control the pace of your tests. To demonstrate this, I’ve created a simple web application with a login form – the source is available on github. The login takes a stupid amount of time, so the tests need to react to this so as not to introduce arbitrary waits.

The application is very simple – a username and password field with an authenticate button that makes an ajax request to log the user in. If the login is successful, we update the screen to let the user know.

The first thing is to write our test (obviously in the real world we’d have written the test before our production code, but it’s the test that’s interesting here, not what we’re testing – so we’ll do it in the wrong order just this once):

@Test
public void authenticatesUser() {
    // the page-object calls here are reconstructed; the original listing was truncated
    LoginPage loginPage = new LoginPage(driver);
    loginPage.enterUsername("admin");
    loginPage.enterPassword("password");
    loginPage.clickAuthenticate();
    Assert.assertEquals("Logged in as admin", loginPage.welcomeMessage());
}

We have a page object that encapsulates the login functionality. We provide the username & password then click authenticate. Finally we check that the page has updated with the user message. But how have we dealt with the asynchronous nature of this application?


Through the magic of WebDriverWait we can wait for a function to return true before we continue:

public void clickAuthenticate() {
    authenticateButton.click();
    new WebDriverWait(driver, 30).until(accountPanelIsVisible());
}

private Predicate<WebDriver> accountPanelIsVisible() {
    return new Predicate<WebDriver>() {
        @Override public boolean apply(WebDriver driver) {
            return isAccountPanelVisible();
        }
    };
}

private boolean isAccountPanelVisible() {
    return accountPanel.isDisplayed();
}
Our clickAuthenticate method clicks the button then instructs WebDriver to wait for our condition to pass. The condition is defined via a predicate (c’mon Java where’s the closures?). The predicate is simply a method that will run to determine whether or not the condition is true yet. In this case, we delegate to the isAccountPanelVisible method on the page object. This does exactly what it says on the tin, it uses the page element to check whether it’s visible yet. Simple, no?

In this way we can define a condition we want to be true before we continue. In this case, the exit condition of the clickAuthenticate method is that the asynchronous authentication process has completed. This means that tests don’t need to worry about the internal mechanics of the page – about whether the operation is asynchronous or not. The test merely specifies what to test, the page object encapsulates how to do it.


It’s all well and good waiting for elements to be visible or certain text to be present, but sometimes we might want more subtle control. A good approach is to update Javascript state when an action has finished. This means that tests can inspect javascript variables to determine whether something has completed or not – allowing very clear and simple coordination between production code and test.

Continuing with our login example, instead of relying on a <div> becoming visible, we could instead have set a Javascript variable. The code in fact does both, so we can have two tests. The second looks as follows:

public void authenticate() {
    authenticateButton.click();
    new WebDriverWait(driver, 30).until(authenticated());
}

private Predicate<WebDriver> authenticated() {
    return new Predicate<WebDriver>() {
        @Override public boolean apply(WebDriver driver) {
            return isAuthenticated();
        }
    };
}

private boolean isAuthenticated() {
    return (Boolean) executor().executeScript("return authenticated;");
}

private JavascriptExecutor executor() {
    return (JavascriptExecutor) driver;
}

This example follows the same basic pattern as the test before, but we use a different predicate. Instead of checking whether an element is visible or not, we instead get the status of a Javascript variable. We can do this because each WebDriver also implements the JavascriptExecutor allowing us to run Javascript inside the browser within the context of the test. I.e. the script “return authenticated” runs within the browser, but the result is returned to our test. We simply inspect the state of a variable, which is false initially and set to true once the authentication process has finished.

This allows us to closely coordinate our production and test code without the risk of flickering tests because of race conditions.

Testing asynchronous applications with WebDriver


Note: this article is now out of date, please see the more recent version covering the same topic: Testing asynchronous applications with WebDriverWait.

WebDriver is a great framework for automated testing of web applications. It learns the lessons of frameworks like selenium and provides a clean, clear API to test applications. However, testing ajax applications presents challenges to any test framework. How does WebDriver help us? First, some background…


WebDriver comes with a number of drivers. Each of these is tuned to drive a specific browser (IE, Firefox, Chrome) using different technology depending on the browser. This allows the driver to operate in a manner that suits the browser, while keeping a consistent API so that test code doesn’t need to know which type of driver/browser is being used.

Page Pattern

The page pattern provides a great way to separate test implementation (how to drive the page) from test specification (the logical actions we want to complete – e.g. enter data in a form, navigate to another page etc). This makes the intent of tests clear:
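A minimal sketch of the idea (all names here are illustrative; the driver role is stubbed with a tiny interface purely to keep the example self-contained – in real tests it’s played by WebDriver itself):

```java
// The driver role, stubbed for the sketch. In real tests: WebDriver.
interface Browser {
    void type(String field, String text);
    void submit();
    String pageTitle();
}

// The page object owns *how* to drive the page...
class SearchPage {
    private final Browser browser;

    SearchPage(Browser browser) { this.browser = browser; }

    void searchFor(String term) {
        browser.type("q", term);
        browser.submit();
    }

    String title() { return browser.pageTitle(); }
}

// ...so the test reads as *what* the user does.
class PagePatternSketch {
    public static void main(String[] args) {
        Browser fake = new Browser() {
            String typed = "";
            public void type(String field, String text) { typed = text; }
            public void submit() { }
            public String pageTitle() { return typed + " - Search Results"; }
        };
        SearchPage page = new SearchPage(fake);
        page.searchFor("webdriver");
        System.out.println(page.title()); // prints "webdriver - Search Results"
    }
}
```

The test specification (search for a term, check the title) never mentions element locators; those live in one place, the page class.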


Blocking Calls

WebDriver’s calls are also blocking – calls to submit(), for example, wait for the form to be submitted and a response returned to the browser. This means we don’t need to do anything special to wait for the next page to load:

WebDriver driver = new FirefoxDriver();
driver.get("http://www.google.com");
WebElement e = driver.findElement(By.name("q"));
e.sendKeys("webdriver");
e.submit();
assertEquals("webdriver - Google Search", driver.getTitle());

Asynchronous Calls

The trouble arises if you have an asynchronous application. WebDriver doesn’t block when you make asynchronous calls via javascript (why would it?), so how do you know when something is “ready” to test? If you have a link that, when clicked, does some ajax magic in the background – how do you know when the magic has stopped and you can start verifying that the right thing happened?

Naive Solution

The simplest solution is by using Thread.sleep(…). Normally if you sleep for a bit, by the time the thread wakes up, the javascript will have completed. This tends to work, for the most part.
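A self-contained sketch of why this is fragile (the “server” here is just a background thread with a variable delay – purely illustrative):

```java
class NaiveSleepSketch {
    static volatile String content = null;

    // Stands in for an ajax call: the "server" responds after delayMs.
    static void asyncLoad(long delayMs) {
        new Thread(() -> {
            try { Thread.sleep(delayMs); } catch (InterruptedException ignored) { }
            content = "loaded";
        }).start();
    }

    public static void main(String[] args) throws InterruptedException {
        asyncLoad(100);     // fast server
        Thread.sleep(500);  // fixed sleep: long enough, test "works"
        System.out.println("fast server: " + content);

        content = null;
        asyncLoad(2000);    // slow server: the same fixed sleep is too short
        Thread.sleep(500);
        System.out.println("slow server: " + content);
    }
}
```

The same fixed sleep passes against the fast server and fails against the slow one, even though the application code is identical – that’s your flickering test.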

The problem is that once you have hundreds of these tests, they suddenly start failing, at random, because the delay isn’t quite enough. When the build runs on your build server, sometimes the load is higher, the phase of the moon is wrong, whatever – the end result is that your delay isn’t quite enough, you start assert()ing before your ajax call has completed, and your test fails.

So, you start increasing the timeout. From 1 second. To 2 seconds. To 5 seconds. You’re now on a slippery slope of trading how long the tests take to run against how often they pass. This is a crap tradeoff. You want very fast tests that always pass. Not semi-fast tests that sometimes pass.

What’s to be done? Here are two techniques that make testing asynchronous applications easier.


All the drivers (except HtmlUnit, which isn’t really driving a browser) actually generate RenderedWebElement instances, not just WebElement instances. RenderedWebElement has a few interesting methods on it that can make testing your application easier. For example, the isDisplayed() method saves you having to query the CSS style to work out whether an element is actually shown.

If you have some ajax magic that, in its final step, makes a <DIV> visible then you can use isDisplayed() to check whether the asynchronous call has completed yet.

Note: my examples here use dojo but the same technique can be used whether you’re using jquery or any other asynchronous toolkit.

First the HTML page – this simply has a link and an (initially hidden) <div>. When the link is clicked it triggers an asynchronous call to load a new HTML page; the content of this HTML page is inserted as the content of the div and the div is made visible.


 <script type="text/javascript" src="js/dojo/dojo.js" djConfig="isDebug:false, parseOnLoad:true"></script>

 <script src="js/asyncError.js" type="text/javascript"></script>

 <script type="text/javascript">
   /*
    * Load a HTML page asynchronously and
    * display contents in hidden div
    */
   function loadAsyncContent() {
     var xhrArgs = {
       url: "asyncContent.htm",
       handleAs: "text",
       load: function(data) {
         // Update DIV with content we loaded
         dojo.byId("asyncContent").innerHTML = data;

         // Make our DIV visible
         dojo.byId("asyncContent").style.display = 'block';
       }
     };

     // Call the asynchronous xhrGet
     var deferred = dojo.xhrGet(xhrArgs);
   }
 </script>

 <div id="asyncContent" style="display: none;"></div>
 <a href="#" id="loadAsyncContent" onClick="loadAsyncContent();">Click to load async content</a>

Now the integration test. Our test simply loads the page, clicks the link and waits for the asynchronous call to complete. Once the call is complete, we check that the contents of the <div> are what we expect.

public class TestIntegrationTest {
  @Test
  public void testLoadAsyncContent() {
    // Create the page
    TestPage page = new TestPage( new FirefoxDriver() );

    // Click the link
    page.clickLoadAsyncContent();

    // Wait for the asynchronous call to complete
    page.waitForAsyncContent();

    // Confirm content is loaded
    assertEquals("This content is loaded asynchronously.", page.getAsyncContent());
  }
}

Now the page class, used by the integration test. This interacts with the HTML elements exposed by the driver; the waitForAsyncContent method regularly polls the <div> to check whether it has been made visible yet.

public class TestPage  {

  // The test page's address wasn't shown in the original listing
  private static final String PAGE_URL = "http://localhost:8080/test.htm";

  private WebDriver driver;

  public TestPage( WebDriver driver ) {
    this.driver = driver;

    // Load our page
    driver.get(PAGE_URL);
  }

  public void clickLoadAsyncContent() {
    driver.findElement(By.id("loadAsyncContent")).click();
  }

  public void waitForAsyncContent() {
    // Get a RenderedWebElement corresponding to our div
    RenderedWebElement e = (RenderedWebElement) driver.findElement(By.id("asyncContent"));

    // Up to 10 times
    for( int i=0; i<10; i++ ) {
      // Check whether our element is visible yet
      if( e.isDisplayed() ) {
        return;
      }

      try {
        Thread.sleep(500);
      } catch( InterruptedException ex ) {
        // Try again
      }
    }
  }

  public String getAsyncContent() {
    return driver.findElement(By.id("asyncContent")).getText();
  }
}

By doing this, we don’t need to code arbitrary delays into our test; we can cope with server calls that potentially take a little while to execute; and can ensure that our test will always pass (at least, tests won’t fail because of timing problems!)


Another trick is to realise that the WebDriver instances implement JavascriptExecutor. This interface can be used to execute arbitrary Javascript within the context of the browser (a trick selenium finds much easier). This allows us to use the state of javascript variables to control the test. For example – we can have a variable populated once some asynchronous action has completed; this can be the trigger for the test to continue.

First, we add some Javascript to our example page above. Much as before, it loads a new page and updates the <div> with the content. This time, rather than showing the div, it simply sets a variable – “working” – so the div can already be visible.

 var working = false;

 function updateAsyncContent() {
   working = true;

   var xhrArgs = {
     url: "asyncContentUpdated.htm",
     handleAs: "text",
     load: function(data) {
       // Update our div with our new content
       dojo.byId("asyncContent").innerHTML = data;

       // Set the variable to indicate we're done
       working = false;
     }
   };

   // Call the asynchronous xhrGet
   var deferred = dojo.xhrGet(xhrArgs);
 }

Now the extra test case we add. The key line here is where we wait for the “working” variable to be false:

 @Test
 public void testUpdateAsyncContent() {
   // Create the page
   TestPage page = new TestPage( getDriver() );

   // Click the link
   page.clickLoadAsyncContent();
   page.waitForAsyncContent();

   // Now update the content
   page.clickUpdateAsyncContent();
   page.waitForNotWorking();

   // Confirm content is loaded
   assertEquals("This is the updated asynchronously loaded content.", page.getAsyncContent());
 }

Finally our updated page class; the key line here is where we execute some Javascript to determine the value of the working variable:

 public void clickUpdateAsyncContent() {
   // link id assumed – the updated page's link isn't shown above
   driver.findElement(By.id("updateAsyncContent")).click();
 }

 public void waitForNotWorking() {
   // Get a JavascriptExecutor
   JavascriptExecutor exec = (JavascriptExecutor) driver;

   // 10 times, or until the working flag is cleared
   for( int i=0; i<10; i++ ) {
     // Ask the browser whether the ajax call has finished yet
     if( ! (Boolean) exec.executeScript("return working") ) {
       return;
     }

     try {
       Thread.sleep(500);
     } catch( InterruptedException ex ) {
       // Try again
     }
   }
 }
Note how WebDriver handles type conversions for us here. In this example the Javascript is relatively trivial, but we could execute any arbitrarily complex Javascript here.

By doing this, we can now flag in Javascript when some state has been reached that allows the test to progress; we have made our code more testable, eliminated arbitrary delays, and our test code runs faster by responding as soon as asynchronous calls complete.

Have you ever had problems testing asynchronous applications? What approaches have you used?