mdub@DogBiscuit.org
... mmm, crunchy!

Attacking slow continuous-integration builds (notes from CITCON)

Last weekend I went along to CITCON here in Melbourne. Which was great fun, by the way.

There I ran a session on "Attacking slow CI builds". It was a small group, but an interesting discussion, I think. Here are my (rough, unedited) notes:

WHAT is the impact of a slow build?

  • fewer checkins
  • more waiting
  • context switching
  • discourages integration
  • discourages writing of additional tests
  • more chance of overlapping checkins
  • more build breakages
  • more time required to get the build fixed
  • reduced productivity
  • WASTE!

WHY is the build slow?

  • slow tests (particularly acceptance tests)
    • over-testing (testing the same code-paths repeatedly)
    • expensive set-up and tear-down
    • too much testing via the user-interface
    • tests that pause, sleep, or poll (e.g. to deal with AJAX)
  • too much I/O!
  • use of slow infrastructure components (database servers, application servers, etc.)
  • slow hardware

HOW can we make it faster?

  • faster hardware
  • run tests in parallel
  • distribute tests
  • fail fast
    • selective testing: run tests most likely to fail first
      • could use dependency-analysis to identify which tests were affected by recent commits
  • refactor story-based acceptance tests into scenario-based tests
    • bigger tests, with more assertions, offsets set-up/tear-down costs
      • but makes tests more complex
  • share test fixtures between a group of tests
    • but breaks test isolation
  • avoid I/O
    • in-memory database
    • in-memory file-store (RAM disk?)
    • stub out infrastructure components
      • avoid testing these components by side-effect
  • populate the database directly, rather than using the user-interface to set-up for a test
  • separate your system into components that can be tested independently
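
To make the "run tests most likely to fail first" idea a bit more concrete, here's a minimal Ruby sketch. It assumes you already have a map from each test to the source files it exercises (the `test_dependencies` hash below is invented for illustration; in practice it would come from dependency or coverage analysis), and simply moves the tests touched by recent changes to the front of the queue.

```ruby
# Hypothetical data: which source files each test exercises.
test_dependencies = {
  "customer_test.rb" => ["customer.rb", "address.rb"],
  "invoice_test.rb"  => ["invoice.rb", "customer.rb"],
  "report_test.rb"   => ["report.rb"]
}

# Returns the test names, with tests that touch a changed file first.
# The relative order within each group is preserved.
def prioritise_tests(dependencies, changed_files)
  affected, unaffected = dependencies.keys.partition do |test|
    (dependencies[test] & changed_files).any?
  end
  affected + unaffected
end

puts prioritise_tests(test_dependencies, ["report.rb"])
```

Nothing is skipped here; the full suite still runs, it just has a better chance of failing early.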

Thinking about this later ...

There are two types ...

The suggestions for improving build times seemed to fall into two categories:

  1. optimise the build/tests
  2. throw additional hardware at the problem

My problem with the "throw hardware at it" approach is that it typically only helps for the continuous-integration server; the poor old developers are still left with a slow-running build, and therefore many of the productivity issues still exist.

Another idea

It occurs to me now that we missed a fairly fundamental trick to improve test times: improve the performance of the system-under-test itself. It's a great excuse to start thinking about performance earlier in the project.

"Customer Acceptance Test" does not need to mean end-to-end

On all the projects I've been on in recent years, we've ended up with the majority of the tests being either "developer unit tests", which run super-fast, or "customer acceptance tests" which test end-to-end (browser-to-database) and run super-slow.

Methinks it should be less black-and-white. If we can demonstrate functionality that the customer cares about by calling the underlying logic directly (i.e. at unit-test level), rather than by exercising the user-interface, then what's wrong with that? (We just need one test to prove that the underlying logic has been properly integrated into the UI.)

ReadOnlyFormBuilder

For RubyOnRails developers, form_for and fields_for are the accepted way of DRYing up form templates. You know the deal; you code

<% form_for :customer, :url => customers_path() do |customer_form| %>
  <p>
    <label>Name:</label> 
    <%= customer_form.text_field :first_name, :size => 15 %>
    <%= customer_form.text_field :last_name, :size => 20 %>
  </p>
  ... etc ...
<% end %>

and you get

<form action="/customers" method="post">
  <p>
    <label>Name:</label> 
    <input id="customer_first_name" name="customer[first_name]" size="15" type="text" />
    <input id="customer_last_name" name="customer[last_name]" size="20" type="text" value="" />
  </p>
  ... etc ...
</form>

Rails generates sensible field names and ids for you, and slurps existing values out of the model object. So far, so good.

Lately, I've taken to using the same trick when presenting data, not just when editing it. So, whereas before I might have written:

  <p>
    <label>Name:</label> 
    <span id="customer_first_name"><%= h @customer.first_name %></span>
    <span id="customer_last_name"><%= h @customer.last_name %></span>
  </p>
  ... etc ...

I'll now code it up as:

<% fields_for :customer, :builder => ReadOnlyFormBuilder do |customer_form| %>
  <p>
    <label>Name:</label> 
    <%= customer_form.text_field :first_name, :size => 15 %>
    <%= customer_form.text_field :last_name, :size => 20 %>
  </p>
  ... etc ...
<% end %>

and get the same output. (In case you're wondering, the ids are there to help with automated testing).

Note the similarity between the last code snippet and the first one on this page; apart from the first line they're identical. Usually, I'll put the field-declarations themselves in a partial that's shared between "new", "edit" and "show" actions. That way, your "show" page automatically gets identical layout to the others, just with raw values in place of editable fields.

The ReadOnlyFormBuilder class itself is fairly straightforward - I'm planning to wrap it up into a plugin sometime soon. In the meantime, the implementation of text_field looks something like this:

def text_field(attribute, options={})
  content_tag("span", html_escape(value_of(attribute)), :id => "#{@object_name}_#{attribute}")
end

def value_of(attribute)
  value = model.send(attribute)
end

def model
  @object || @template.instance_variable_get("@#{@object_name}")
end
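
If you want to play with the idea outside a Rails app, here's a rough stand-alone sketch. It is not the real builder: `PlainReadOnlyBuilder` is a made-up name, it builds the span by hand instead of using content_tag, and it uses ERB::Util.html_escape from the standard library in place of the Rails helper.

```ruby
require "erb"

# Stand-in for the Rails builder: no ActionView, just string building.
class PlainReadOnlyBuilder
  def initialize(object_name, object)
    @object_name = object_name
    @object = object
  end

  # Same signature as the form-builder method. Options such as :size are
  # accepted but ignored, since a <span> has no use for them.
  def text_field(attribute, options = {})
    value = ERB::Util.html_escape(@object.public_send(attribute))
    %(<span id="#{@object_name}_#{attribute}">#{value}</span>)
  end
end

Customer = Struct.new(:first_name, :last_name)
builder = PlainReadOnlyBuilder.new(:customer, Customer.new("Ann", "Smith"))
puts builder.text_field(:last_name, :size => 20)
```

Because the method names and arguments match the editable builder, the same field declarations can be rendered by either one.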

Rake profiling

Where's the bottleneck in your Rake build? Let's find out. Drop (or include) this in your Rakefile:

module Rake
  class Task
    def execute_with_timestamps(*args)
      start = Time.now
      execute_without_timestamps(*args)
      execution_time_in_seconds = Time.now - start
      printf("** %s took %.1f seconds\n", name, execution_time_in_seconds)
    end
    
    alias :execute_without_timestamps :execute
    alias :execute :execute_with_timestamps 
  end
end
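
On more recent Rubies, the same wrap-and-time trick can be done with Module#prepend rather than a pair of aliases. The sketch below shows it on a made-up `FakeTask` class so that it runs without Rake; I'd expect `Rake::Task.prepend(Timestamps)` to behave the same way in a Rakefile, but treat that as an assumption rather than something tested here.

```ruby
# Wraps #execute with timing; `super` calls the original method.
module Timestamps
  def execute(*args)
    start = Time.now
    super
  ensure
    # Runs even if the task raises, so failing tasks are timed too.
    printf("** %s took %.1f seconds\n", name, Time.now - start)
  end
end

# Hypothetical stand-in for Rake::Task, so the sketch runs without Rake.
class FakeTask
  attr_reader :name

  def initialize(name, &action)
    @name = name
    @action = action
  end

  def execute(*args)
    @action.call(*args)
  end
end

FakeTask.prepend(Timestamps)
FakeTask.new("compile") { sleep 0.1 }.execute
```

One small difference from the alias version: the `ensure` means the timing line is printed even when the task fails.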

Ant build tips

During my past few Java projects, I've developed some guidelines which I find make builds faster, more reliable and easier to maintain. The details are specific to Ant, but hopefully the principles are transferrable to other software build systems.

These ideas may seem blindingly obvious to some readers, but I suspect they'll appear new-and-strange, and perhaps even bad-and-wrong, to others. In any event, I hope to trigger some thought/discussion.

Principles

My build approach is based on two simple principles:

  • Efficiency - don't rebuild up-to-date outputs
  • Safety - do rebuild out-of-date outputs

(By "output", I mean some artifact produced by the build. I'm avoiding the word "target" here, since it has specific meaning in Ant.)

Efficiency - DON'T rebuild up-to-date outputs

Quick builds, and rapid feedback, are important for developer productivity. Using a build system that recreates everything from scratch after even a minor change is a great way to kill productivity.

Re-executing a single build step is typically not the end of the world, but many outputs are also inputs to other build steps, so unnecessarily rebuilding an output early on during the build can trigger rework all the way through.

Safety - DO rebuild out-of-date outputs

On the flip side, when a key input DOES change, you need to ensure that all the derived outputs are rebuilt, or at least revalidated. Otherwise, your build becomes "flaky" and unpredictable.

A flaky build forces developers to compensate somehow, e.g. by explicitly running "clean" builds every time, which impacts productivity.

Tips

Explicitly declare dependencies between your targets

Some people are reluctant to declare dependencies, because declaring them introduces overhead. But not doing so is unsafe, because it opens the door to build steps being executed with stale inputs, resulting in confusing, frustrating, non-deterministic build behaviour.

If you've followed the "Don't rebuild up-to-date outputs" rule, then dependencies should be safe/cheap, i.e. there's minimal overhead, and no reason not to declare them.

Targets should be Nouns, not Verbs

Typically, programmers name Ant targets by what they do, e.g. "compile", "test". However, this tends to produce very procedural builds.

So instead, I recommend choosing names describing what the target produces, e.g. "classes", "test/report". Perhaps it's just because I spent so many years automating builds using make, but I find that such noun-ish targets help in various ways:

  • it's easier to understand what outputs each target produces (for obvious reasons)
  • intermediate targets tend to become useful in their own right
  • dependencies become clearer, as it makes more sense to depend on a concrete input, rather than a process

If you've read this far, go read Martin Fowler's "OutputBuildTarget" article; he explores the subject more eloquently than I'm capable of.

Some targets might not produce a concrete artifact (or the artifact might not be the main point of the target). In such cases, I'll sometimes name them based on the condition they produce, or ensure. For example, a target using Simian to check for duplication might be called "minimal-duplication" (as opposed to "simian").

Use <uptodate> to avoid unnecessary rework

Most Ant tasks include dependency-checking based on file timestamps, and will avoid rework. But some tasks aren't so clever. For instance, the <junit> task will happily re-run all your tests, even if they all passed last time, and neither code nor tests have changed.

The <uptodate> task can help fill the gap. It compares the timestamps of specified input and output files, and sets a property indicating that work can be avoided.

Here's an example where <uptodate> is used to avoid unnecessary re-generation of XML-mapping code:

<target name="xml-module/check"
        depends="properties">
    <uptodate property="xml-module.uptodate"
              targetfile="${xml-module.jar}">
        <srcfiles dir="spec" includes="**/*.xsd"/>
    </uptodate>
</target>

<target name="xml-module"
        depends="xml-module/check, xmlbean/taskdef"
        unless="xml-module.uptodate">
    <xmlbean destfile="${xml-module.jar}"
             classpathref="xmlbeans.classpath">
        <fileset dir="spec" includes="**/*.xsd"/>
    </xmlbean>
</target>
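
For anyone not using Ant, the timestamp rule behind that check is easy to express directly. Here's a minimal Ruby sketch of the comparison (not Ant's actual implementation): the output counts as up to date only if it exists and is newer than every input. Treating equal timestamps as stale is a conservative choice of this sketch, and the file names are made up.

```ruby
require "tmpdir"

# True only if the target exists and is strictly newer than every source.
def up_to_date?(target, sources)
  return false unless File.exist?(target)
  target_time = File.mtime(target)
  sources.all? { |source| File.mtime(source) < target_time }
end

Dir.mktmpdir do |dir|
  xsd = File.join(dir, "order.xsd")        # hypothetical input
  jar = File.join(dir, "xml-module.jar")   # hypothetical output
  File.write(xsd, "<schema/>")
  puts up_to_date?(jar, [xsd])             # false: the jar does not exist yet
end
```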

Use <touch> to record a completed task

Although it's unusual, some build steps have no output: they are simply processes that must be executed, e.g. validating the format of a file, or verifying adherence to coding standards (Checkstyle, Simian). Other build steps can produce many outputs, e.g. code-generation tools.

In these cases, where there's no identifiable primary output, it can be useful to invent a placeholder output-file using Ant's <touch> task. The resulting file is empty, but its timestamp can be used for dependency-checking, to determine if/when the build step needs to be re-run.

<touch> is most useful in conjunction with <uptodate>, as in the following example:

<target name="libs/check">
    <uptodate property="libs.uptodate">
        <srcfiles dir="." includes="ivy.xml"/>
        <mapper type="merge" to="lib/.done"/>
    </uptodate>
</target>

<target name="libs" description="retrieve dependencies with ivy"
        depends="libs/check" unless="libs.uptodate">
    <ivy:retrieve pattern="lib/[conf]/[artifact].[ext]" />
    <touch file="lib/.done" />
</target>    

Here we're using Ivy to download third-party libraries. After download, we create a touch-file to mark the job as done. On subsequent runs, the library resolution and download process will be skipped, unless the "ivy.xml" control-file has been changed.

As I alluded to earlier, I have also used the combination of <touch> and <uptodate> to:

  • skip code-style checks when code hasn't changed
  • skip tests when neither code nor tests have changed
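
The touch-file pattern isn't specific to Ant. As a rough illustration, here's the same idea in Ruby, using FileUtils.uptodate? and FileUtils.touch from the standard library. The `run_unless_done` helper and the paths in the usage comment are invented for this sketch.

```ruby
require "fileutils"

# Runs the block only if the marker file is missing or not newer than
# all of the inputs, then touches the marker to record success.
# If the block raises, the marker is left alone and the step re-runs next time.
def run_unless_done(marker, inputs)
  return :skipped if FileUtils.uptodate?(marker, inputs)
  yield
  FileUtils.touch(marker)
  :ran
end

# Usage (hypothetical paths and helper):
#   run_unless_done("lib/.done", ["ivy.xml"]) { fetch_libraries }
```

The marker is only touched after the block completes, so a failed step doesn't get recorded as done.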

Use <dependset> to remove out-of-date outputs

When Ant is not clever enough to determine when something needs re-doing, the <dependset> task is useful for mopping up stale outputs.
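
To show what that mopping-up amounts to, here's a small Ruby sketch of the behaviour as I understand it from the Ant manual: if any source is newer than any existing target, all of the targets are removed. It ignores the finer points of the real task (such as how missing sources are handled), and `remove_if_stale` is a made-up name.

```ruby
require "fileutils"

# Removes ALL existing targets if any source is newer than the oldest target.
# Returns the list of files removed (empty if everything was fresh).
def remove_if_stale(targets, sources)
  existing = targets.select { |target| File.exist?(target) }
  return [] if existing.empty? || sources.empty?

  newest_source = sources.map { |source| File.mtime(source) }.max
  oldest_target = existing.map { |target| File.mtime(target) }.min
  return [] unless newest_source > oldest_target

  FileUtils.rm_f(existing)
  existing
end
```

With the stale outputs gone, the ordinary timestamp-checking in later build steps takes care of regenerating them.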

Pitfalls

Avoid "private" targets

Many builds include "private" or "hidden" targets, that are unsafe to call directly. A common convention in the Ant world is to name these targets starting with '-', since that makes them inaccessible from the command-line.

I think private targets are a smell: they indicate that implicit dependencies are present in the build. Hiding the unsafe targets makes sense, in a way ... but I much prefer to make the dependencies explicit, as described above, at which point it's safe to let every target be called directly (which often comes in handy when testing some aspect of the build process).

Avoid targets depending on "clean"

Having popular targets depend on "clean" is a bad smell. You DO need to avoid using artifacts from previous builds which have passed their use-by date, but starting the whole build from scratch is overkill, when proper dependencies and careful timestamp-checking can ensure that just the stale stuff is rebuilt.

Avoid <copy overwrite="true">

An anti-pattern I often encounter (and a pet peeve) is:

<copy overwrite="true" ...>
    ...
    <filterset>
        <filter token="PASSWORD" value="${db.password}"/>
        ...
    </filterset>
</copy>

The "overwrite" attribute causes Ant to copy files every time, ignoring the usual timestamp-checking that prevents re-generation of up-to-date files. Using "overwrite" can easily cause most of your jars/wars/ears/etc to be updated with every build.

Instead, use <dependset> to invalidate the outputs in the case that ${db.password} has changed.


method_missing magic - emulating Groovy's "it" in Ruby

Inspired variously by:

I've cooked up a shortcut for generating simple blocks, meaning that rather than

people.select { |x| x.name.length > 10 }

I can write such things as:

people.select(&its.name.length > 10)

Disclaimer: I think this is more "cool hack" than useful tool; it's probably too much of an alien artifact to be useful in real life. And it's not generally applicable, like "it" in Groovy. And really, it's not that much more verbose to use a block. Aaaaaanyway ...

The trick is that the above is parsed as

people.select(&(its.name.length.>(10)))

The "its" method creates a MessageBuffer object, which records the messages (method invocations) sent its way:

irb(main):001:0> require 'message_buffer'
=> true
irb(main):002:0> its
=> #<MessageBuffer:0x6b40b44 @messages=[]>
irb(main):003:0> its.name.length < 10
=> #<MessageBuffer:0x6b3e678 @messages=[[:name], [:length], [:<, 10]]>

Now, the "&" operator coerces its argument to a Proc, and MessageBuffer#to_proc generates a Proc that replays all the recorded messages. Q.E.D.

The full source-code is fairly short, so I'll include it inline:

class MessageBuffer 

  instance_methods.each do |m|
    undef_method m unless m =~ /^(__|respond_to|inspect)/ 
  end
  
  def initialize
    @messages = []
  end

  def method_missing(*message)
    @messages << message        # record the message
    self                        # return self so we can keep recording
  end
  
  def __replay_all_messages__(obj)
    @messages.inject(obj) do |obj, message|
      obj.__send__(*message)
    end 
  end
  
  def to_proc
    proc { |x| __replay_all_messages__(x) }
  end

end

def its
  MessageBuffer.new
end


Update: Florian Gross suggested a better way to replay recorded messages, using inject, and I've updated the code accordingly.
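
For Ruby 1.9 and later, a variant built on BasicObject avoids the undef_method loop, since BasicObject starts out with almost no methods. This is a sketch rather than a drop-in replacement: it has no inspect, so it won't print nicely in irb, and the handful of methods BasicObject does define (==, != and !, for instance) are answered directly instead of being recorded.

```ruby
# BasicObject has almost no methods, so nearly every call reaches method_missing.
class MessageBuffer < BasicObject
  def initialize
    @messages = []
  end

  def method_missing(name, *args)
    @messages << [name, *args]   # record the message
    self                         # return self so we can keep recording
  end

  def to_proc
    messages = @messages
    # ::Proc is written with a leading :: because constants are not
    # looked up through Object inside a BasicObject subclass.
    ::Proc.new do |x|
      messages.inject(x) { |obj, message| obj.__send__(*message) }
    end
  end
end

def its
  MessageBuffer.new
end

puts %w[apple fig banana].select(&its.length > 3)
```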

Presentation on Ruby/Rails at EJA

A couple of months ago I gave a presentation on Ruby and Rails to a local Java user-group. My slides are now online:

It contains a few examples showing how expressive Ruby can be, when compared to Java.

I hate "frameworks"

Give me a "toolkit" or "library" over a "framework" any hour of the day.

A software framework offers to solve 80% of my problem, but usually without understanding what my problem actually is.

A toolkit is a collection of tools. I can pick them up and use them as I see fit. I can use individual tools/components, without needing to adopt them all. I can use them in conjunction with other tools I have, without voiding any warranties.

Grumble.

Tracing with a dynamic Proxy, in Ruby

Recently, I was writing a (Ruby) script to sync email between two IMAP servers. My unit-tests were all working, but something was going screwy when I plugged in a real server.

I wanted to be able to trace the conversation with the IMAP server (or at least, Ruby's IMAP API), to see what was going on. Initially, I started sprinkling tracing statements throughout my code, until I realised that it was going to be easier to define a simple "tracing proxy", and wrap it around the object I wanted to trace:

imap_handle = TracingProxy.new(imap_handle, $stderr)

# ... do stuff with imap_handle ...

It turned out to be straightforward to implement:

class TracingProxy

  def initialize(obj, dest) 
    @obj = obj
    @dest = dest
  end

  def method_missing(symbol, *args)
    arglist = args.map { |a| a.inspect }.join(', ')
    @dest.puts "#{symbol}(#{arglist})"
    rval = @obj.send(symbol, *args)
    @dest.puts ">> #{rval.inspect}"
    rval
  end
  
end

method_missing is a fallback method invoked when the called method isn't found - it's great for implementing dynamic proxies. There's nothing particularly ground-breaking going on here - this kind of trick is fairly common in Ruby-land.
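
Here's a slightly fuller variant of the proxy, with a usage example that writes to a StringIO so the trace can be inspected. The additions (forwarding blocks, and defining respond_to_missing? so the proxy answers respond_to? sensibly) are my own embellishments to the version above. Note that methods every Object already has, such as inspect or to_s, never reach method_missing and so aren't traced.

```ruby
require "stringio"

class TracingProxy
  def initialize(obj, dest)
    @obj = obj
    @dest = dest
  end

  def method_missing(symbol, *args, &block)
    return super unless @obj.respond_to?(symbol)
    @dest.puts "#{symbol}(#{args.map(&:inspect).join(', ')})"
    rval = @obj.public_send(symbol, *args, &block)   # forward, including any block
    @dest.puts ">> #{rval.inspect}"
    rval
  end

  def respond_to_missing?(symbol, include_private = false)
    @obj.respond_to?(symbol, include_private) || super
  end
end

log = StringIO.new
list = TracingProxy.new([3, 1, 2], log)
list.push(4)
list.sort
puts log.string
```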

My point is: implementing a dynamic-proxy for tracing was so easy in Ruby that I actually did it. I could have done something similar in Java, using java.lang.reflect.Proxy, or cglib - but I most likely wouldn't have bothered.

In Ruby, implementing the proxy made my life easier, not harder. Ruby encourages me to produce better designs.

Refactoring "support" for Ruby?

These days, there are a number of pretty damn good IDEs for Java, with features like intelligent code-completion (aka "intellisense") and automated refactorings. I was a late-starter with IDEs, myself, but even just over the past year I've become annoyingly dependent on some of those IDE features.

Such features depend quite heavily on gleaning data-type information from the code, which is fine for languages like Java and C#. But in dynamically-typed languages like Ruby, we don't have that type info, so things like method-name completion and automated renaming become impossible. (Or so I thought).

Stealing a trick from Smalltalk

It's been puzzling me that there isn't better refactoring support for Ruby, given that the whole concept of refactoring grew out of the Smalltalk community in the first place. Or more accurately, I've been confused about how automated refactoring could be possible in a dynamic language like Smalltalk.

Then, recently, I stumbled across a paper describing "A Refactoring Tool for Smalltalk", which contains the following explanation:

The Refactoring Browser uses method wrappers to collect runtime information. These wrappers are activated when the wrapped method is called and when it returns. The wrapper can execute an arbitrary block of Smalltalk code. To perform the rename method refactoring dynamically, the Refactoring Browser renames the initial method and then puts a method wrapper on the original method. As the program runs, the wrapper detects sites that call the original method. Whenever a call to the old method is detected, the method wrapper suspends execution of the program, goes up the call stack to the sender and changes the source code to refer to the new, renamed method. Therefore, as the program is exercised, it converges towards a correctly refactored program.

Ah-ha! Cunning.

The Ruby version

As it turns out, we can do much the same thing in Ruby ... leaving aside the "go up the call stack and change the source code" part.

Here's the supporting code:

def method_renamed(h)
  old_name = h.keys[0].to_sym
  new_name = h.values[0].to_sym
  define_method(old_name) { |*args|
    file, line = caller[1].split(':')
    warning = "##{old_name} renamed to ##{new_name}"
    $stderr.puts "#{file}:#{line}: #{warning}"
    send(new_name, *args)
  }
end

Okay, here's a method I want to rename:

class LinkPanel

  def render
    # ... 
  end

end

When I rename it, I also record the change using method_renamed:

class LinkPanel

  method_renamed :render => :to_html

  def to_html
    # ... 
  end

end

Now, I run my tests, and calls to the renamed method result in warnings:

/home/mikew/eyaw/sidebar.rb:229: #render renamed to #to_html

With a single key-chord in my Ruby IDE, I can jump directly to the source-code in question, and fix up the call. I imagine that an ever-so-slightly-more intelligent IDE could complete the refactoring, applying the rename to the call-site automatically! Later on, when I'm confident that everything has been cleaned up, I'll go back and remove that method_renamed alias.

There's more to refactoring than just renaming stuff, of course. I think the "dynamic analysis" trick would be useful to support other refactorings, too ... though I haven't tried it yet.
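
For reference, here's a self-contained variant of the rename tracker that should run on current Ruby versions. It's a sketch with a few changes of my own: the helper lives in a module (`RenameTracking`, a made-up name) instead of being defined globally, it uses caller_locations to find the call-site, and it forwards blocks as well as arguments.

```ruby
module RenameTracking
  # Defines old_name as a forwarding method that reports its call-site.
  def method_renamed(renames)
    renames.each do |old_name, new_name|
      define_method(old_name) do |*args, &block|
        site = caller_locations(1, 1)&.first   # the frame that called old_name
        where = site ? "#{site.path}:#{site.lineno}" : "(unknown)"
        $stderr.puts "#{where}: ##{old_name} renamed to ##{new_name}"
        public_send(new_name, *args, &block)
      end
    end
  end
end

class LinkPanel
  extend RenameTracking

  method_renamed :render => :to_html

  def to_html
    "<a href='#'>link</a>"
  end
end

LinkPanel.new.render   # warns on $stderr, then returns the to_html result
```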

Proviso: this approach relies on actually running the code, preferably from tests. As the original paper says:

.. the refactoring is only as good as your test suite. If there are pieces of code that are not executed, they will never be analyzed, and the refactoring will not be completed ...

TestGroups for JUnit

New users of JUnit often assume that there will only be one instance of their TestCase class (I did, at first).

In fact, each test-method is represented by a separate instance of the test-class. This isolation of test-methods is actually pretty sensible, since it means that (from the horse's mouth)

... each test will run with a fresh fixture and the results of one test can't influence the result of another.

If your tests are truly unit-tests, then re-creating a fresh fixture for every method should be fairly cheap, so it's not a big deal. BUT, it's a slightly different story if you're using JUnit as a framework for acceptance tests, or integration tests, or any scenario in which creating the required fixture/resource objects is costly.

My problem

On my current project, we have a large suite of web-app acceptance-tests written using HtmlUnit. We started off writing tests something like this:

public class PolicySelectionScreenTest extends TestCase {
    public void setUp() throws Exception {
        expensiveSetUpCode();
    }
    public void testPolicyTypeDefaultsToStandard() {
        assertEquals("STD", screenFixture.getPolicyType());
    }
    public void testWindscreenOptionDefaultsToNo() {
        assertEquals("N", screenFixture.getWindscreenOption());
    }
}

It soon became obvious that re-running the expensiveSetUpCode() for each test was - well - expensive, so we started looking for ways to reduce that overhead. An obvious way to do it is to bundle several asserts into the one test, e.g.

public class PolicySelectionScreenTest extends TestCase {
    public void testInitialScreenStateIsCorrect() {
        expensiveSetUpCode();
        assertEquals("STD", screenFixture.getPolicyType());
        assertEquals("N", screenFixture.getWindscreenOption());
    }
}

There are a couple of problems with this, though:

  • Test-methods get bloaty, and their names become less informative. This isn't ideal, as I prefer short test-methods, with names that describe the intended behaviour.
  • Testing of a scenario may halt prematurely, when it could usefully run further and provide more feedback about what is or isn't working.

A solution

So, I developed a way of aggregating a number of related test-methods into a "TestGroup". Now our tests look more like this:

public class PolicySelectionScreenTests extends TestGroup {
    public void groupSetUp() throws Exception {
        expensiveSetUpCode();
    }
    public void testPolicyTypeDefaultsToStandard() {
        assertEquals("STD", screenFixture.getPolicyType());
    }
    public void testWindscreenOptionDefaultsToNo() {
        assertEquals("N", screenFixture.getWindscreenOption());
    }
}

A TestGroup instance can be converted into JUnit-ese easily, by calling its asTest() method:

public static Test suite() {
    TestSuite suite = new TestSuite();
    // ... etc ...
    suite.addTest(new PolicySelectionScreenTests().asTest());
    return suite;
}

Alternatively, we have an extended TestSuite implementation that makes this a little easier:

public static Test suite() {
    TestSuite suite = new GroupAwareTestSuite();
    // ... etc ...
    suite.addTestSuite(PolicySelectionScreenTests.class);
    return suite;
}

Now, our original tests run faster (since the expensiveSetUpCode() is only run once), but the test-methods remain short and well-named. Woo-hoo! [cue weird little dance of joy].

But wait, there's more

As you might have guessed, there's a groupTearDown() to match groupSetUp(). The normal setUp() and tearDown() hooks are also supported, and run before/after each test, as you'd expect.

A warning

Once we start sharing test-fixtures like this, we're effectively removing JUnit's built-in safety harness, and thus running the risk of tests infecting the results of other tests by "polluting" the fixture. There's no easy solution: you just have to be really careful. Guidelines:

  • If possible, avoid putting any code that alters the state into the test-methods of a TestGroup.
  • If that's not possible, ensure you reset the fixture to a known state in the setUp() hook.

A peek inside

In my first attempt at TestGroups, I simply implemented the Test interface. Unfortunately, it's a fairly thin interface, and doesn't provide an API for navigating the hierarchical structure of a test-suite. If you want to explore the hierarchy, you'll have to assume that your test-suite will be constructed from TestCase and TestSuite objects - perhaps with the odd TestDecorator thrown in - and perform the required instanceof checks. If some new, unknown implementation of Test comes along, your assumptions are shot. Most IDEs are in this position, as they typically display the test-hierarchy. Thus, my original implementation didn't play nicely in an IDE environment.

So, instead, TestGroup.asTest() creates a structure that adapts the TestGroup to look like a TestSuite. The suite is wrapped by a TestSetup decorator that fires the groupSetUp() and groupTearDown() hooks. The TestCases in the suite are simple proxies that invoke methods on the shared TestGroup instance. Or, in pictures:

[diagram: TestGroup]

Because the result is just an aggregate of core JUnit objects, it doesn't confuse IDEs in the way I described earlier.

The code

If you're interested in using TestGroups, or just want to take a look at the code, you can get it here.