http://www.dogbiscuit.org/mdub/weblogDogBiscuit2013-07-02T09:48:00+10:00Mike Williamsmdub@dogbiscuit.orghttp://www.dogbiscuit.org/mdub/weblog/Tech/DevOps/CloudFoundry/DeployingAMicroBOSHCloud Foundry on AWS - Part 1: Deploying a MicroBOSH2013-07-02T09:48:00+10:002013-07-02T09:48:00+10:00<p>
Having spent some time <a href='PlayingWithCloudFoundryV2'>experimenting with Cloud Foundry</a> using Pivotal's flagship deployment at <a href='http://run.pivotal.io'><samp>run.pivotal.io</samp></a>, I wanted to see how easy it was to deploy Cloud Foundry myself.
</p>
<p>
The recommended deployment mechanism for Cloud Foundry uses <a href='http://docs.cloudfoundry.com/docs/running/bosh/'>BOSH</a>, "an open source tool chain for release engineering, deployment, and lifecycle management of large-scale distributed services". Once it's up and running, Cloud Foundry actually has no dependencies on BOSH, and in fact, it's technically possible to deploy CF by <a href='https://github.com/Altoros/cf-vagrant-installer'>other</a> <a href='https://github.com/yudai/cf_nise_installer'>means</a>. But BOSH makes it easier to manage an existing Cloud Foundry deployment, so we'll stick with that.
</p>
<p>
(A word of warning: the official documentation for "<a href='http://docs.cloudfoundry.com/docs/running/deploying-cf/ec2/index.html'>Deploying Cloud Foundry on AWS</a>" currently describes an installation process involving the <samp>bootstrap-cf-plugin</samp> gem. I recommend avoiding that path, as it conflates installation of BOSH and Cloud Foundry in a confusing way. Let's stay focused on BOSH, for now.)
</p>
<p>
BOSH is itself a distributed system, and can be deployed across multiple nodes (e.g. EC2 instances) for better performance and resilience. For our purposes, though, it's sufficient to have all the BOSH components installed on a single node. Such a single-node deployment is called a "MicroBOSH".
</p>
<p>
The BOSH project distributes MicroBOSH as "stemcells", their term for a raw machine image. They provide an AWS-specific stemcell, which can be used to build AMIs, as well as stemcells for other target infrastructures (like vSphere and OpenStack).
</p>
<p>
Before you get too excited, though, you'll need an "inception server". Despite the fancy name, this is nothing more than an EC2 instance in your target AWS region that you can SSH into and use as a staging point for MicroBOSH installation. It's required because:
</p>
<ul class="sparse">
<li>
you'll be downloading some large BLOBs, and it's best to keep them inside the AWS network
</li>
<li>
in order to turn the MicroBOSH stemcell into an AMI, you need to create and mount an EBS volume.
</li>
</ul>
<p>
Any Linux instance running Ruby 1.9.x should do. If you don't have one handy, you can use one of:
</p>
<ul class="sparse">
<li>
Dr Nic's <a href='https://github.com/cloudfoundry-community/inception-server'><samp>inception-server</samp></a> project
</li>
<li>
my <a href='https://github.com/mdub/bosh-inception-vagrant'><samp>bosh-inception-vagrant</samp></a>
</li>
</ul>
<p>
Once you have an inception server running, SSH on in.
</p>
<h3>
Option A: <samp>bosh-bootstrap</samp>
</h3>
<p>
At this point, you're welcome to take a shortcut by way of Dr Nic's <a href='https://github.com/cloudfoundry-community/bosh-bootstrap'><samp>bosh-bootstrap</samp></a> project, which mostly automates the remainder of the process.
</p>
<h3>
Option B: DIY
</h3>
<p>
After some initial experiments with <samp>bosh-bootstrap</samp>, I wanted to understand more about what it was doing, so I ended up building my MicroBOSH manually, using <samp>bosh-bootstrap</samp> as a guideline. If you're a sucker for punishment, like me, then read on ...
</p>
<p>
You'll need the BOSH command-line toolset. Create a <samp>Gemfile</samp> containing:
</p>
<pre>
<code class="ruby">source "https://rubygems.org"
source "https://s3.amazonaws.com/bosh-jenkins-gems/"
gem "bosh_cli", "~> 1.5.0.pre"
gem "bosh_cli_plugin_micro", "~> 1.5.0.pre"
</code></pre>
<p>
and run "<samp>bundle install</samp>" to install the bits you need.
</p>
<p>
Now, use the AWS console (or the API) to set some stuff up:
<ul class="sparse">
<li>
create a keypair (e.g. "<samp>mybosh</samp>")
</li>
<li>
create a security group (e.g. "<samp>bosh</samp>") that allows inbound connections on the following ports:
<ul class="sparse">
<li>
<samp>22</samp> (for SSH)
</li>
<li>
<samp>4222</samp> (for the "nats" pub/sub protocol)
</li>
<li>
<samp>6868</samp> (for the BOSH agent)
</li>
<li>
<samp>25250</samp> (for the BOSH blobstore)
</li>
<li>
<samp>25555</samp> (for the BOSH director)
</li>
<li>
<samp>25777</samp> (for the BOSH registry)
</li>
<li>
<samp>53</samp> (UDP, for DNS)
</li>
</ul>
</li>
<li>
allocate an Elastic IP address (e.g. A.B.C.D)
</li>
</ul>
</p>
<p>
Pick a name for your MicroBOSH, e.g. "<samp>mybosh</samp>", and create a configuration file for it:
<pre>
$ mkdir -p ~/microbosh/deployments/mybosh
$ vi ~/microbosh/deployments/mybosh/micro_bosh.yml
</pre>
</p>
<p>
Here's an example <samp>micro_bosh.yml</samp> file:
<pre>
name: mybosh
logging:
  level: DEBUG
network:
  type: dynamic
  vip: A.B.C.D
resources:
  persistent_disk: 4096
  cloud_properties:
    instance_type: m1.medium
cloud:
  plugin: aws
  properties:
    aws:
      access_key_id: YOURKEY
      secret_access_key: YOURSECRET
      region: ap-southeast-2
      ec2_endpoint: ec2.ap-southeast-2.amazonaws.com
      default_security_groups:
      - bosh
      default_key_name: mybosh
      ec2_private_key: /home/ubuntu/.ssh/mybosh.pem
apply_spec:
  agent:
    blobstore:
      address: A.B.C.D
    nats:
      address: A.B.C.D
  properties:
    aws_registry:
      address: A.B.C.D
</pre>
</p>
<p>
You'll need to:
<ul class="sparse">
<li>
insert the appropriate AWS access/secret key
</li>
<li>
change the region (unless "<samp>ap-southeast-2</samp>" is what you want)
</li>
<li>
replace "A.B.C.D" with the Elastic IP address you allocated earlier
</li>
</ul>
</p>
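<p>Before deploying, it's worth a quick sanity check that the file parses as YAML, and that the Elastic IP is wired into every field that needs it. Here's a throwaway check along those lines (my own helper, not part of the BOSH tooling):</p>

```ruby
require "yaml"

# Throwaway sanity check for a micro_bosh.yml (not part of BOSH itself):
# parse the YAML and confirm the elastic IP appears in all four fields
# that need it.
def elastic_ip_consistent?(yaml_text, elastic_ip)
  config = YAML.safe_load(yaml_text)
  [
    config["network"]["vip"],
    config["apply_spec"]["agent"]["blobstore"]["address"],
    config["apply_spec"]["agent"]["nats"]["address"],
    config["apply_spec"]["properties"]["aws_registry"]["address"],
  ].all? { |address| address == elastic_ip }
end
```

A <code>false</code> result tells you one of the four addresses was missed.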
<p>
Now, you need a MicroBOSH image to install, so download the latest MicroBOSH stemcell:
</p>
<pre>
$ cd ~/microbosh
$ curl -O http://bosh-jenkins-artifacts.s3.amazonaws.com/micro-bosh-stemcell/aws/latest-micro-bosh-stemcell-aws.tgz
</pre>
<p>
Now it's time to crank up the MicroBOSH!
</p>
<p>
Be patient; it takes a while. Consider installing a terminal multiplexer like <samp>tmux</samp> on your inception server, and running this step within a <samp>tmux</samp> session, in case you get disconnected while the deployment is in progress.
<pre>
$ cd ~/microbosh/deployments
$ bosh micro deployment mybosh/
Deployment set to '/home/ubuntu/microbosh/deployments/mybosh/micro_bosh.yml'
$ bosh -n micro deploy ../latest-micro-bosh-stemcell-aws.tgz
Verifying stemcell...
File exists and readable OK
Using cached manifest...
Stemcell properties OK
Stemcell info
-------------
Name: micro-bosh-stemcell
Version: 776
Deploy Micro BOSH
unpacking stemcell (00:00:16)
uploading stemcell (00:10:48)
creating VM from ami-c51380ff (00:00:32)
waiting for the agent (01:01:20)
create disk (00:04:16)
mount disk (00:00:06)
stopping agent services (00:00:01)
applying micro BOSH spec (00:00:19)
starting agent services (00:00:00)
waiting for the director (00:00:18)
Done 11/11 01:18:06
WARNING! Your target has been changed to `https://55.251.169.14:25555'!
Deployment set to '/home/ubuntu/microbosh/deployments/mybosh/micro_bosh.yml'
Deployed `mybosh/micro_bosh.yml' to `https://mybosh:25555', took 01:18:06 to complete
</pre>
</p>
<p>
Note the id of the AMI produced; you can use this for future MicroBOSH deployments in the same region, bypassing the stemcell download and conversion processes, e.g.
<pre>
$ bosh -n micro deploy ami-c51380ff
</pre>
</p>
<p>
Actually, if you're really lucky, Pivotal might have already baked a MicroBOSH AMI in your target region, in which case you can just use that, and save yourself a lot of time (and network traffic).
<pre>
$ AWS_REGION=us-east-1
$ curl http://bosh-jenkins-artifacts.s3.amazonaws.com/last_successful_micro-bosh-stemcell-aws_ami_$AWS_REGION
ami-427b092b
</pre>
</p>
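<p>The per-region AMI ids are published at predictable S3 keys (the bucket and key pattern are taken from the <samp>curl</samp> example above), so a script can construct the URL for any region. A trivial helper, for illustration:</p>

```ruby
# Builds the S3 URL at which the BOSH CI pipeline publishes the id of
# the most recently built MicroBOSH AMI for a region (bucket and key
# pattern taken from the curl example above).
def latest_micro_bosh_ami_url(region)
  "http://bosh-jenkins-artifacts.s3.amazonaws.com/" \
    "last_successful_micro-bosh-stemcell-aws_ami_#{region}"
end
```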
<p>
At the time of writing, they're only baking AMIs for <samp>us-east-1</samp>, so for other regions you'll have to resort to downloading the stemcell, as described above.
</p>
<p>
Once your MicroBOSH is running, you should be able to connect to it:
<pre>
$ bosh target https://55.251.169.14:25555
Target set to `mybosh'
Your username: admin
Enter password: *****
Logged in as `admin'
$ bosh status
Config
  /home/ubuntu/.bosh_config

Director
  Name       mybosh
  URL        https://55.251.169.14:25555
  Version    1.5.0.pre.776 (release:6191c586 bosh:6191c586)
  User       admin
  UUID       eac3cf02-845d-4817-aa55-7626a071304a
  CPI        aws
  dns        enabled (domain_name: microbosh)
  compiled_package_cache disabled
  snapshots  disabled

Deployment
  not set
</pre>
</p>
<p>
Stay tuned for the next exciting episode!
</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/DevOps/CloudFoundry/DatabaseMigrationsOnCloudFoundryRunning database migrations on Cloud Foundry2013-06-19T21:55:00+10:002013-06-19T21:55:00+10:00<p>
An important part of deploying a database-backed application is keeping the database schema up-to-date. For Rails applications, you typically do that by running:
<pre>
$ rake db:migrate
</pre>
</p>
<p>
For Rails applications running on Heroku, you typically run the migrations immediately after deploying a new version of your app, e.g.
<pre>
$ git push heroku master
$ heroku run rake db:migrate
</pre>
</p>
<p>
Since Cloud Foundry takes after Heroku in so many ways, I expected to use a similar workflow when deploying my Rails application on Cloud Foundry. I was surprised to discover that CF does not support any equivalent of "<samp>heroku run</samp>". That is, there's not (yet) a built-in way to run a Rake task or shell-command in the context of the currently deployed application.
</p>
<p>
The <a href='http://docs.cloudfoundry.com/docs/using/deploying-apps/ruby/'>Cloud Foundry documentation</a> suggests a somewhat surprising solution: alter the startup command for your application to execute migrations before (re)-starting the web-app. I duly did so by specifying a custom start command in the Cloud Foundry "<samp>manifest.yml</samp>" for my app:
<pre>
---
applications:
- name: barfly
  command: "bundle exec rake db:migrate && bundle exec rackup -p $PORT"
</pre>
</p>
<p>
We don't really want to run the migrations on every app-server instance, though. Luckily, Cloud Foundry provides meta-data to each instance, in the form of an environment variable, <samp>$VCAP_APPLICATION</samp>. Specifically, it provides an "<samp>instance_index</samp>" key, which contains a unique index for each instance.
</p>
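<p>For illustration, the index check boils down to something like this (a standalone sketch; the helper name is mine, and passing the JSON in as an argument is just for testability):</p>

```ruby
require "json"

# Sketch: decide whether this is the "primary" (index 0) instance,
# given the JSON blob that Cloud Foundry places in $VCAP_APPLICATION.
def primary_instance?(vcap_application_json)
  JSON.parse(vcap_application_json)["instance_index"] == 0
rescue JSON::ParserError, TypeError
  false
end

primary_instance?('{"instance_index":0}')  # => true
primary_instance?('{"instance_index":3}')  # => false
```

Note that an unset or unparseable <samp>$VCAP_APPLICATION</samp> counts as "not primary", which fails safe.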
<p>
So, I created a Rake task to limit an action to the first instance, i.e. the one with <code>instance_index == 0</code>.
<pre>
<code class="ruby">require "json"

namespace :cf do
  desc "Only run on the primary Cloud Foundry instance"
  task :on_primary_instance do
    instance_index = JSON.parse(ENV["VCAP_APPLICATION"])["instance_index"] rescue nil
    exit(0) unless instance_index == 0
  end
end
</code></pre>
</p>
<p>
If the <samp>instance_index</samp> is non-zero, or unset, the task exits Rake early, skipping any subsequent tasks. With this in place, I altered the startup command to make use of the new task:
<pre>
---
applications:
- name: barfly
  command: "bundle exec rake cf:on_primary_instance db:migrate && bundle exec rackup -p $PORT"
</pre>
</p>
<p>
This actually works fairly well; the migrations run, then the application-server starts up. Of course, if the migrations fail, the app-server won't start. That's fine with me, for now, and I imagine it would be acceptable for many apps.
</p>
<p>
Others, though, might want/need tighter control over when database migrations run, rather than just running them automatically on boot. With that in mind, an alternative approach would be to handle migrations entirely separately from application deployments. Cloud Foundry provides easy access to information about database (and other external) services, e.g.
<pre>
$ cf file barfly logs/env.log | grep DATABASE_URL
DATABASE_URL=postgresql://deadb33f:5Dvc0ePHMrwFUuODGQiSYWCYHU-nIzu-@babar.elephantsql.com:5432/deadb33f
</pre>
</p>
<p>
Using those connection details, the deployment script could connect to and migrate the target database schema, prior to the "<samp>cf push</samp>" that updates the app. In many ways, this is preferable to the push-migrate workflow typically used with Heroku, as it creates the flexibility to run constructive migrations independently of application deployments.
</p>
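<p>Extracting the connection details from a <samp>DATABASE_URL</samp> is straightforward; a sketch of what such a deployment script might do (the helper and its return shape are my own, for illustration):</p>

```ruby
require "uri"

# Sketch: pull connection details out of a DATABASE_URL (format as shown
# in the env.log excerpt above), so a deployment script could migrate the
# bound database before running `cf push`.
def db_config_from_url(database_url)
  uri = URI.parse(database_url)
  {
    adapter:  uri.scheme,
    host:     uri.host,
    port:     uri.port,
    username: uri.user,
    password: uri.password,
    database: uri.path.sub(%r{\A/}, ""),  # strip the leading slash
  }
end
```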
http://www.dogbiscuit.org/mdub/weblog/Tech/DevOps/CloudFoundry/PlayingWithCloudFoundryV2Play-ing with Cloud Foundry v22013-06-17T21:50:00+10:002013-06-17T21:50:00+10:00<p>
I'm currently messing about a bit with <a href='http://www.cloudfoundry.com'>Cloud Foundry</a>, an open-source Platform-as-a-Service which promises to be something akin to Heroku-in-your-data-center, or perhaps Heroku-in-your-AWS-VPC. In any case, it's exciting stuff, and appears to be moving fast, under the guidance of <a href='http://gopivotal.com'>Pivotal</a>.
</p>
<p>
Brian McClain posted an <a href='http://catdevrandom.me/blog/2013/05/16/buildpacks-in-cloud-foundry-v2/'>article</a> recently about getting a Haskell app running in Cloud Foundry. I decided to do something similar using <a href='http://www.playframework.com'>Play</a>.
</p>
<p>
I hadn't used Play before, but it turns out that it's pretty easy to get a basic app up and running:
<pre>
<no-highlight>$ play new playpen
</no-highlight></pre>
</p>
<p>
A quick run locally, to check that everything is hanging together:
<pre>
<no-highlight>$ cd playpen
$ play start
$ open http://localhost:9000
</no-highlight></pre>
</p>
<p>
All good, so I'll try pushing it up to Cloud Foundry.
<pre>
<no-highlight>$ cf push
Name> playpen
Instances> 1
1: 64M
2: 128M
3: 256M
4: 512M
Memory Limit> 256M
Creating playpen... OK
1: playpen
2: none
Subdomain> playpen
1: cfapps.io
2: none
Domain> cfapps.io
Binding playpen.cfapps.io to playpen... OK
Create services for application?> n
Bind other services to application?> n
Save configuration?> y
Saving to manifest.yml... OK
Uploading playpen... OK
Starting playpen... OK
-----> Downloaded app package (1020K)
Installing java.
Downloading JDK...
Copying openjdk-1.7.0_21.tar.gz from the buildpack cache ...
Unpacking JDK to .jdk
/var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:63:in `start_command': Please specify a web start command in your manifest.yml or Procfile (RuntimeError)
from (erb):6:in `generate_startup_script'
from /usr/lib/ruby/1.9.1/erb.rb:838:in `eval'
from /usr/lib/ruby/1.9.1/erb.rb:838:in `result'
from /var/vcap/packages/dea_next/buildpacks/lib/staging_plugin.rb:110:in `generate_startup_script'
from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:84:in `startup_script'
from /var/vcap/packages/dea_next/buildpacks/lib/staging_plugin.rb:139:in `block in create_startup_script'
from /var/vcap/packages/dea_next/buildpacks/lib/staging_plugin.rb:138:in `open'
from /var/vcap/packages/dea_next/buildpacks/lib/staging_plugin.rb:138:in `create_startup_script'
from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:19:in `block in stage_application'
from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:12:in `chdir'
from /var/vcap/packages/dea_next/buildpacks/lib/buildpack.rb:12:in `stage_application'
from /var/vcap/packages/dea_next/buildpacks/bin/run:10:in `<main>'
Checking playpen...
Application failed to stage
</no-highlight></pre>
</p>
<p>
Well, that started well, but finished badly. Cloud Foundry assumed I was pushing a generic Java app. We need to tell it how to handle Play applications.
</p>
<p>
Buildpacks to the rescue! Buildpacks are a concept that Cloud Foundry has borrowed <a href='https://devcenter.heroku.com/articles/buildpacks'>from Heroku</a> - they adapt the generic PaaS to the specifics of a particular application framework and/or language. A buildpack takes your application source-code as input, and outputs a compiled package that can be run on the target PaaS. Some buildpacks are very specific to the app framework; others are more generic, and can support multiple frameworks.
</p>
<p>
It turns out that the buildpack required for a Play 2.1.1 app is <a href='https://github.com/heroku/heroku-buildpack-scala'>heroku-buildpack-scala</a>. This was written for Heroku, but can be used without changes on Cloud Foundry!
</p>
<pre>
<no-highlight>$ cf push --buildpack https://github.com/heroku/heroku-buildpack-scala.git
Using manifest file manifest.yml
Not applying manifest changes without --reset
See `cf diff` for more details.
Uploading playpen... OK
Changes:
buildpack: '' -> 'https://github.com/heroku/heroku-buildpack-scala.git'
Updating playpen... OK
Stopping playpen... OK
Starting playpen... OK
-----> Downloaded app package (1020K)
Initialized empty Git repository in /tmp/buildpacks/heroku-buildpack-scala.git/.git/
Installing heroku-buildpack-scala.git.
-----> Installing OpenJDK 1.6...done
-----> Building app with sbt
-----> Running: sbt clean compile stage
Getting net.java.dev.jna jna 3.2.3 ...
Getting org.scala-sbt sbt 0.12.2 ...
(... etc etc ... download half the internet ...)
[success] Total time: 3 s, completed Jun 17, 2013 5:39:55 AM
-----> Dropping ivy cache from the slug
-----> Uploading staged droplet (127M)
-----> Uploaded droplet
Checking playpen...
Staging in progress...
0/1 instances: 1 starting
1/1 instances: 1 running
OK
</no-highlight></pre>
<p>
Success! That's one more toy application in the cloud!
</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Programming/Ruby/AutotestHooksFun with Autotest hooks2013-06-05T10:40:00+10:002013-06-05T10:40:00+10:00<p>I've been doing more tight-feedback-loop unit-testing in Ruby recently, and find myself using <code>autotest</code> (from the <a href="https://github.com/seattlerb/zentest">ZenTest</a> suite) a lot.</p>
<p>One of the great things about <code>autotest</code> is its hook mechanism, which allows you to hang behaviour on the passing or failing of a test suite. Many people have written hooks that publish test results via a notification mechanism such as Growl. I wrote a similar one that signals the state of the test-suite via the title of the terminal window running <code>autotest</code>:</p>
<pre><code>def set_title(title)
  if ENV["TERM"] =~ /^xterm/
    puts "\e]0;#{title}\007"
  end
end

Autotest.add_hook(:green) do
  set_title "GREEN - all passed"
end

Autotest.add_hook(:red) do |autotest|
  set_title "RED - #{autotest.files_to_test.size} failure"
end
</code></pre>
<figure>
<img src="AutotestHooks.png" />
</figure>
<p>Also, I have a tendency to checkpoint with "<code>git add</code>" whenever the tests pass. That's easily delegated to <code>autotest</code>:</p>
<pre><code>Autotest.add_hook(:green) do
  checkpoint_command = "git add ."
  puts "AUTOTEST_CHECKPOINT> #{checkpoint_command}"
  system(checkpoint_command)
end
</code></pre>
<p>Does anyone have other useful hooks to share?</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Programming/Ruby/MultiThreadedProcessingWithLazyEnumerablesMulti-threaded processing with lazy enumerables2013-05-23T23:30:00+10:002013-05-23T23:30:00+10:00<p>A project I'm working on at the moment involves trawling a large collection (millions) of S3 objects, and I wanted to parallelize the processing across multiple threads.</p>
<p>As it happens, it's actually pretty easy to process a Ruby collection using multiple Threads:</p>
<pre><code class="ruby">inputs = [1, 2, 3, 4, 5]
threads = inputs.collect { |i| Thread.new { i * i } }
outputs = threads.collect { |thread| thread.join.value }
outputs #=> [1, 4, 9, 16, 25]
</code></pre>
<p>The problem with this naive approach, though, is that you end up creating a Thread for each element of the collection, all at once. For large collections, that's a bad idea. Typically you want to limit the number of Threads you have running simultaneously.</p>
<figure>
<img src="diagrams/eager.png" />
<figcaption>Figure 1: Thread explosion!</figcaption>
</figure>
<p>We can fix the do-everything-at-once problem using lazy enumeration:</p>
<pre><code class="ruby">require 'lazily' # or use Ruby 2
inputs = [1, 2, 3, 4, 5]
lazy_threads = inputs.lazily.collect { |i| Thread.new { i * i } } # lazy!
outputs = lazy_threads.collect { |thread| thread.join.value }
outputs.to_a #=> [1, 4, 9, 16, 25]
</code></pre>
<p>Okay, but now we have the opposite problem: the worker Threads aren't created until immediately before their outputs are required, so we don't get any parallelization.</p>
<figure>
<img src="diagrams/lazy.png" />
<figcaption>Figure 2: Not really parallel</figcaption>
</figure>
<p>So, what if we reintroduced <em>just a little</em> eagerness? Let's <strong>prefetch</strong> some of the Threads in the <code>lazy_threads</code> collection, before we actually need them:</p>
<pre><code class="ruby">require 'lazily'
inputs = [1, 2, 3, 4, 5]
lazy_threads = inputs.lazily.collect { |i| Thread.new { i * i } }
prefetched_threads = lazy_threads.prefetch(2) # <- added magic
outputs = prefetched_threads.collect { |thread| thread.join.value }
outputs.to_a #=> [1, 4, 9, 16, 25]
</code></pre>
<p>The implementation of <code>#prefetch</code> is pretty straightforward; it creates a lazy buffer in front of another lazy enumerable, which keeps itself full as it feeds elements to its consumer.</p>
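<p>To make the idea concrete, here's a minimal sketch of a prefetching wrapper - not Lazily's actual implementation, just the shape of it - built from a plain <code>Enumerator</code> and Ruby 2's built-in <code>.lazy</code>:</p>

```ruby
# Minimal sketch of prefetching (not Lazily's real code): keep a buffer
# of up to n elements pulled from the upstream enumerable, so upstream
# work (e.g. Thread creation) happens ahead of the consumer.
def prefetch(enumerable, n)
  Enumerator.new do |yielder|
    buffer = []
    enumerable.each do |element|
      buffer << element
      yielder << buffer.shift if buffer.size > n
    end
    buffer.each { |element| yielder << element }  # flush the tail
  end
end

threads = [1, 2, 3, 4, 5].lazy.map { |i| Thread.new { i * i } }
prefetch(threads, 2).map { |t| t.join.value }  # => [1, 4, 9, 16, 25]
```

When the consumer asks for the first element, the wrapper has already pulled (and so started) the next couple of Threads behind it.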
<p>Bingo! Now we're starting a limited number of Threads in advance of needing their outputs.</p>
<figure>
<img src="diagrams/anticipation.png" />
<figcaption>Figure 3: Antici........pation</figcaption>
</figure>
<p>We can size the prefetch "window" to get some parallelization, without creating an explosion of Threads. Even better, we've managed to do it without involving tricksy multi-threading operators like <code>Mutex</code> or <code>Queue</code>. And, the collection of outputs is lazy, so we can use this approach to process large (even infinite?) collections.</p>
<p>This "sliding window" approach to multi-threading is implemented in my new gem, <a href="https://github.com/mdub/lazily">Lazily</a>, as <code>#in_threads</code>:</p>
<pre><code class="ruby">require 'lazily'
inputs = [1, 2, 3, 4, 5]
outputs = inputs.lazily.in_threads(4) { |i| i * i }
outputs.to_a #=> [1, 4, 9, 16, 25]
</code></pre>
<p>(Lazily is an implementation of Ruby2-like lazy Enumerables in pure Ruby. Give it a spin if you're eager to be lazy, and can't deploy to ruby-2.0.0).</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Agile/ThreeRsOfTestAutomationThe Three R's of Test Automation2011-07-11T22:00:00+10:002011-07-11T22:00:00+10:00<p>To properly assess a test's value, you need to consider both the benefit it provides, <u>and</u> the cost of creating, maintaining, and executing it. When the cost outweighs the benefit, the test has ceased to provide (net) value.</p>
<p>I have a few simple criteria I use to gauge the value provided by an automated test. Even better, I've managed to make them all start with the same letter, so I now proclaim:</p>
<blockquote>
<big><strong>The Three R's of Test Automation:</strong></big>
<p>
An automated test must be <strong>Rapid</strong>, <strong>Reliable</strong>,
and <strong>Relevant</strong>.
</p>
</blockquote>
<h2>Rapid </h2>
<p>All other things being equal, quick tests are better than slow tests. Quick tests add more value, because they provide feedback earlier, and can be executed more often.</p>
<p>Quick tests require fewer resources to execute, reducing cost. Often, the maintenance cost is also lower, as quick tests tend to be simpler and involve fewer moving parts.</p>
<h2>Reliable</h2>
<p>If a test fails, and that failure highlights a defect, which otherwise might have gone undetected for a while, then the test provided value. There's also substantial value in the confidence gained when tests pass.</p>
<p>The benefit can quickly be eroded, though, if tests are not trustworthy. Where they are known to be deficient, additional manual testing must be performed to gain confidence, which undermines the value of test automation.</p>
<p>Also, tests can turn into a major maintenance burden if they're overly specific, or too tied to implementation details, because they become fragile - breaking not because of an introduced defect, but because something unimportant changed.</p>
<h2>Relevant</h2>
<p>Automated tests are not <em>inherently</em> valuable; they are not an end unto themselves. So, what makes them valuable? The obvious answer is that they provide useful feedback ... but, what makes that feedback useful? I think feedback is only useful when it <em>mitigates risk</em>. </p>
<p>Automated tests are less relevant when they test functionality which:</p>
<ul>
<li>is less valuable (e.g. the features that nobody uses)</li>
<li>is less likely to fail (e.g. stable)</li>
<li>is already well-covered by other tests</li>
</ul>
<h2>Summary</h2>
<p>When considering whether to create an automated test, or retain an existing one, consider both:</p>
<ul>
<li>How valuable is the feedback it provides?</li>
<li>How much will it cost you to maintain and execute it?</li>
</ul>
http://www.dogbiscuit.org/mdub/weblog/Tech/Agile/FlippingTheCardWallFlipping the card wall2011-02-05T23:30:00+11:002011-02-05T23:30:00+11:00<p>Most agile software development projects have a "card wall", with each card representing a story, or a task, or some other unit of work. </p>
<p>Typically, the wall is arranged in columns with labels such as: <em>In Analysis</em>, <em>Ready for Dev</em>, <em>Developing</em>, <em>Ready for Test</em>, <em>Testing</em>, and eventually, <em>Done</em>. And typically, cards progress across the wall from left to right. The labels vary, a lot, but the left-to-right thing is fairly standard.</p>
<p>The thing is, I've noticed a tendency for people to concentrate on the leftmost stuff first. That is, your focus drifts towards work that is in-progress, or not yet begun, partly because it's the first thing to catch your eye. I argue that this increases the risk of work over in the right-most "almost done" columns languishing, incomplete and undelivered, while the team moves on to other things.</p>
<p>So, on my current project, I convinced the team to try something different. Observe ...</p>
<p align="center">
<img src="fedex-card-wall.png" alt="flipped card wall" title="our card wall, after flipping" width="495" height="178"/>
</p>
<p>The key difference here is that cards progress from <em>right-to-left</em>, rather than the other way around. Work that is almost finished is on the left, while stuff we haven't started yet is way over on the right. As I suspected, this orientation encourages people (including me) to think about the things that are almost-but-not-quite done, first. My hope is that this will help us <em>pull</em> (as opposed to <em>push</em>) work through our process, and reduce "inventory" at each step.</p>
<p>At the same time, the labels on our columns are all verbs, making it very clear what needs to happen with that card next. These, then, are our columns:</p>
<ul>
<li><p><strong>Advertise</strong>: If we've recently released useful new features, our quickest path to providing value is to get people using them. We hold a "showcase" every two weeks or so, but often use other channels (email, internal blog, wandering over) to promote our wares, as well.</p></li>
<li><p><strong>Release</strong> (and Document): Much of our work needs to be made available explicitly, by upgrading a network service, or publishing a new version of a software component. We try to do this as soon as possible once we consider it "done". Part of releasing is making sure that suitable documentation is available, if necessary.</p></li>
<li><p><strong>Review</strong>: If someone has recently completed a task, we want someone else to review it before we consider it "good to go". Sometimes another member of the team can do it, but usually we'll ask an end-user to give it the seal of approval (or not, as the case may be). Getting this feedback is more important than building new stuff.</p></li>
<li><p><strong>Implement</strong>: Assuming we've cleared out the previous couple of columns, we can do some actual programming (with frequent pauses to drink coffee and/or insult each other's choice of text editor). We don't use separate columns to distinguish what's being worked on, and what's as-yet unstarted; instead, the presence of one or more people's names on a card indicates that it's getting attention.</p></li>
<li><p><strong>Discuss</strong>: Sometimes a good whiteboard session is required before we can feel comfortable that a card is ready to be worked on. Often these discussions result in cards being split into smaller, more concrete steps.</p></li>
<li><p><strong>Consider</strong>: On the right hand side is a cloud of things that are really good ideas. We keep them around to remind ourselves, and others, that while these are really good ideas, we're NOT working on them right now. We don't bother attempting to keep these in priority order, as priorities are likely to change radically before we get to them; instead, we'll do just-in-time prioritisation when space becomes free further to the left of the board. </p></li>
</ul>
<p>We keep our current "milestone" (medium-term goal) in plain sight, on a card just above our column headings. This helps us maintain focus, by highlighting when we are tempted to work on things that are "off-mission".</p>
<p>I think this has been a positive change, and has <em>helped</em> us reduce work-in-progress. It's no silver bullet, though. For instance, we still tend to spend too much time (IMHO) thinking about potential upcoming work, as evidenced by the cluster of cards in "Discuss", in the photo above. Still, early days.</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Projects/ClampClamp - a Ruby command-line framework2010-12-06T21:30:00+11:002010-12-06T21:30:00+11:00<p>In the course of my current project, I've been writing a bunch of command-line utilities. While they're just Ruby scripts, much of their work is interacting with the user: accepting command-line options and arguments, and providing useful feedback in case of errors. So, I wrote a little framework to make it easier. It's called <a href="http://github.com/mdub/clamp">Clamp</a>.</p>
<p>Clamp models a command as a Ruby class, and a command execution as an instance of that class. Command classes look like this:</p>
<pre><code>class SpeakCommand < Clamp::Command

  option "--loud", :flag, "say it loud"

  option ["-n", "--iterations"], "N", "say it N times", :default => 1 do |s|
    Integer(s)
  end

  parameter "WORDS ...", "the thing to say", :attribute_name => :words

  def execute
    the_truth = words.join(" ")
    the_truth.upcase! if loud?
    iterations.times do
      puts the_truth
    end
  end

end
</code></pre>
<p>At execution time, Clamp uses the "option" and "parameter" declarations to map command-line arguments onto the command object as instance variables.</p>
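<p>The shape of that model is easy to see in plain Ruby. This toy version - not Clamp's implementation, and without any of its option parsing, validation, or help output - shows the class-per-command, instance-per-execution idea:</p>

```ruby
# Toy command-as-class (not Clamp itself): the class declares what it
# accepts, an instance represents one invocation, and #execute does
# the work.
class SpeakCommand
  attr_reader :words, :loud

  def initialize(argv)
    argv = argv.dup                         # don't mutate the caller's ARGV
    @loud = !argv.delete("--loud").nil?     # crude flag handling
    @words = argv
  end

  def execute
    the_truth = words.join(" ")
    loud ? the_truth.upcase : the_truth
  end
end

SpeakCommand.new(["--loud", "hello", "world"]).execute  # => "HELLO WORLD"
```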
<p>There are numerous Ruby libraries out there to help with parsing of command-line options - even a couple built into the standard library - so why write something new? Partly, it's because most of the alternatives only address option parsing. I wanted to focus less on parsing options, and more on modelling the command itself. Clamp is similar in some ways to <a href="https://github.com/wycats/thor">Thor</a>, the command framework behind Bundler and Rails3 generators, though Thor models commands as methods, rather than classes. Thor is also quite a bit bigger.</p>
<p>Anyway, next time you're writing a command-line utility in Ruby, I hope you give Clamp a go, and that it makes your job a little bit easier.</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Programming/Ruby/GemRequire"gem require"2010-12-06T09:00:00+11:002010-12-06T09:00:00+11:00<p>One thing that irks me about Rubygems is that it provides no cheap way to ensure that a gem is installed. For instance, in a script, I want to check that I've got "heroku" installed, before calling it. My options include:</p>
<ul>
<li>just assume it's installed, and fail if the assumption is bad;</li>
<li>call "<code>gem install</code>", knowing that it will needlessly re-install the gem if it's already installed;</li>
<li>try calling "<code>heroku</code>", and fall back to "<code>gem install</code>" if it fails.</li>
</ul>
<p>I'm not happy with any of those options, so I wrote a simple gem command plugin to make it easier. Install it like so:</p>
<pre><code>$ gem install gem_require
Successfully installed gem_require-0.0.3
1 gem installed
</code></pre>
<p>Now you can use "<code>gem require</code>" in place of "<code>gem install</code>". It's similar, except that it short-circuits if you already have the required gem installed:</p>
<pre><code>$ gem require heroku
heroku (1.11.0) is already installed
</code></pre>
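<p>Under the hood, that short-circuit is just a question to the RubyGems API. A rough sketch of the check (using <samp>Gem::Specification</samp> methods from modern RubyGems, which may differ from what the plugin actually calls):</p>

```ruby
require "rubygems"

# Returns true if any installed version of the named gem is present.
def gem_installed?(name)
  !Gem::Specification.find_all_by_name(name).empty?
end

gem_installed?("surely-not-a-real-gem-xyzzy")  # => false
```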
<p>If you want to ensure that you're on the bleeding edge, use the "<code>--latest</code>" option:</p>
<pre><code>$ gem require --latest heroku
Installing heroku (1.14.6) ...
Installed heroku-1.14.6
</code></pre>
<p>Of course, if you already <strong>have</strong> the latest version, there's no need to re-install it:</p>
<pre><code>$ gem require --latest heroku
heroku (1.14.6) is already installed
</code></pre>
<p>I hope that helps somebody else.</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Projects/OneProjectThreeGemsOne project, three gems2010-09-14T22:00:00+10:002010-09-14T22:00:00+10:00<p>A couple of weeks ago I gave a presentation at the Melbourne Ruby user-group, talking about three Ruby gems that emerged from my recent work at Lonely Planet.</p>
<ul>
<li><p><a href="http://github.com/mdub/arboreal">arboreal</a> is an ActiveRecord extension to support navigation of tree-shaped data.</p></li>
<li><p><a href="http://github.com/mdub/representative">Representative</a> makes it easy to create XML or JSON representations of Ruby objects.</p></li>
<li><p><a href="http://github.com/mdub/sham_rack">ShamRack</a> I've <a href="ShamRack">blogged about</a>, already.</p></li>
</ul>
<p>I'll write more about the first two in weeks to come. Meanwhile, here are the slides:</p>
<div style="width:425px" id="__ss_5063927"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/mdubya/one-project-3-gems" title="One project, 3 gems">One project, 3 gems</a></strong><object id="__sse5063927" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=1-project-3-gems-100826170251-phpapp02&stripped_title=one-project-3-gems" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse5063927" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=1-project-3-gems-100826170251-phpapp02&stripped_title=one-project-3-gems" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object></div>
http://www.dogbiscuit.org/mdub/weblog/Tech/Agile/AllAboardTheReleaseBusAll aboard the "Release Bus"2010-09-12T22:00:00+10:002010-09-12T22:00:00+10:00<p><em>
Spoiler: A "Release Bus" groups story cards to create a tangible representation of what's going into a release.
</em></p>
<p>These days, it's pretty common for software projects to have a "story wall", or "task board", across which story cards move as they progress from "written" to "done".</p>
<p>In a recent project, we were doing exactly that, but were having problems with the "done" cards, which ended up swimming around together in a column labelled "Signed Off".
It was difficult to see which of the signed-off cards would be included in the next release.
It was unclear when cards should be removed from the wall.
And, some cards didn't even involve software changes, e.g. where we were "spiking", doing some analysis of production data, or tweaking the production infrastructure.</p>
<p><img src="release-bus.png" alt="example Release Bus" title="a fully-loaded Release Bus, ready to leave the station" align="right" width="225" height="300" style="margin-left: 1em"/></p>
<p>Solution: We created a "Release Bus" — an A3 sheet of paper, onto which releasable cards are placed once they're signed off. This simple sheet of paper suddenly made the release so much more tangible. In essence, it's the release-level equivalent of a story card.</p>
<p>It also provides a handy place to track other aspects of the release, like the release number, and a candidate build number. We typically have a "release bitch" [sic], responsible for tracking the release into production, and the bus was a handy place to capture that, too. </p>
<p>Best of all: as the release moves off through the final stages of pre-release testing, and into production, we have a single artifact that we can move off into an archive, rather than dealing with individual story cards.</p>
<p>In the picture above, you'll also notice a "pedestrian" area; we use this for those operational or investigative cards that didn't result in releasable software.</p>
<p>All in all, a useful wee innovation.</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Agile/CarefulSoftwareDevelopmentCareful Software Development2009-09-03T22:00:00+10:002009-09-03T22:00:00+10:00<p>
Enough with this seat-of-your-pants "eXtreme Programming", this twisty-turny "Agile" bollocks, and this anorexic "Lean"!
You're all a bunch of cowboys!
</p>
<p>
The time has come for a return to a more sensible, conservative approach. Just be a little more careful, okay?
</p>
<h4>
Careful, don't waste your money (and time)
</h4>
<p>
It's sensible to ensure that what you're building has <u>real business value</u>.
Ideally, you should be building the highest-value stuff first.
Anything less, and you're being careless with your (or someone else's) money.
</p>
<h4>
Careful, make sure it's useful
</h4>
<p>
It's worth saying again ...
</p>
<p>
If you build stuff that users don't want, nobody's going to end up happy.
I suggest you keep in close contact with them.
Show them what you're building for them as soon as possible (yes, even before it's "finished"),
so they can help keep you pointed in the right direction.
</p>
<p>
It's important to listen to your users, but don't forget that they (like you) only have part of the picture.
You need to work <u>with</u> them to find the best way to solve their problems,
rather than just implementing their proposed solutions without question.
</p>
<p>
Even when <u>you</u> eventually understand what the requirements are, the rest of the team might not.
So you'd best write the requirements down somewhere. Somewhere safe.
Code (in the form of automated tests) is an excellent way to express requirements, because (unlike other options) it's hard to misinterpret.
</p>
<h4>
Careful, don't miss the turn-off
</h4>
<p>
Plans are great; make sure you have one. But don't follow it blindly.
Once you get going, you'll probably discover lots of things you didn't know when you made the plan.
You may even stumble upon some useful short-cuts: ways to deliver more value, for less cost!
Don't let the plan prevent you from taking those opportunities.
</p>
<p>
After all, even if you have a map, it's still useful to look at the road ahead.
Where'd you get that map, anyway? Oh, you drew it yourself?!
Well, why not slip that back into its protective plastic cover, so it doesn't get grubby. There ya go.
And keep an eye out for road signs, okay? Excellent.
</p>
<h4>
Careful, free stuff isn't always cheap
</h4>
<p>
You may find something that looks like it might solve all (or some) of your problems.
Perhaps some free software, or a product you already have licenses for.
That would be nice.
</p>
<p>
But beware! Adapting your problem to someone else's solution is hard,
and if you fail, it leaves you in a delicate situation.
In some situations, rolling your own solution works out cheaper in the long run.
</p>
<h4>
Careful, don't bite off more than you can chew
</h4>
<p>
Q. How do you eat an elephant? A. One bite at a time.
</p>
<p>
For god's sake, don't give yourself indigestion by tackling the whole trunk at once!
I had a cousin who did that, and she still can't ride a bike properly.
</p>
<p>
Slow and steady wins the race.
</p>
<h4>
Careful, don't break the stuff that's already working
</h4>
<p>
Software's not easy, you know. It gets complicated inside that big box. And hot.
You might accidentally trip over some important wires while you're in there installing the new
fnord-wangling module.
</p>
<p>
You know those automated tests, ummm I mean requirements, you captured earlier? Just run them again.
Actually, run them as frequently as you can; that should alert you to any inadvertent mistakes as soon as they occur.
</p>
<h4>
Careful, don't let it fail in production
</h4>
<p>
Oh boy, if it goes wrong in production, your users are going to be mighty grumpy.
That's embarrassing.
And these things tend to happen whenever it's least convenient.
You might just have to cancel that ski-trip.
</p>
<p>
Careful, too, to have good processes for finding and fixing production issues if and when they do occur.
Make sure you can turn fixes around quickly. You may make the ski-fields yet!
</p>
<h4>
Careful, don't step on your own toes
</h4>
<p>
Make sure that the team is all working towards the same goal; all pulling in the same direction.
Yeah, unfortunately that means talking to each other. All the time.
But otherwise, you'll just get in each other's way.
</p>
<h4>
Careful, don't be left holding the (ugly, vomiting) baby
</h4>
<p>
There's a chance (actually, let's call it a certainty), that the rest of the project team will desert you.
The clever ones are likely to leave first.
</p>
<p>
So, I suggest you ensure you have a decent understanding of the system before they go.
Documentation might help. But probably not: most of it tends to be kind of useless.
You're probably better off working with them a bit while they're still around.
That way you can ask them questions.
</p>
<p>
And if you work together, you'll likely produce a more maintainable result, anyway.
</p>
<h4>
Okay, then
</h4>
<p>
In the immortal words of Sergeant Phil Esterhaus:
<blockquote>
"Let's be careful out there."
</blockquote>
</p>
<p>
Thanks for listening.
</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Projects/ShamRackIntroducing ShamRack2009-07-03T15:50:00+10:002009-07-03T15:50:00+10:00<p>
The system I'm currently working on integrates with several external systems, over HTTP, using simple (RESTish) web-services. I really don't want to involve those external systems while testing my own, though; I want to stub 'em out.
</p>
<p>
My first attempt involved stubbing out HTTP calls using my mocking framework of choice. I'm using <a href='http://rest-client.heroku.com'>RestClient</a>, which I like a lot, and stubbing out RestClient API calls worked quite well. It kept on working quite well for several hours, until I decided to refactor a little, using RestClient in a slightly different way, at which point it broke completely. Bother. I really don't like having tests coupled to implementation details, so went searching for another way.
</p>
<p>
<a href='http://fakeweb.rubyforge.org'>FakeWeb</a> looked pretty good, in that it stubs things out at the Net::HTTP layer, which I'm unlikely to refactor out of the picture. In the end, though, it's not really what I wanted. I wanted to be able to do things like:
</p>
<ul>
<li>
verify the body (and mime-type) of a POST/PUT request
</li>
<li>
dynamically generate responses, based on some aspect of the request (e.g. query parameters)
</li>
</ul>
<p>
In short, I wanted a <a href='http://xunitpatterns.com/Fake%20Object.html'>Fake Object</a>, rather than a simple stub.
</p>
<p>
It occurred to me around about then that we already have plenty of tools for describing the behaviour of web-applications: they're called web-application frameworks! Many of them are too heavy-weight for my purposes, but <a href='http://www.sinatrarb.com/'>Sinatra</a> is nicely minimal. So, 60 lines of Ruby code later, I had a little web-app that mimicked one of those external web-services sufficiently for my testing. Win!
</p>
<p>
But waitaminut. I really don't want to have to start a separate process running my fake web-service, and talk to it using HTTP. That's going to be slow: network I/O isn't cheap. Isn't there some way I can use something like Sinatra but still keep everything in-process?
</p>
<p>
There is now. <a href='http://github.com/mdub/sham_rack'>ShamRack</a> plumbs Net::HTTP directly into applications built to run on <a href='http://rack.rubyforge.org/'>Rack</a>. Which includes all Sinatra apps, as well as Rails, Merb, etc.
</p>
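<p>The Rack contract that makes this possible is tiny: an app is anything that responds to <samp>call(env)</samp> and returns a status/headers/body triple. A fake web-service can therefore be very small indeed; here's a sketch (the endpoint and responses are invented for illustration):</p>

```ruby
# A minimal Rack app: a lambda taking the env Hash and returning
# [status, headers, body-enumerable].
fake_service = lambda do |env|
  if env["PATH_INFO"] == "/greeting"
    [200, { "Content-Type" => "text/plain" }, ["Hello, tester!"]]
  else
    [404, { "Content-Type" => "text/plain" }, ["not found"]]
  end
end

status, _headers, body = fake_service.call("PATH_INFO" => "/greeting")
# status == 200, body == ["Hello, tester!"]
```

<p>With ShamRack, an app like this gets mounted against a hostname, so that Net::HTTP requests to that host are routed to it in-process.</p>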
<p>
<div align="center">
<img src="/mdub/images/ShamRack.png" />
</div>
</p>
<p>
Using ShamRack, I avoid the network traffic, making the tests a whole lot faster (about 25 times faster, in my case). Plus, I avoid the complication of having to start and stop an external web-server. Finally, because my fake web-service app is in-process, I get a handy back-channel I can use to set up or inspect its state during tests.
</p>
<p>
If you find ShamRack handy, or have ideas about how it could improve, let me know!
</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Programming/Ruby/SpyingOnYourCodeWithRRSpying on your code with RR2009-05-27T22:39:00+10:002009-05-27T22:39:00+10:00<p>
A while back, Melbourne's own Pete Yandell created <a href='http://notahat.com/not_a_mock/'>Not A Mock</a>, an extension to RSpec that supports <a href='http://xunitpatterns.com/Test%20Spy.html'><em>test-spies</em></a>. And a damn fine idea it was, too.
</p>
<p>
I've recently discovered that my current favourite stub/mock framework, Brian Takita's <a href='http://github.com/btakita/rr'>RR</a>, can do test-spies too!
</p>
<h3>
Huh?
</h3>
<p>
What's this "spy" business about? Well, when <em>mocking</em>, <u>before</u> triggering the behaviour you're testing, you set up <em>expectations</em> that a certain methods of collaborating objects will be invoked, with the specified parameters. Like so:
</p>
<pre>
describe TransferEverything do

  before do
    @account1 = Account.new
    @account2 = Account.new
    @transfer = TransferEverything.new(:from => @account1, :to => @account2)
  end

  describe "#execute" do

    it "moves all funds from one account to the other" do
      all_the_money = 1.42
      stub(@account1).balance { all_the_money }
      mock(@account1).withdraw(all_the_money)  # <= set expectations
      mock(@account2).deposit(all_the_money)
      @transfer.execute                        # <= execute
    end                                        # <= verify expectations

  end

end
</pre>
<p>
The expectations are typically verified auto-magically, by the mocking framework, at the end of your test.
</p>
<h3>
The spy alternative
</h3>
<p>
Setting up expectations <em>before</em> a call always feels clumsy. Using a test <em>spy</em> makes tests flow more naturally:
</p>
<ol>
<li>
<strong>Stub</strong> out collaborators, setting up canned responses where required.
</li>
<li>
<strong>Execute</strong> the code you're testing.
</li>
<li>
<strong>Verify</strong> the results, including both:
</li>
<ul>
<li>
the outputs (return values, or resulting state)
</li>
<li>
the interactions (ie. the method-invocations you expected your fake collaborators to receive).
</li>
</ul>
</ol>
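<p>Here's a toy illustration of the spy pattern itself — hand-rolled, not RR's implementation: the fake records every call it receives, and "verify" is just inspection of that recording afterwards.</p>

```ruby
# A hypothetical minimal test-spy: records interactions for later checking.
class Spy
  attr_reader :calls

  def initialize
    @calls = []
  end

  def method_missing(name, *args)
    @calls << [name, args]    # record the interaction; answer with nil
    nil
  end

  def received?(name, *args)
    @calls.include?([name, args])
  end
end

account = Spy.new
account.withdraw(1.42)              # 2. execute (code under test calls it)
account.received?(:withdraw, 1.42)  # 3. verify afterwards  # => true
```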
<p>
Fur egg-sample:
</p>
<pre>
describe TransferEverything do
  # ...
  describe "#execute" do

    it "moves all funds from one account to the other" do
      all_the_money = 1.42
      stub(@account1).balance { all_the_money }
      stub(@account1).withdraw
      stub(@account2).deposit
      @transfer.execute
      @account1.should have_received.withdraw(all_the_money)
      @account2.should have_received.deposit(all_the_money)
    end

  end
end
</pre>
<p>
One thing I find particularly useful about this technique is the ability to execute code in a setup block, then verify the various aspects of its behaviour in separate test-cases.
</p>
<pre>
describe TransferEverything do
  # ...
  describe "#execute" do

    before do
      @all_the_money = 1.42
      stub(@account1).balance { @all_the_money }
      stub(@account1).withdraw
      stub(@account2).deposit
      @transfer.execute
    end

    it "withdraws all funds from source account" do
      @account1.should have_received.withdraw(@all_the_money)
    end

    it "deposits funds in receiving account" do
      @account2.should have_received.deposit(@all_the_money)
    end

  end
end
</pre>
<p>
This results in smaller, more coherent test-cases.
</p>
<h3>
Using RR test-spies in RSpec
</h3>
<p>
If you're using RSpec, you'll need to use the adapter class that comes with RR, rather than the one that comes with RSpec; it's this adapter that provides the <samp>have_received</samp> matcher. In your <samp>spec_helper.rb</samp>, do this:
</p>
<pre>
require 'rr'

Spec::Runner.configure do |config|
  config.mock_with RR::Adapters::Rspec
end
</pre>
<h3>
Spying on Java
</h3>
<p>
Honourable mention: if you're lucky (*cough*) enough to be coding Java, I HIGHLY recommend <a href='http://mockito.org'>Mockito</a>, which also implements test-spies, and is easily the best Java mocking/stubbing library around.
</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Mac/Rsync1TimeMachine0Rsync: 1, Time Machine: 02008-11-11T23:30:00+11:002008-11-11T23:30:00+11:00<p>
I recently bought a Mac Mini to serve various purposes about the house - not least of which, as a
remote backup server for my MacBook Pro.
</p>
<p>
At which point I spent several evenings wrestling with <a href='http://www.apple.com/macosx/features/timemachine.html'>Time
Machine</a>, with limited success. I moved my
existing (500G, external) drive to the Mac Mini, shared it, and nominated it as my backup volume.
But:
</p>
<ul class="sparse">
<li>
Time Machine wouldn't recognise the existing backups on that drive, and insisted on starting again
from scratch (because it creates sparsebundle disk images for remote backup clients, but not for
the local system). Annoying.
</li>
<li>
The initial backup took <strong>forever</strong>, because TM backs up <strong>everything</strong> not specifically excluded.
(Granted, I'm backing up over an 802.11g wireless network.)
</li>
<li>
Incremental backups kicked in every hour, and even when I hadn't been altering files, seemed to
take an excessive amount of time to complete, ie. around 15 minutes. Much of this time was spent
"preparing", and affected the performance of both my laptop, and the network. I don't need or want
hourly backup, but TM provides no way to set a less demanding schedule.
</li>
<li>
Several times things got borked when I interrupted a backup midway, and I had to reboot, remount or
otherwise intervene to get things working again.
</li>
</ul>
<p>
Eventually, I gave up, and went looking for alternatives. After flirting with
<a href='http://www.nongnu.org/rdiff-backup'>rdiff-backup</a> and <a href='http://rsnapshot.org'>rsnapshot</a>, I eventually
did a <a href='http://blog.interlinked.org/tutorials/rsync_time_machine.html'>little</a>
<a href='http://www.mikerubel.org/computers/rsync_snapshots'>research</a> and rolled my own rsync backup script:
</p>
<pre>
#! /bin/sh

set -e

snapshot_host=theLoungeRoomMac.local
snapshot_dir=/Volumes/WD_500/Snapshots/woollyams
snapshot_user=root

ssh_user=$snapshot_user@$snapshot_host

ping -o $snapshot_host > /dev/null || {
    echo "WARNING: can't see $snapshot_host -- skipping backup"
    exit 1
}

ssh $ssh_user "test -d $snapshot_dir" || {
    echo "ERROR: can't see $ssh_user:$snapshot_dir" >&2
    exit 2
}

snapshot_id=`date +%Y%m%d%H%M`

/usr/bin/rsync --archive --verbose \
    --delete --delete-excluded \
    --numeric-ids --extended-attributes \
    --one-file-system \
    --partial \
    --link-dest ../current/ \
    --relative \
    --max-size=50M \
    --exclude ".git" \
    --exclude ".svn" \
    /private/etc /Users/mdub \
    $ssh_user:$snapshot_dir/in-progress/

ssh $ssh_user "cd $snapshot_dir; rm -fr $snapshot_id; mv in-progress $snapshot_id; rm -f current; ln -s $snapshot_id $snapshot_dir/current"
</pre>
<p>
Advantages over Time Machine are:
</p>
<ul class="sparse">
<li>
I can run this as often or as infrequently as I like.
</li>
<ul>
<li>
I'm currently running it out of /etc/daily.local, which is run by
<a href='http://developer.apple.com/documentation/Darwin/Reference/ManPages/man8/periodic.8.html'>periodic</a>, which is run by
<a href='http://developer.apple.com/documentation/Darwin/Reference/ManPages/man8/launchd.8.html'>launchd</a>.
</li>
<li>
It doesn't get in my way by running while I'm actively using my machine.
</li>
</ul>
<li>
I can use the full power of rsync filter rules to exclude uninteresting files (e.g. "<code>--exclude .git
--exclude .svn</code>").
</li>
<li>
I can even filter by file size ("<code>--max-size=50M</code>") to skip things like big downloads and VMware
images, without having to explicitly nominate them.
</li>
<li>
It takes less than 3 minutes to perform an incremental backup (providing I haven't changed too much).
</li>
<li>
I can safely interrupt the backup process, or pull the plug, or whatever, and it's robust enough to carry
on where it left off next time.
</li>
<li>
I can keep as many time-stamped snapshots as I wish.
</li>
<li>
It's relatively efficient space-wise, due to the use of hard-links to share unchanged files between
snapshots (not as efficient as Time Machine, though, which hard-links entire directories).
</li>
<li>
Each snapshot is a simple, easy-to-browse, easy-to-search directory, containing plain old files and
directories. It gives me comfort that I wouldn't need a spiffy GUI to locate a file I was looking to
restore.
</li>
</ul>
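<p>That space-sharing between snapshots is plain hard-linking — the effect of rsync's <code>--link-dest</code> option. This little demonstration (file names invented) shows two "snapshot" directories sharing a single copy of an unchanged file:</p>

```shell
# Two snapshot directories; the second "snapshot" hard-links the
# unchanged file rather than copying it, so the data is stored once.
mkdir -p snap1 snap2
echo "unchanged content" > snap1/file.txt
ln snap1/file.txt snap2/file.txt     # hard link, not a copy

# Both names refer to the same inode:
[ snap1/file.txt -ef snap2/file.txt ] && echo "same inode"
```

<p>Deleting one snapshot never breaks the other; the file's data survives as long as any snapshot still links to it.</p>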
http://www.dogbiscuit.org/mdub/weblog/Tech/Agile/ThePointOfContinuousIntegration"Continuous Integration" might not mean what you think it means2008-07-08T23:25:00+10:002008-07-08T23:25:00+10:00<p>
<a href='http://martinfowler.com/articles/continuousIntegration.html'>Continuous
Integration</a> is a
common practice in <a href='http://agilemanifesto.org/'>Agile</a> development circles, but I
think people (especially those new to agile thinking) sometimes miss the point.
</p>
<p>
Problem is, the term has become synonymous with build-servers such as
<a href='http://cruisecontrol.sourceforge.net/'>CruiseControl</a>
(<a href='http://tinderbox.mozilla.org/'>etc</a>, <a href='https://hudson.dev.java.net/'>etc</a>), which
frequently grab the latest code, build it, and execute automated tests. These are
often referred to as "continuous-integration servers", which IMHO is a <u>really
bad name</u>, 'cos if there's one thing these servers typically <em>don't</em> do,
it's <u>integrate</u>.
</p>
<p>
And the point of continuous-integration is just that: <strong>Integrating</strong>. <strong>Continuously</strong>! Which means:
</p>
<ul class="sparse">
<li>
developers frequently updating their working-areas (or personal branches) with
the latest code on the mainline branch (typically many times a day), and
</li>
<li>
frequently merging their own changes back into the mainline (typically several
times a day).
</li>
</ul>
<p>
Unless you're doing this, you ain't "doing continuous integration", however frequently you're running automated builds!
</p>
<p>
Integrating continuously can be difficult. In particular, it forces you to chunk
larger changes and features into small, bite-sized pieces that can be drip-fed
into the codebase. And, you have to deal with other developers changing stuff all
the time. Build-servers and automated tests are essential tools here, because they
help keep the team honest, ensuring that everyone has a stable (if evolving) base
to work on.
</p>
<p>
There are plenty of upsides to frequent integration:
</p>
<ul>
<li>
each individual integration is smaller, and therefore easier
</li>
<li>
design issues (including differences of opinion) are identified earlier
</li>
<li>
developers can leverage each other's work earlier
</li>
<li>
changes can be tested (and bugs detected) earlier
</li>
<li>
software can be deployed more frequently
</li>
</ul>
<p>
In summary: check it in already!
</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Programming/AttackingSlowRunningBuildsAtCitconAttacking slow-running builds (notes from CITCON)2008-07-01T22:20:00+10:002008-07-01T22:20:00+10:00<p>
Last weekend I went along to <a href='http://www.citconf.com/'>CITCON</a> here in Melbourne. Which was great fun, by the way.
</p>
<p>
There I ran a session on "Attacking slow-running CI builds". It was a small group, but an interesting discussion, I think. Here are my (rough, unedited) notes:
</p>
<h3>
WHAT is the impact of a slow build?
</h3>
<ul>
<li>
fewer checkins
</li>
<li>
more waiting
</li>
<li>
context switching
</li>
<li>
discourages integration
</li>
<li>
discourages writing of additional tests
</li>
<li>
more chance of overlapping checkins
</li>
<li>
more build breakages
</li>
<li>
more time required to get the build fixed
</li>
<li>
reduced productivity
</li>
<li>
WASTE!
</li>
</ul>
<h3>
WHY is the build slow?
</h3>
<ul>
<li>
slow tests (particularly acceptance tests)
</li>
<ul>
<li>
over-testing (testing the same code-paths repeatedly)
</li>
<li>
expensive set-up and tear-down
</li>
<li>
too much testing via the user-interface
</li>
<li>
tests that pause, sleep, or poll (e.g. to deal with AJAX)
</li>
</ul>
<li>
too much I/O!
</li>
<li>
use of slow infrastructure components (database servers, application servers, etc.)
</li>
<li>
slow hardware
</li>
</ul>
<h3>
HOW can we make it faster?
</h3>
<ul>
<li>
faster hardware
</li>
<li>
run tests in parallel
</li>
<li>
distribute tests
</li>
<li>
fail fast
</li>
<ul>
<li>
selective testing: run tests most likely to fail first
</li>
<ul>
<li>
could use dependency-analysis to identify which tests were affected by recent commits
</li>
</ul>
</ul>
<li>
refactor story-based acceptance tests into scenario-based tests
</li>
<ul>
<li>
bigger tests, with more assertions, offsets set-up/tear-down costs
</li>
<ul>
<li>
but makes tests more complex
</li>
</ul>
</ul>
<li>
share test fixtures between a group of tests
</li>
<ul>
<li>
but breaks test isolation
</li>
</ul>
<li>
avoid I/O
</li>
<ul>
<li>
in-memory database
</li>
<li>
in-memory file-store (RAM disk?)
</li>
<li>
stub out infrastructure components
</li>
<ul>
<li>
avoid testing these components by side-effect
</li>
</ul>
</ul>
<li>
populate the database directly, rather than using the user-interface to set-up for a test
</li>
<li>
separate your system into components that can be tested independently
</li>
</ul>
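<p>As a concrete illustration of the "run tests in parallel" suggestion above, here's a minimal thread-based sketch in Ruby. The test names and bodies are invented; a real suite would partition test files across processes or machines rather than threads:</p>

```ruby
# Run each (toy) test concurrently, collecting pass/fail results.
tests = {
  "adds"       => -> { 1 + 1 == 2 },
  "multiplies" => -> { 2 * 3 == 6 },
  "fails"      => -> { 1 == 2 },
}

results = {}
mutex = Mutex.new
threads = tests.map do |name, test|
  Thread.new do
    passed = test.call                          # run the test
    mutex.synchronize { results[name] = passed } # record thread-safely
  end
end
threads.each(&:join)

failures = results.reject { |_, passed| passed }.keys
# failures == ["fails"]
```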
<h2>
Thinking about this later ...
</h2>
<h3>
There are two types ...
</h3>
<p>
The suggestions for improving build times seemed to fall into two categories:
<ol>
<li>
optimise the build/tests
</li>
<li>
throw additional hardware at the problem
</li>
</ol>
</p>
<p>
My problem with the "throw hardware at it" approach is that it typically <u>only helps for the build-server</u> machine; the poor old developers are still left with a slow-running build, and therefore many of the productivity issues still exist.
</p>
<h3>
Another idea
</h3>
<p>
It occurs to me now that we missed a fairly fundamental trick to improve test times: <u>improve the performance of the system-under-test itself</u>. It's a great excuse to start thinking about performance earlier in the project.
</p>
<h3>
"Customer Acceptance Test" does not need to mean end-to-end
</h3>
<p>
On all the projects I've been on in recent years, we've ended up with the majority of the tests being either "developer unit tests", which run super-fast, or "customer acceptance tests" which test end-to-end (browser-to-database) and run super-slow.
</p>
<p>
Methinks it should be less black-and-white. If we can demonstrate functionality that the customer cares about by calling the underlying logic directly (i.e. at unit-test level), rather than by exercising the user-interface, then what's wrong with that? (We just need one test to prove that the underlying logic has been properly integrated into the UI.)
</p>
http://www.dogbiscuit.org/mdub/weblog/Tech/Railsconf2008HighlightsRailsconf 2008 Highlights2008-06-05T22:00:00+10:002008-06-05T22:00:00+10:00<p>
I was lucky enough to be at Railsconf 2008 in Portland last weekend (along with
<a href='http://www.martyandrews.net/blog/'>Marty</a>, Rob, <a href='http://www.prozacblues.com/travo/blog/'>Trav</a> and
<a href='http://blog.hiremaga.com/'>Abhi</a>).
</p>
<h3>
Highlights
</h3>
<ul class="sparse">
<li>
Meeting other Ruby/Rails enthusiasts from all over. (Well, all over the US, at least).
</li>
<li>
<strong>Joel Spolsky</strong>'s opening keynote was hilarious (in a good way). Some other commentators found it
low on content, but I thought it had a strong message: usability matters!
</li>
<li>
Seeing <strong>Kent Beck</strong> present was fantastic. He had the audience hanging on his every word, as he
described how "anything he'd done had taken 20 years to have an impact".
</li>
<li>
Ezra's talk on <a href='http://brainspl.at/articles/2008/06/02/introducing-vertebra'><strong>Vertebra</strong></a>, his
XMPP-based "cloud control" project, was fascinating. What a great abuse of technology!
</li>
<li>
The <strong>JRuby</strong> and <strong>Rubinius</strong> teams are co-operating closely, in a spirit of friendly, respectful
rivalry. Particularly notable is their effort to collaborate (with each other, and Matz) on a
rigorous set of executable specs for the Ruby language.
</li>
<li>
The upcoming version of <a href='http://modrails.com'><strong>Phusion Passenger</strong></a> will support not only Rails
applications, but also <a href='http://rack.rubyforge.org/'>Rack</a> (and therefore Merb, Sinatra, Camping),
and (get this) <a href='http://en.wikipedia.org/wiki/Web_Server_Gateway_Interface'>WSGI</a> (and therefore a
bunch of Python frameworks, including <a href='http://www.djangoproject.com/'>Django</a>)!
</li>
<li>
There are <strong>increasingly varied options for deploying Rails apps</strong>, including the traditional
<code>{Apache,nginx}+{mongrel,thin}</code>, JRuby WARs in a servlet container, Passenger, and
the Amazon-EC2-based services like RightScale and Heroku. <a href='http://heroku.com'><strong>Heroku</strong>'s</a>
deployment model is pretty damn clever: just "<code>git push</code>".
</li>
</ul>
<h3>
Regrets
</h3>
<p>
With 4 streams going on, the talks I got to were naturally out-numbered by those I missed. Some of the ones I really wish I'd seen include:
</p>
<ul>
<li>
MagLev: Gemstone's Ruby implementation-in-progress, based on their Smalltalk VM
</li>
<li>
Scott Chacon on "Using Git" (apparently he went into mind-bending detail of the Git internals)
</li>
<li>
Justin Gehtland's "Small Things, Loosely Joined, and Written Fast"
</li>
</ul>
http://www.dogbiscuit.org/mdub/weblog/Tech/Mac/GitOnTheMacGit (on the Mac)2008-04-18T13:30:00+10:002008-04-18T13:30:00+10:00<p>
<a href='http://git.or.cz/'>Git</a> is the hype. I'm just starting to use it for a couple of projects, both directly, and as a local facade to Subversion.
</p>
<p>
Here are some suggestions on using git under Mac OS X.
</p>
<h3>
Installation
</h3>
<p>
Installation using MacPorts is pretty painless. Ensure you choose the "svn" variant if you want Git/Subversion integration.
</p>
<pre>
sudo port install git +svn +doc
</pre>
<p>
Another option is the native installer, available at <a href='http://code.google.com/p/git-osx-installer/'>http://code.google.com/p/git-osx-installer/</a>
</p>
<h3>
Textmate
</h3>
<p>
If you use Textmate, the <a href='http://gitorious.org/projects/git-tmbundle/'>Git Textmate bundle</a> is <strong>rather nice</strong>.
</p>
<pre>
cd ~/Library/Application\ Support/TextMate/Bundles
git clone git://gitorious.org/git-tmbundle/mainline.git Git.tmbundle
</pre>
<p>
Remember to set the TM_GIT variable (to "/opt/local/bin/git" or "/usr/local/bin/git", as the case may be), otherwise stuff won't work.
</p>
<h3>
Shell completion
</h3>
<p>
For command-line (bash) users, there's TAB-completion available, which is pretty handy. I'm using it directly from my local clone of the git
source tree, like this:
</p>
<pre>
# in .bashrc ...
git_completion_script=$HOME/OpenSource/kernel.org/git/contrib/completion/git-completion.bash
if test -f $git_completion_script; then
    source $git_completion_script
fi
</pre>
<h3>
GitNub for history browsing
</h3>
<p>
<a href='http://github.com/Caged/gitnub/wikis/home'>GitNub</a> is a sweet little UI for browsing history of git commits.
</p>
<h3>
Using Git
</h3>
<p>
So far, I haven't talked at all about how you actually USE the thing, and don't intend to, since there are already so many great resources out
there on the subject. Some I've found useful are:
</p>
<ul class="sparse">
<li>
<a href='http://www.kernel.org/pub/software/scm/git/docs/tutorial.html'>A tutorial introduction to git</a>
</li>
<li>
<a href='http://git.or.cz/course/svn.html'>Git for SVN users</a>
</li>
<li>
Git cheat-sheets from <a href='http://zrusin.blogspot.com/2007/09/git-cheat-sheet.html'>Zack Rusin</a> and <a href='http://cheat.errtheblog.com/s/git/'>Err the blog</a>
</li>
<li>
Andy Delcambre's <a href='http://andy.delcambre.com/2008/3/4/git-svn-workflow'>Git SVN Workflow</a>
</li>
</ul>
<h3>
ReadOnlyFormBuilder (2008-03-08)
</h3>
<p>
For RubyOnRails developers, <a href='http://api.rubyonrails.org/classes/ActionView/Helpers/FormHelper.html'><samp>form_for</samp> and <samp>fields_for</samp></a>
are the accepted way of DRYing up form templates. You know the deal; you code
</p>
<pre>
<% form_for :customer, :url => customers_path() do |customer_form| %>
<p>
<label>Name:</label>
<%= customer_form.text_field :first_name, :size => 15 %>
<%= customer_form.text_field :last_name, :size => 20 %>
</p>
... etc ...
<% end %>
</pre>
<p>
and you get
</p>
<pre>
<form action="/customers" method="post">
<p>
<label>Name:</label>
<input id="customer_first_name" name="customer[first_name]" size="15" type="text" />
<input id="customer_last_name" name="customer[last_name]" size="20" type="text" value="" />
</p>
... etc ...
</form>
</pre>
<p>
Rails generates sensible field names and ids for you, and slurps existing values out of the model object. So far, so good.
</p>
<p>
Lately, I've taken to using the same trick when presenting data, not just when editing it. So, whereas before I might have written:
</p>
<pre>
<p>
<label>Name:</label>
<span id="customer_first_name"><%= h @customer.first_name %></span>
<span id="customer_last_name"><%= h @customer.last_name %></span>
</p>
... etc ...
</pre>
<p>
I'll now code it up as:
</p>
<pre>
<% fields_for :customer, :builder => ReadOnlyFormBuilder do |customer_form| %>
<p>
<label>Name:</label>
<%= customer_form.text_field :first_name, :size => 15 %>
<%= customer_form.text_field :last_name, :size => 20 %>
</p>
... etc ...
<% end %>
</pre>
<p>
and get the same output. (In case you're wondering, the ids are there to help with automated testing).
</p>
<p>
Note the similarity between the last code snippet and the first one on this page; apart from the first line they're
identical. Usually, I'll put the field-declarations themselves in a partial that's shared between "new", "edit" and "show"
actions. That way, your "show" page automatically gets the same layout as the others, just with raw values in place of editable fields.
</p>
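<p>
As a sketch of that shared-partial pattern (the file names and local-variable plumbing here are my own, purely for illustration):
</p>
<pre>
<!-- app/views/customers/_fields.html.erb (hypothetical name) -->
<p>
  <label>Name:</label>
  <%= customer_form.text_field :first_name, :size => 15 %>
  <%= customer_form.text_field :last_name, :size => 20 %>
</p>

<!-- show.html.erb: same partial, but with the read-only builder -->
<% fields_for :customer, :builder => ReadOnlyFormBuilder do |customer_form| %>
  <%= render :partial => "fields", :locals => { :customer_form => customer_form } %>
<% end %>
</pre>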
<p>
The ReadOnlyFormBuilder class itself is fairly straightforward - I'm planning to wrap it up into a plugin sometime soon. In the meantime, the implementation of text_field looks something like this:
</p>
<pre>
def text_field(attribute, options={})
  content_tag("span", html_escape(value_of(attribute)), :id => "#{@object_name}_#{attribute}")
end

def value_of(attribute)
  model.send(attribute)
end

def model
  @object || @template.instance_variable_get("@#{@object_name}")
end
</pre>
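<p>
Outside of Rails, the core of what <samp>text_field</samp> does is just string assembly plus HTML-escaping. Here's a minimal stand-alone sketch (the method name and arguments are mine, for illustration only), using <samp>ERB::Util.html_escape</samp> from the standard library:
</p>

```ruby
require "erb"

# Render a value as a read-only <span> with a predictable id,
# HTML-escaping the content on the way through.
def read_only_field(object_name, attribute, value)
  escaped = ERB::Util.html_escape(value.to_s)
  %(<span id="#{object_name}_#{attribute}">#{escaped}</span>)
end

read_only_field(:customer, :first_name, "Mike & Co.")
# => <span id="customer_first_name">Mike &amp; Co.</span>
```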