A/B Testing: A Love Story

In the fast-paced world of digital campaigns, if you’re not constantly innovating and testing, you’re headed for obsolescence. More importantly, you’re letting your users down, especially those in short-term competitive environments (aka elections). That’s why at ActBlue we’re always developing our platform with metrics-driven decision making, aka testing.

The result is that today’s ActBlue isn’t the same as the ActBlue of a month ago, and that’s a great thing. Sometimes our tests fail. Others result in a barely statistically significant bump in conversion rates. But that’s ok because all of those little bumps add up. Occasionally we hit on a big winner that dramatically increases conversion rates. We do it in a methodical, constant way that allows us to identify improvements big and small.

One advantage we have is the sheer volume of contributions we process, which allows us to A/B test small tweaks to the form and get statistically sound results. If a single organization tried running the same tests on its own, it would never reach the sample sizes needed to identify as many improvements.

We’ve got thousands of campaigns and organizations counting on us to have the best system possible, so they can focus on winning. It drives our work and testing every single day.

Our tech team makes changes to the platform daily. Some are minor tweaks, others major changes. They’ve built a rock-solid platform where we can easily roll out a significant feature or layout change, even in the middle of the crazy busy end-of-quarter period. That’s no easy feat, but it’s a deliberate design choice that keeps us as nimble as the party needs.

Today we thought we’d pull back the curtain just a little bit and break down some of our favorite A/B tests from the past few months.

Test 1: Employer Address Checkbox

We know from our data that a lot of donors mark themselves as retired or unemployed on our forms, and we wanted to see if we could use that knowledge to increase conversions. Turns out: yes! We A/B tested our normal form against one with a checkbox donors can click if they’re not employed. The checkbox fills in that information automatically, which satisfies the legal reporting requirement and bumps up conversion rates.

Original:

Checkbox:

We saw a 4.7% improvement in conversions (p < 0.05, for those of you keeping score), so we switched over to the new checkbox version. Bonus points for cutting waaaaay down on customer service questions about the occupation/employer boxes.
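For anyone curious what sits behind a line like “p < 0.05”: a result like this can be checked with a simple two-proportion z-test. Below is a minimal sketch in Python with entirely made-up counts and a hypothetical function name, not ActBlue’s actual analysis code; the numbers were chosen only so the lift lands in the same ballpark as the result above.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal survival function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b / p_a - 1, p_value

# Made-up example counts: control form vs. checkbox variant.
lift, p = two_proportion_z_test(conv_a=4_000, n_a=100_000, conv_b=4_188, n_b=100_000)
print(f"observed lift: {lift:.1%}, p-value: {p:.3f}")
```

With these invented counts the lift works out to about 4.7% with a p-value just under 0.05, which is the kind of arithmetic a test like this produces.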

Test 2: Shrinking the Contribution Form

Speed is essential in online contributions, so we’re always looking for ways to make the Contribution Form shorter and faster to load, and the rapid rise in mobile donations has made that more important than ever. We ran a number of tests aimed at shrinking the contribution form, including the following:

- Removed the credit card tooltip (which popped up when you clicked the credit card box) so the form would load better on mobile
- Removed the “Employment” section header
- Laid out the employer fields horizontally rather than stacking them vertically

All of these tests ended without statistically significant results, but that was a win for us: removing those elements didn’t hurt conversion rates, so we could make our forms less cluttered. If a feature isn’t adding value, it’s time for it to go. And bye bye those three things went, on every single form in our system.

You can see the evolution of the Employment section below.

Version 1 (original):

Version 2 (horizontal):

Version 3 (no header, checkbox added):

Test 3: Multi-step Contribution Forms

We already wrote a whole blog post about this test, but it’s worth mentioning again here. This was one of those big wins: a 25.86% increase in conversion rates at 99% significance, after just a few days of running the test. We had tested multi-step Contribution Forms a few years back, and they lost to our standard one-page forms, which just goes to show how important it is to test and test again.

One page form (losing version):

Multi-step form (winning version):

We do one thing at ActBlue, and we’re the best in the business at it. The biggest reason is that we’re constantly upgrading our platform. We push changes out to everyone ASAP so that thousands of campaigns and groups, big and small, get the best version right away.

In a few months, when we get down to the crunch of election time, know that we’ve got your back and that you’ll always be using the most optimized and tested form out there.

Comments
    • Ross Martin said:

      Thanks for the question, Ryan! While I did not read his entire white paper, Goodson appears to take issue not with A/B testing per se, but rather with poorly conducted A/B tests. Specifically, he mentions three common points of possible error: statistical power, what he calls ‘multiple testing’, and ‘regression to the mean’. These are indeed all important concepts for implementing statistically sound tests. We at ActBlue take scientific rigor quite seriously, and our internal A/B testing procedures (performing power analyses, avoiding the ‘early peeking’ problem, etc.) certainly do avoid the pitfalls Goodson outlines in his paper. We hope others will trend toward better statistical methodology, and are proud to be setting a good example.

      • Ryan Aslett said:

        Thanks, that’s the answer I was hoping for. A lot of folks wave the A/B testing flag around without necessarily doing it right. A/B testing, done properly, can reveal whether A or B was better; it can’t help you select which A and which B to test against each other in the first place. (Though in this case, single-page vs. multi-page transaction forms is definitely the right A and B.) A/B testing is such a great tool for quantitatively improving everything we’re doing on the web, but I worry that it’s just becoming the latest marketing buzzword for establishing brand credibility. Thanks for putting my worries to rest in this case.
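To make the “power analysis” Ross mentions above a bit more concrete: before launching a test, you can estimate roughly how many donors each variant needs to see before a lift of a given size becomes detectable. Here is a minimal sketch, assuming a purely illustrative 4% baseline conversion rate and a 5% relative lift; the numbers and function name are this editor’s assumptions, not ActBlue’s.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p_base, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in each arm to detect a relative lift in
    conversion rate with a two-sided test (normal approximation)."""
    p_alt = p_base * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    n = (z_alpha + z_power) ** 2 * variance / (p_alt - p_base) ** 2
    return math.ceil(n)

# Illustrative numbers only: 4% baseline conversion, detecting a 5% relative lift.
print(sample_size_per_variant(p_base=0.04, relative_lift=0.05))
```

Under those assumptions the answer comes out around 150,000 visitors per arm, which is one way to see why the post puts so much weight on sheer contribution volume.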

    • Just to add a little to the discussion: to investigate whether multiple testing is a problem, we need to consider the statistical significance threshold ActBlue is using and compare it to the number of different tests they are running. For the multi-page test they established a 99% significance level. Another way to state that is that 1 out of 100 times they would get a false positive. So, roughly speaking, if they had run 100 variants of the multi-page test, you would expect 1 false positive.

      To be really concrete, you could imagine a scenario where you make 100 different multi-page versions of the page, test each of them, all of them fail to show a statistically significant improvement except for one, and you then pick that one as your new design, declare victory, and go home.

      I would guess that, given the significant rework involved in changing from single-page to multi-page, they did not create 100 different multi-page versions and test each one individually :)

      Probably one of the best descriptions of multiple testing problems is in this XKCD comic:

      http://xkcd.com/882/
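A quick way to see the multiple-testing effect described in the comment above is to simulate many A/B tests in which the variant is truly no better than the control and count how often a 99%-level test fires anyway. This is a rough, purely illustrative sketch; every number in it is made up.

```python
import random
from statistics import NormalDist

random.seed(42)

ALPHA = 0.01         # the 99% significance threshold discussed above
N_TESTS = 100        # imagine 100 "do-nothing" variants tested independently
N_VISITORS = 20_000  # visitors per arm in each hypothetical test
BASE_RATE = 0.04     # true conversion rate for both arms (no real difference)

z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
false_positives = 0

for _ in range(N_TESTS):
    # Simulate conversions for a control and a variant that are actually identical.
    conv_a = sum(random.random() < BASE_RATE for _ in range(N_VISITORS))
    conv_b = sum(random.random() < BASE_RATE for _ in range(N_VISITORS))
    p_a, p_b = conv_a / N_VISITORS, conv_b / N_VISITORS
    pooled = (conv_a + conv_b) / (2 * N_VISITORS)
    se = (pooled * (1 - pooled) * (2 / N_VISITORS)) ** 0.5
    false_positives += abs(p_b - p_a) / se > z_crit

print(f"{false_positives} of {N_TESTS} null tests looked 'significant' at the 99% level")
```

Typically a run like this turns up a false positive or two, which is exactly the “1 in 100” intuition from the comment: fine for a single carefully chosen test, dangerous if you quietly run a hundred of them.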
