In this post, part of our testing blog series, I’ll talk a bit about some things you might consider testing, and—probably even more importantly—some things you might not want to test. This is all the more relevant if you’re managing a smaller list (say, fewer than 100k active members). As we’ll discuss in future posts, it takes huge sample sizes to reliably detect relatively small differences between two testing segments, so you’ll want to reserve your testing for factors that are likely to have larger effects on your goals, such as subject lines.
But to begin, we should be on the same page regarding why we test. It’s simple: we tend to be pretty bad at guessing what will happen, so it’s often better to let data inform our decision making. For instance, when sending an email, should you go with a negative subject line like “This Republican is the worst!”, or a positive one like “Sally Jane is a great Democrat!”?
This trivial example allows us to demonstrate an important testing concept. Testing is only a tool; it’s not the final judge, nor does it say anything about the appropriateness of your content. If “This Republican is the worst!” doesn’t align with your campaign or organization’s messaging and mission, then you shouldn’t test that subject line, let alone use it for an email to your entire list.
So, then, assuming the subject matter is in line with your messaging and mission, what’s something you should test, even with a small list? Subject lines could be one, but there are other things that could have a big impact on your action rates. What comes to mind first and foremost is email content.
By this I mean writing two completely different emails, whether they’re about the same thing or completely different concepts. The varied factor could be anything from your topic and theory of change to your tone and word choice. Even ostensibly similar emails—let alone drastically different ones—can yield very large differences in results. We at ActBlue, for example, regularly test at least three different fundraising emails for every one that we end up sending to our full list.
For one of our most recent email blasts, we sent four different email drafts, a couple of which were quite similar. The results? The best-performing draft brought in more than three times as many donations as the worst-performing one! So, here’s a clear case in which performing a simple test can lead to much higher action rates, whether you’re looking for signatures on a petition or donations to your cause.
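If you want to check whether a gap like that reflects a real difference rather than chance, a quick significance check works. Here’s a minimal sketch using a chi-squared test on a per-draft contingency table; the send and donation counts below are hypothetical, purely for illustration, and it assumes you have scipy available.

```python
# Minimal sketch: did the email drafts perform differently, or is the gap just noise?
# The send and donation counts below are hypothetical, purely for illustration.
from scipy.stats import chi2_contingency

sends = {"draft_a": 5000, "draft_b": 5000, "draft_c": 5000, "draft_d": 5000}
donations = {"draft_a": 60, "draft_b": 55, "draft_c": 25, "draft_d": 18}

# Build a contingency table: one row per draft, columns = [donated, did not donate]
table = [[donations[d], sends[d] - donations[d]] for d in sends]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.4f}")
# A small p-value (conventionally < 0.05) suggests the drafts really do convert
# at different rates; a large one means you can't rule out chance.
```

The same approach works with two drafts or ten; the chi-squared test only tells you that conversion rates differ somewhere among the groups, so a pairwise follow-up comparison tells you which draft actually won.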
It might seem that writing three or more email drafts for every send is a bit much for a resource-constrained organizer. If that’s the case, you should still be message testing periodically, say, once a month or so. The goal here is to ascertain the biggest button-pushers for your list members. A standard example is testing an email that highlights your opponent’s negative characteristics against one that highlights a local community leader’s endorsement of your candidate. This is a less resource-intensive way to gauge the temperature of your list and see what resonates with your members.
So if the content of an email is something that is definitely worth testing, what are some things that small campaigns shouldn’t test? Well, anything that you expect won’t result in a large percentage difference between your test segments. For example, you certainly could test four differently colored donate buttons, but you shouldn’t.
Chances are, you won’t see a significant advantage for any one of them over the others. How do I know this? Well, I can’t claim 100% certainty (nor can any honest analyst), but whenever we at ActBlue or some of our larger partners have tested something this small, we haven’t seen a meaningful difference.
For example, we wanted to run an A/B test1 on our contribution form to find out whether we could increase the conversion rate by removing the “Employment Information” header above two of the FEC-required fields. To see what that looked like (and for some more A/B test examples), check out this blog post. We knew it would take close to 150,000 page views to reliably detect the small percentage difference between the two test segments that we’d need to see before making a permanent change to our contribution form. I’ll talk more about determining required sample size in a later post, but for now, the point is that it took a lot to get a little.2 If you manage a smaller list, that means sending dozens of emails for a relatively minor gain, and that’s not worth your time.
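To get a feel for why small differences demand large samples, here’s a back-of-the-envelope sketch using the standard two-proportion sample size formula. The baseline conversion rate and hoped-for lift below are made up for illustration; they aren’t our actual form numbers.

```python
# Back-of-the-envelope sample size for detecting a small lift in conversion rate.
# Baseline rate and expected lift are hypothetical, chosen only to illustrate.
from scipy.stats import norm

p1 = 0.020        # baseline conversion rate (hypothetical)
p2 = 0.022        # hoped-for rate after the change (a 10% relative lift)
alpha = 0.05      # two-sided significance level
power = 0.80      # chance of detecting the lift if it's real

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Standard two-proportion sample size formula (people needed per test segment)
n_per_group = ((z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2) ** 2
print(f"~{round(n_per_group):,} per segment, ~{round(2 * n_per_group):,} total")
```

With these made-up numbers the requirement lands in the low six figures of page views, the same order of magnitude as the figure above; your own requirement depends entirely on your baseline rate and the smallest lift you’d act on.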
Of course, context matters a lot, and in this case, context is your email program and your members. So, the final word is that if you really, really want to know, you should indeed test something for yourself instead of taking someone else’s word for it. But you’re much better off focusing on testing more meaningful factors (like your messaging) that are likely to result in clear and large differences. For the small things, you can learn from the organizations that have the resources to test small nuances. If you subscribe to numerous email lists, you’ll get a good gauge of what community best practices are at a given time.
Testing one email draft against another tells you exactly one thing: which (if either) is better. It doesn’t, however, tell you some things that can be quite valuable: Do members of your list tend to prefer positive emails or scare-to-action emails? Do they tend to respond well to fun, edgy language or slightly more formal language?
One A/B test won’t provide much of an answer to questions like that, but repeatedly testing two different email styles—like short, punchy emails against longer, more descriptive emails—over time can help you understand the style of communication your list members prefer, and therefore help you write emails with better action rates.
As you develop your testing program, examining other questions, like how much money to ask for in a fundraising email or how best to segment your list, becomes more important and makes more sense from a cost-benefit perspective, too.
But to start, remember: make sure what you’re testing fits in with your organization’s messaging, plan a test that has a plausible chance of realizing big gains, and, more than anything else, work on honing your messaging. You’ll need to start out with bigger questions—and, therefore, more general tests—about your list members and eventually narrow down to the specifics.
The next post in our series about testing will talk about some essential factors involved in setting up a test, like setting up your groups and determining your required sample size. Expect that one to be published next week, after Netroots Nation.
Footnotes:
1 “A/B test” is an informal term for statistically testing two variations of a single factor against each other in order to determine which, if either, is better for your desired outcome.
2 We have millions of people land on our contribution forms each month, so for us, there’s a huge payoff to testing minor details that result in small percentage-point gains. It’s thousands of tests like this one over the years that make our contribution forms so successful. But this is our context—running a testing program with a small list is a totally different game.