There’s a great article on A/B testing in Wired today; if you haven’t yet read it, you might read it now and then come back. I feel like somehow, I keep finding myself in a contrarian position related to Things That Are Going To Change Business, and I don’t do it on purpose, honest. But I’m skeptical of A/B testing, just as I’m skeptical of most experiment-driven behavioral economics research, just as I’m skeptical of the use of surveys to prove anything. And in all three cases, the reasons are the same: behavior is complicated, the method is overly reductive, and the approach ignores the magic and the soul.
Behavior is complicated.
I consume every book on behavioral economics and decision making that I can get my hands on, and while I can’t claim to understand all of what I read, I can make a few generalizations.
First, we have two main systems of decision making: one that's evolutionarily old and impulsive, driven by an urge to stay alive, and one that's reflective and considered. They both operate, all of the time, and they often contradict each other. That means that, depending on the broader circumstances of use, the same person will respond differently to the same stimulus, and so attempts to infer causality from A/B testing need to correct for things like the ambient environment in which the user is using the system.
Additionally, discrete behavioral rules are compounded by the world around you. For example, there's something called the mere exposure effect, where, as Daniel Kahneman explains, "repetition induces cognitive ease and a comforting feeling of familiarity." Seeing a word, face, shape, or other design pattern over and over increases the likelihood that a person will view that word, face, shape, or design pattern as "good." You have control over your own web property, but none over the rest of the internet, and that's where this exposure is going to happen. In other words, it's likely that visual precedent set by other sites will change the way a user feels about your site. That's just one of hundreds of discrete psychological effects that exist: discrete in how they were tested and observed, but when played out in real life, there's nothing discrete about them.
Moreover, the way people act on the internet is highly irrational, and anyone who has ever observed a usability test realizes that many people seem to be in a state of chaos when using technology, clicking, quite literally, everywhere. A/B testing almost implicitly assumes a rational agent, one who takes actions based on a logical assessment of what they see in front of them. My experience tells me that simply isn't a good assumption, and so the results of your test are likely to be inconclusive (even when the data tells you otherwise).
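That parenthetical deserves unpacking: when behavior is essentially noise, a naive comparison of two variants will still periodically crown a "winner." A minimal simulation sketches this, with every number (traffic, click rate, trial count) being an assumption chosen purely for illustration; both arms here have identical behavior, yet roughly one experiment in twenty clears the conventional 95% significance bar by chance alone.

```python
import math
import random

def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-statistic using a pooled variance estimate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return abs(p_a - p_b) / se if se > 0 else 0.0

random.seed(42)
N, TRIALS, TRUE_RATE = 1000, 2000, 0.10  # hypothetical: both arms behave identically
false_wins = 0
for _ in range(TRIALS):
    a = sum(random.random() < TRUE_RATE for _ in range(N))  # clicks in arm A
    b = sum(random.random() < TRUE_RATE for _ in range(N))  # clicks in arm B
    if z_stat(a, N, b, N) > 1.96:  # naive "significant at 95%" declaration
        false_wins += 1

rate = false_wins / TRIALS
print(f"spurious 'winners' across {TRIALS} null experiments: {rate:.1%}")
```

The point is not that the statistics are wrong, but that a team eyeballing a dashboard, or peeking at results repeatedly, will see "conclusive" differences that are pure noise far more often than they expect.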
The method is overly reductive, and we never learn why.
A scientific approach attempts to isolate one thing in order to establish causality. That's the basis of A/B testing. The problem is, one thing isn't being "isolated": the human using the system. Statistical models can start to make predictive assumptions about the likelihood that the human using the system fits into various profile types, but it's going to take someone a lot smarter than your average bear to produce these models. A well-respected startup in Austin, Vast, employs David Franke, a brilliant mathematician, as Chief Scientist. A big company like Google has hundreds of people to do this work. But I've found it rare that the small companies most likely to engage in A/B testing think about this at all, much less employ someone with a background in statistics who is qualified to model it correctly.
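Even the most basic piece of that statistical competence, a significance test sized against the traffic you actually have, is routinely skipped. A small sketch, using entirely hypothetical conversion numbers, shows why it matters: the same "variant B converts 50% better" result (9% vs. 6%) is statistical noise at 200 visitors per arm, and only becomes defensible at ten times the traffic.

```python
import math

def two_prop_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    p_value = math.erfc(z / math.sqrt(2))  # two-sided tail of the normal
    return z, p_value

# Hypothetical data: identical 6% vs 9% conversion rates at two traffic levels.
z_small, p_small = two_prop_z(12, 200, 18, 200)      # 200 visitors per arm
z_large, p_large = two_prop_z(120, 2000, 180, 2000)  # 2000 visitors per arm
print(f"200/arm:  z={z_small:.2f}, p={p_small:.3f}")   # p > 0.05: not significant
print(f"2000/arm: z={z_large:.2f}, p={p_large:.4f}")   # p < 0.05: significant
```

None of this requires a Chief Scientist, but it does require someone to ask the question, and even then it only tells you *whether* B beat A, never *why*.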
There’s a great anecdote that I heard from Ron Kurti, also at Vast, and repeated at Luke Wroblewski’s site: putting forms in a mad libs style increases conversion by 25-40%. It’s safe to say that, immediately following this observation, mad libs style forms started appearing all over the internet (if you haven’t heard this yet, you are probably thinking the same thing: how can I change my site to have mad libs forms?). But we don’t know why this works, and because behavior is complicated, we have no way of creating generalized rules for where it works best. And yes, we can A/B test it on our own sites to know if it works for us, but again, we won’t learn why. I don’t want my products, systems, or services to be black boxes; I want to understand how they work, why they work, and I want to have some degree of control over the things I’m introducing into the world.
The approach abdicates responsibility.
The same problem I have with "Lean UX" is evident here: we're throwing things out into the world without really thinking about the implications they have on real people. As Wired describes, "But with A/B testing, WePay didn't have to make a decision. After all, if you can test everything, then simply choose all of the above and let the customers sort it out." Your customers aren't there to sort it out. They're real people, with real emotions, and your test is having real implications on their real lives. This may not matter, depending on what it is your company does. It's hard to argue that, on a site where people rate restaurants, it's ethically irresponsible to change the color of buttons to determine which has a higher transaction rate. But I would make a much more adamant case that, in a system used on a daily basis by an at-risk population, your customers can't be your guinea pigs.
The approach ignores the magic and the soul.
I understand the value of data and a rational approach to things like engineering. I would like someone who is designing an airplane to use a rational, data-driven, scientific, rigorous approach to understand how much weight that plane can hold. But in the same example, we find an obvious illustration of what happens when we only use an analytical approach. Flying sucks, and it sucks because it’s been engineered to death. Using Google is starting to be a lot like flying, probably because it’s being engineered to death. An emotional approach has value, because it provides things that are unexpected, sensual, poetic, and things that feel magical.
Good design crafts a story, and I can’t think of anything more powerful than a good story. Brian Christian wrote a great piece for Wired, and I’ll be damned if he A/B tested multiple versions of it to find the one with just the right level of engagement. I don’t want to live in a world where things are optimized, much less optimized for transactions and consumption. I want up and down, and high and low, and things that are absurd, and things that have personality, and things that react in unexpected ways.