In science we study physically meaningful quantities that have some kind of objective reality, and that means that multiple people should draw substantively equivalent conclusions. But in some situations, this principle is at odds with the Bayesian Coherency Principle, and so we have to choose between internal consistency and consistency with external reality.
The simplest kind of A/B test compares two options, using a single KPI to decide which option is best. The more general theory of statistical experiment design easily handles more options and more metrics, provided we know how to incorporate the *multiple comparisons* involved. To see why this is important, read on!
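As a quick illustration of why multiple comparisons matter, here is a minimal sketch. It assumes the comparisons are independent and each is tested at the same significance level; the function names are made up for this example, and the Bonferroni correction shown is just one common adjustment, not necessarily the one a given experiment should use.

```python
def familywise_error_rate(alpha: float, num_comparisons: int) -> float:
    """Probability of at least one false positive across independent
    comparisons, each tested at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** num_comparisons


def bonferroni_alpha(alpha: float, num_comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / num_comparisons


# Show how the chance of a spurious "winner" grows with the number of
# comparisons, with and without the correction.
for m in (1, 5, 20):
    uncorrected = familywise_error_rate(0.05, m)
    corrected = familywise_error_rate(bonferroni_alpha(0.05, m), m)
    print(f"{m:>2} comparisons: uncorrected={uncorrected:.3f}, corrected={corrected:.3f}")
```

With a single comparison the false positive rate is just the significance level, but it climbs quickly as options and metrics are added unless the threshold is adjusted.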
We have previously mentioned the Stable Unit Treatment Value Assumption, or SUTVA, a complicated-sounding term that is one of the most important assumptions underlying A/B testing (and Causal Inference in general).
In our last post, we introduced the potential outcomes framework as the foundation for causal inference. In the potential outcomes framework, each unit (e.g. each person) is represented by a pair of outcomes, corresponding to the result of the experience provided to them (treatment or control, A or B, etc.).
"Why can't I take the results of an A/B test at face value? Who are you, the statistics mafia? I don't need a PhD in statistics to know that one number is greater than another." If this sounds familiar, it is helpful to remember that we do an A/B test to learn about different potential outcomes. Comparing potential outcomes is essential for smart decision making, and this framework is the cornerstone of causal inference.
When I started this blog, my primary objective was less about teaching others A/B testing and more about clarifying my own thoughts on A/B testing. I had been running A/B tests for about a year, and I was starting to feel uncomfortable with some of the standard methodologies.