Violations of the Stable Unit Treatment Value Assumption
We have previously mentioned the Stable Unit Treatment Value Assumption, or SUTVA, a complicated-sounding name for one of the most important assumptions underlying A/B testing (and causal inference in general). In this post, we talk a little more about what it says and why it is so important.
In the potential outcomes framework underlying A/B testing, we assume the response of each unit can be summarized by a pair of numbers reflecting what will happen to that unit with and without treatment. But that assumes the response for that unit does not depend on the treatment assigned to any other unit. Imagine an experiment conducted on human beings (perhaps testing some kind of vaccine). Imagine that many or all of these humans have partners who are also part of the experiment. It is possible that the response for an individual depends not only on whether they themselves received treatment, but also on whether their partner received treatment.
In this case, each individual actually has four potential outcomes, corresponding to the cases where:
- Individual receives treatment; partner receives treatment
- Individual receives treatment; partner does not receive treatment
- Individual does not receive treatment; partner receives treatment
- Individual does not receive treatment; partner does not receive treatment
Estimating all four potential outcomes requires that we consider the test subjects one couple at a time. We would need to make sure that for some couples, both individuals receive treatment (providing insight for the first potential outcome). We would need to make sure that for some couples, neither individual receives treatment (providing insight for the last potential outcome). And we would need to make sure that for some couples, one person receives treatment and the other does not (providing insight for the middle two potential outcomes). This is a little more complicated than the simpler case of no interaction between partners, but it’s not too bad conceptually.
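One way to get couples in every configuration is to hand out the four treatment patterns in rotation after shuffling the couples. The sketch below is only an illustration: the function name `assign_couples`, the integer couple IDs, and the balanced rotation are assumptions of this example, not a prescribed design.

```python
import random
from collections import Counter

# The four (person A treated, person B treated) patterns for a couple.
PATTERNS = [(True, True), (True, False), (False, True), (False, False)]

def assign_couples(couple_ids, seed=0):
    """Shuffle the couples, then deal out the four patterns in rotation
    so that each pattern is used about equally often."""
    rng = random.Random(seed)
    shuffled = list(couple_ids)
    rng.shuffle(shuffled)
    return {cid: PATTERNS[i % len(PATTERNS)] for i, cid in enumerate(shuffled)}

# Hypothetical experiment with 12 couples, identified by the integers 0..11.
assignment = assign_couples(range(12), seed=42)
counts = Counter(assignment.values())
for pattern in PATTERNS:
    print(pattern, counts[pattern])
```

With 12 couples, each pattern is assigned to exactly 3 of them. Which couples land in which pattern depends on the seed.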
But what does the causal effect even mean in this case? When there are just two potential outcomes (individual receives treatment or doesn’t receive treatment), the causal effect is just the difference in these outcomes. When there are four outcomes, it is not obvious how to combine them to calculate a causal effect, even if we knew what the potential outcomes were. One possibility is to average the two potential outcomes where the individual receives treatment, average the two potential outcomes where the individual does not receive treatment, and subtract the averages.
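To make the averaging concrete, here is a small sketch for a single individual. The four outcome values are made up for illustration, and `averaged_effect` is a hypothetical helper name, not a standard function.

```python
from typing import Dict, Tuple

# Potential outcomes keyed by (individual_treated, partner_treated).
PotentialOutcomes = Dict[Tuple[bool, bool], float]

def averaged_effect(y: PotentialOutcomes) -> float:
    """Average over the partner's treatment status, then take the
    difference between the individual being treated and not treated."""
    treated = (y[(True, True)] + y[(True, False)]) / 2
    untreated = (y[(False, True)] + y[(False, False)]) / 2
    return treated - untreated

# Made-up potential outcomes for one individual.
example = {
    (True, True): 10.0,
    (True, False): 8.0,
    (False, True): 5.0,
    (False, False): 4.0,
}

print(averaged_effect(example))  # (10 + 8) / 2 - (5 + 4) / 2 = 4.5
```

In a real experiment we never observe all four values for the same person, so this quantity has to be estimated from group averages rather than computed directly.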
Another possibility is to ignore the potential outcomes where one individual receives treatment and their partner does not, taking the difference between the potential outcome where both receive treatment and the one where neither receives treatment. In this case, there is no point in having couples where one person receives treatment and the other doesn’t; they tell us nothing about the causal effect we are investigating. We can then assign treatment to couples rather than individuals: some couples are randomly selected for treatment and some are not. The couple becomes the experimental unit rather than the individual.
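Treating the couple as the experimental unit might look like the sketch below. The names `randomize_couples` and `couple_level_effect`, the 50/50 split, and the toy outcome data are all assumptions made for illustration.

```python
import random
from statistics import mean

def randomize_couples(couple_ids, seed=0):
    """Randomly pick half of the couples (rounded down) for treatment.
    Both partners in a couple always share the same arm."""
    rng = random.Random(seed)
    ids = list(couple_ids)
    rng.shuffle(ids)
    treated = set(ids[: len(ids) // 2])
    return {cid: cid in treated for cid in ids}

def couple_level_effect(assignment, couple_outcome):
    """Difference in mean couple-level outcome, treated minus control."""
    treated = [couple_outcome[c] for c, t in assignment.items() if t]
    control = [couple_outcome[c] for c, t in assignment.items() if not t]
    return mean(treated) - mean(control)

# Toy data: every couple scores 4.0 untreated and 10.0 when both are treated.
arms = randomize_couples(range(10), seed=1)
outcomes = {cid: 10.0 if is_treated else 4.0 for cid, is_treated in arms.items()}

print(couple_level_effect(arms, outcomes))  # 6.0 for this noiseless toy data
```

Because the toy outcomes contain no noise, the estimate equals the built-in effect exactly. With real data the same difference in means would be an estimate with sampling error, and the sample size that matters is the number of couples, not the number of individuals.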
Even if we think SUTVA is satisfied at the individual level, we can actually check for particular violations. We assign treatment to individuals so that we get a good mix of couples where both partners receive treatment, neither receives treatment, and only one receives treatment. If SUTVA is violated in this way, then among individuals with the same treatment status we should see a dependency between an individual’s response and whether their partner received treatment.
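A rough version of this check can be sketched on simulated data. Everything here is an assumption for illustration: the additive response model, the effect sizes, and the helper names `simulate` and `partner_contrast`. A real analysis would also want a proper significance test that accounts for partners' responses being correlated.

```python
import random
from statistics import mean

def simulate(n_couples=2000, own_effect=2.0, partner_effect=1.0, seed=0):
    """Simulate (own_treated, partner_treated, response) rows under an
    assumed additive model with independent unit-variance noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_couples):
        # Each partner is independently treated with probability 0.5.
        t_a, t_b = rng.random() < 0.5, rng.random() < 0.5
        for own, partner in ((t_a, t_b), (t_b, t_a)):
            y = own_effect * own + partner_effect * partner + rng.gauss(0, 1)
            rows.append((own, partner, y))
    return rows

def partner_contrast(rows, own):
    """Holding the individual's own treatment fixed, compare the mean
    response when the partner is treated versus not treated."""
    with_partner = [y for o, p, y in rows if o == own and p]
    without_partner = [y for o, p, y in rows if o == own and not p]
    return mean(with_partner) - mean(without_partner)

spillover = simulate(partner_effect=1.0, seed=0)
no_spillover = simulate(partner_effect=0.0, seed=0)

for label, rows in (("spillover", spillover), ("no spillover", no_spillover)):
    print(label, partner_contrast(rows, own=True), partner_contrast(rows, own=False))
```

When the simulated spillover is present, both contrasts should land near the assumed partner effect of 1.0; when it is absent, they should hover near zero. A contrast clearly different from zero is the dependency on the partner's treatment that signals a SUTVA violation.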
Although interference between units makes the experiment design a little more complex, assuming we know who is partners with whom, we can take this into account with no real conceptual challenges. We can even handle instances where the responses of an individual depend on the treatment assignment of more than one other individual, again assuming we know the connections between individuals.
The real challenge is when we don’t know who is connected to whom. If we didn’t think to record this information, or if we didn’t know in the first place, we would have no way of accounting for interactions between individuals. I’ve never come across a work-around for this! Instead, researchers do their best to ensure that experimental units (individuals, couples, families, classrooms, etc.) are defined so that SUTVA is plausibly satisfied.
Because humans are social creatures, SUTVA is an important assumption that should be defended based on domain knowledge, and checked whenever possible. You now know what it is, how to validate it, and what to do when it is violated!