This is the first of a series of posts on Angus Deaton and Nancy Cartwright’s working paper, “Understanding and Misunderstanding Randomized Controlled Trials.”
Yesterday I was fortunate enough to hear Angus Deaton, 2015 Nobel Laureate in Economic Sciences, present a working paper (coauthored with Nancy Cartwright) that is essentially a comprehensive and at times technical critique of randomized controlled trials (RCTs, or experiments) and their use in policymaking. As an experimentalist, it was fascinating to hear an outsider critique the method with a high degree of sophistication, yet I was left with some lingering doubts about what we should do as scientists in response to Deaton and Cartwright’s critiques.
I will attempt, perhaps coarsely, to summarize their argument in brief. Experiments focus on the estimation of a single quantity of interest: the (sample) average treatment effect. This quantity may not be particularly interesting (there are other quantities we may care about, such as higher moments of the effect distribution), it can be difficult to estimate precisely (for various reasons discussed in the paper), and randomization does not, in any single experiment, actually achieve the covariate balance it aims at - it balances covariates only in expectation - nor does it necessarily take full advantage of relevant information. That, in essence, is the first half of the paper (and the only part I will focus on in this post).
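To make the quantity at issue concrete, here is a minimal simulated sketch (all data invented for illustration) of the difference-in-means estimator of the sample average treatment effect, the single number Deaton and Cartwright argue experiments are built around:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated experiment: 1,000 units, half assigned to treatment at random
n = 1000
treat = rng.permutation(np.repeat([0, 1], n // 2))
y0 = rng.normal(0, 1, n)          # potential outcome under control
y1 = y0 + rng.normal(1, 2, n)     # heterogeneous treatment effects, mean 1
y = np.where(treat == 1, y1, y0)  # observed outcome: one potential outcome per unit

# Difference-in-means estimate of the (sample) average treatment effect
sate_hat = y[treat == 1].mean() - y[treat == 0].mean()
print(round(sate_hat, 2))  # close to the true average effect of 1
```

Note that only the *average* of `y1 - y0` is recoverable this way; other features of the effect distribution (its variance, say) are not identified from the observed data alone, which is one version of the authors' point about the narrowness of this estimand.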
To catch some highlights of the talk, I created a Storify of my live tweeting of the event.
These arguments are reasonable enough. There is good reason to care about estimands other than the SATE (such as the PATE, various local ATEs, etc.). And an emerging literature (see, especially, work by Don Green and Holger Kern) is showing how to estimate heterogeneous treatment effects from experimental data. The questions of precision and appropriate estimation of p-values, too, have been the subject of longstanding debate (and are the reason why randomization inference is used, for example, in Don Green and Alan Gerber’s textbook; see also research by Keele, McConnaughy, and White). Finally, the recognition that randomization balances covariates only in expectation is - hopefully - part of the first lesson of any experimental methods course. It’s certainly a point I try to reiterate any time I teach experimental design.
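That "only in expectation" lesson is easy to demonstrate by simulation. The sketch below (all numbers invented) draws one covariate, then repeats complete randomization many times: the covariate difference between arms averages out to zero across randomizations, yet any single randomization can leave a sizable imbalance:

```python
import numpy as np

rng = np.random.default_rng(1)

# One pretreatment covariate for 50 units; complete randomization, 25 per arm
n, sims = 50, 10_000
x = rng.normal(0, 1, n)

# Covariate difference between treatment and control under many
# hypothetical randomizations of the same 50 units
diffs = []
for _ in range(sims):
    treat = rng.permutation(np.repeat([0, 1], n // 2))
    diffs.append(x[treat == 1].mean() - x[treat == 0].mean())
diffs = np.array(diffs)

print(round(diffs.mean(), 3))        # near zero: balance in expectation
print(round(np.abs(diffs).max(), 2)) # but individual draws can be badly imbalanced
```

The first number is the expectation the textbook guarantee refers to; the second is the realization a single experiment actually gets.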
As such, Deaton and Cartwright’s critiques focus largely on failures of practice in RCTs rather than failures of current methodological understanding. And that leads me to three major points of contention with their argument.
First, their major critique of randomization focuses on why randomization is inferior to approaches that assign treatment status based upon covariate information known pretreatment. Such practices, known as “blocking” or “block randomization”, are common in cases where such covariate information is known in advance; their critique thus applies mainly to (the very large subset of) studies that rely on complete randomization. Yet their critique here is slightly off-base because the advantage of randomization is its balancing (in expectation) of unobserved covariates. Block randomization as a procedure is precisely intended to handle cases where observed covariates can be incorporated into an experimental design. Their answer - that randomization is not superior to any arbitrary assignment mechanism in its handling of unobservables - is correct, but as an experimentalist I would draw that conclusion the other way around: no arbitrary assignment mechanism is superior to randomization, so randomization should be the preferred way to assign treatment once all useful covariate information is accounted for in the design. This is especially true because an arbitrary assignment mechanism might accidentally confound treatment assignment with features of the units (e.g., assigning according to order of entry into the study, where order of entry is associated with potential outcomes). Furthermore, the abstract situation in which the treatment only takes two values (0,1) understates the purely procedural utility of randomization in cases where the treatment has many more values (such as in complex factorial designs).
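A minimal sketch of the blocking procedure described above (the function name and the toy covariate are my own, not from the paper): observed covariates are balanced exactly by splitting each block evenly between arms, while randomization within blocks handles everything unobserved.

```python
import numpy as np

rng = np.random.default_rng(7)

def block_randomize(blocks, rng):
    """Assign half of each covariate block to treatment at random.

    `blocks` is an array of block labels (e.g., a discretized pretreatment
    covariate). Observed covariates are balanced exactly *across* blocks;
    unobserved covariates are still balanced only in expectation, which is
    the work randomization does within each block.
    """
    treat = np.zeros(len(blocks), dtype=int)
    for b in np.unique(blocks):
        idx = np.flatnonzero(blocks == b)
        k = len(idx) // 2
        treat[rng.permutation(idx)[:k]] = 1  # random half of the block treated
    return treat

# Toy example: 20 units blocked on a binary pretreatment covariate
blocks = np.repeat(["low", "high"], 10)
treat = block_randomize(blocks, rng)
# Each block contributes exactly half its units to treatment
print([int(treat[blocks == b].sum()) for b in ["low", "high"]])  # → [5, 5]
```

The design choice here mirrors the point in the text: blocking uses up the observed covariate information deterministically, and randomization is reserved for what cannot be observed.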
Second, their focus on the limited value of randomization shies away from the fact that observational methods offer no superior alternative. It may be that randomized experiments are not superior to observational methods (fully debating that point requires considerably more space than is available here), but neither, then, are observational methods inherently superior to randomized experiments in cases where covariates are unobserved and/or the causal diagram is unknown.
Finally, Deaton and Cartwright’s argument misses perhaps the most important distinguishing feature of experiments - a feature that has nothing to do with randomization. That is, experiments induce the value of treatment. In observational research, the value of treatment must be measured, likely with error and a fair amount of missing data. By setting the value of treatment (do(X=x) in Pearl’s notation), we are assured that cases where X=1 are indeed treated cases and cases where X=0 are not, and so forth. The inherent “What if?” logic of experimentation, which long precedes the 20th-century logic of randomization, is premised on this more fundamental idea that distinguishes experimental and observational research. We can describe the world and begin to see how it works by setting the value of an input and observing outputs, even if we are unable to explicitly identify a causal effect. This is, for example, present in the inherently experimental nature of survey research: if I ask a question, I am setting the value of a treatment (the question) and measuring the response. I am unable to identify a SATE in this design, but I start to learn something about human reasoning by imposing a treatment/question and seeing what outcomes result. I may not learn anything at all about the thing I care about if I simply sit back and let the world play out as it otherwise would.
In sum, Deaton and Cartwright’s working paper is an important read. It will surely provide considerable leverage for skeptics of experimental research. For experimentalists, I would say that much of its content will be familiar, though it will offer a rigorous refresher of your methodological education. There are some further issues raised in the paper (positive and negative in my view) that I will aim to respond to in a subsequent post or two (though a great editor once told me “never describe anything as the first in a series,” so we’ll see if that actually happens).
Except where noted, this website is licensed under a Creative Commons Attribution 4.0 International License.