The Surprising Value of Faking Your Data

04 Nov 2016

No scandal has so shaken political science as the revelation that one-time rising star Michael LaCour had fabricated the data underlying a prominently published article about the effects of face-to-face conversations on support for gay rights. The idea that so much press attention, scholarly thinking, and real-world resources had been expended on fraudulent data was troubling, and it challenged our discipline’s faith in its own credibility. But what if I told you that faking data and expending your time and effort analyzing and writing it up might have some payoff? The radical idea of this post is that faking your data could sometimes make you a better scientist.

Yesterday, Nautilus magazine published a fascinating story by Jonah Kanner and Alan Weinstein about the scientific process behind this year’s splash discovery of gravitational waves, a phenomenon theorized by Einstein but previously unobserved. The detections were made by a massive international collaboration called LIGO, using machines designed to detect minute fluctuations in the shape of spacetime. The infrastructure to do this is massive and expensive, involving multiple, extremely precise laser-based measuring devices that wobble in response to movement (think super-precise seismograph). The data generated by these detectors is full of noise, which must be further filtered to remove movements attributable to local events (things like an earthquake or someone breathing on the machine) rather than the super-distant black hole events of interest to science. The detections of gravitational waves are based on tiny signals that emerged from the noise of these localized influences.

The stunning part of the story is that long before these detections, LIGO had taken a remarkable step: they had agreed to secretly fake their own data! A handful of collaborators were empowered to work in secret and arbitrarily introduce a fake signal into the data at any point in time, without the knowledge of the rest of the team. The article describes the process in detail and tells the story of how the first signal that LIGO detected - a signal that would trigger thousands of hours of data analysis, discussion, writing, editing, and in-person meetings - was indeed a signal placed by this clandestine group. Here’s how Kanner and Weinstein describe the moment when the team of fakers reveals whether the data was real or not:

In March 2011, we gathered in a hotel near Arcadia, California to review all the evidence and the paper draft, and vote on submitting it to a journal. There were more than 300 people in the room and about another hundred more connected via the Internet. We brought lots of champagne. We discussed. We voted to approve the paper draft. Speeches were made celebrating the long road we had traveled, from building incredible detectors, to finding a signal, to finally executing the entire procedure for claiming a detection. We opened the champagne.

Then Jay Marx, the director of the LIGO Laboratory, who had been carrying a tattered envelope in his pocket for more than six months, took to the stage […] and told us all that the Big Dog was a Big Fake, and that we had just completed the first successful discovery fire drill in gravitational wave observation history, we still treated it as a moment of celebration. We raised glasses of champagne, and toasted our fake success. It was a strange, hollow feeling.

Champagne aside, the moment sounds chilling. All of the work of analyzing data, mulling over the results, debating them with coauthors, discussing how to appropriately characterize this tiny, one-off observation cost countless hours of work and mental anguish. And it was all apparently for naught.

Yet Kanner and Weinstein point out that whatever the effort cost, it also carried innumerable rewards. This fake detection that they didn’t know was fake had made them wrestle with issues they never would have anticipated until they actually had a signal in front of them:

Big Dog had motivated a flurry of work, including big steps forward in our ability to measure the masses of the source objects (the neutron star or black hole) using only the gravitational wave signal. Most significantly, our collaboration had agreed for the first time what standards we’d use, and how we’d minimize our biases. For the first time, we had decided that we had enough evidence for a detection.

Or as they say elsewhere in the article, “The fake injection bugaboo forced us to keep an open mind, apply skepticism and reason, and examine the evidence at face value.” While the collaborators had run simulations to try to see what a signal would look like and had thought about how to analyze the data and validate a hypothetical finding, that work was carried out under low stakes; everyone knew the data was fake. They didn’t have to take it seriously.

By faking the data, the scientists didn’t know whether the signal was real. But they did know that it could be fake. That helped them enforce the kind of radical, skeptical objectivity that is necessary for the scientific process to work - the kind of skepticism that is supposed to be a bedrock of scientific work. In reality, any signal - even one not planted by the fakers - could be nothing but noise. Sometimes we lose track of that as we analyze our data; we forget that this could be an error; the quest to find a signal in the noise can lead us down dangerous, forking paths. The LIGO team’s courage to fake their data as a test of their own scientific practice sounds like it made them better researchers. And it should give us some additional faith that the 2015 detection they reported was not an overly optimistic interpretation of noisy data.

There’s something to be learned there for every scientist. While we, of course, can never publish fraudulent data and should never actively work to deceive others about our research claims, there may be some value in occasionally lying to ourselves. It can serve as a reminder about the unavoidable reality of false positives and as a check on whether we p-hack and give in to confirmation bias. I’ve previously advocated for a “Ben-bot” intermediary between scientists and their source data as a means of preventing fraud. A Ben-bot could also be adapted to sometimes lie to us to run us through an unanticipated fire drill (provided she or he eventually tells us it’s only a drill).
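To make the fire-drill idea a little more concrete, here is a minimal sketch in Python of what such an intermediary might do. Everything in it is my own assumption for illustration, not LIGO's procedure or an existing tool: the names `maybe_inject`, `open_envelope`, and `SealedEnvelope` are hypothetical, the "fake signal" is simply a constant shift added to treated units, and the sealed envelope is a SHA-256 hash published before anyone analyzes anything.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SealedEnvelope:
    """Commitment to the drill status, shared before analysis begins."""
    digest: str  # SHA-256 of the secret; reveals nothing on its own


def maybe_inject(outcomes, treated, effect, p_drill, rng):
    """Return (released data, sealed envelope, secret).

    With probability p_drill, add `effect` to the outcome of every
    treated unit (treated is a list of 0/1 flags). The secret records
    whether that happened and stays with the intermediary.
    """
    is_drill = rng.random() < p_drill
    shift = effect if is_drill else 0.0
    released = [y + shift * t for y, t in zip(outcomes, treated)]
    # The random nonce keeps analysts from guessing the secret by
    # hashing "drill" and "real" themselves.
    nonce = f"{rng.getrandbits(128):032x}"
    secret = f"{'drill' if is_drill else 'real'}:{nonce}"
    envelope = SealedEnvelope(hashlib.sha256(secret.encode()).hexdigest())
    return released, envelope, secret


def open_envelope(envelope, secret):
    """Check the revealed secret against the envelope; True means drill."""
    if hashlib.sha256(secret.encode()).hexdigest() != envelope.digest:
        raise ValueError("secret does not match the sealed envelope")
    return secret.startswith("drill:")


if __name__ == "__main__":
    rng = random.Random()  # assumption: a real tool would use `secrets`
    data, envelope, secret = maybe_inject(
        outcomes=[1.0, 2.0, 3.0, 4.0],
        treated=[1, 0, 1, 0],
        effect=0.5,
        p_drill=0.2,
        rng=rng,
    )
    # ... analysts work with `data` and `envelope` only ...
    print("It was a drill." if open_envelope(envelope, secret) else "It was real.")
```

The analysts only ever see the released data and the envelope's hash, so they can later verify that the intermediary did not change its story after seeing their results. That verification is the digital equivalent of the tattered envelope in Jay Marx's pocket.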

Indeed, this kind of scientific fire drill triggers exactly the kind of activity called for by advocates of study preregistration (precise specification of analyses, etc.), but it sets the stakes higher. It pretends that we have the actual data (rather than a simple, known simulation) and tests how we handle the situation. If you perform well in a drill, you should be better able to escape the real fire unscathed. If you perform badly, it’s a clear reminder to up your game. Maybe it’s worth thinking about faking our data more often.


Creative Commons License Except where noted, this website is licensed under a Creative Commons Attribution 4.0 International License. Views expressed are solely my own, not those of any current, past, or future employer.