Failing Up in Research
Stanford’s President May Have Falsified Alzheimer’s Results. What does that say about mental health research?
Psychology Today just released a heavily edited version of a blog post I submitted to them. The editing is disappointing: a lot of the content has been disassembled and reassembled in a way that makes it more difficult to follow. I’d much rather folks read the “version of record” I’ve posted here.
New reporting by The Stanford Daily suggests that Stanford President Marc Tessier-Lavigne may have falsified data in an important basic science paper investigating Alzheimer’s disease. When the paper was published in 2009, Tessier-Lavigne was a top executive at the biotechnology company Genentech. Genentech quickly began researching whether the finding could be used to develop a new drug for treating Alzheimer’s. According to several insider accounts—but not the official statements of Genentech or Tessier-Lavigne—the project was dropped when it was found that Tessier-Lavigne had faked his results.
The Stanford Daily article reporting on this story is in-depth, drawing on interviews with multiple sources and a review of many public documents on the underlying science and business. My own interest in the story stems from the way it illustrates why so many reforms are needed in scientific research on psychology and mental health treatment. The Alzheimer’s research community appears to have “moved on” from this result, likely after other labs tried and failed to reproduce it. The institutions and incentives that allowed it to happen, however, remain in place.
Falsifying Up
There is a cheeky saying among some internet commentators that business executives can sometimes “fail up.” That is, they do a poor job running an organization or a team, but they still get large bonuses, promotions, or opportunities to run new prestigious projects. In science, there appears to be a comparable illicit route to advancement: falsifying up.
When I started my graduate training in psychology in 2011, a term was emerging to describe the poor research and statistical practices that could be used to generate incorrect—but publishable—research “findings.” We called it p-hacking, because it was a way to “hack” (or distort) the probability value associated with a statistical test. Getting a good (low) probability value was and is the key to publishing research in prestigious scientific journals, and few journals—especially in psychology at the time—were in the business of checking whether these values had been “hacked” or distorted.
The behaviors needed to distort statistical tests were not as brazen as falsifying data. Tessier-Lavigne is accused of having reported incorrect images from biological tests, repeating the same image several times as if it came from different tests, to make the results look more consistent and reliable. This is literally providing fake results. The less obvious—but still pernicious—methods common to psychology included:
- cutting out studies or sets of results that didn’t support the story the researcher was trying to tell (e.g., let’s just not tell anyone about that “failed” study);
- checking for any possible subgroup where the story was true (e.g., it works in left-handers!) and focusing on that;
- measuring several related outcomes and only reporting the ones that support the story (e.g., it doesn’t reduce depression, anxiety, or stress, but it does slightly improve life satisfaction—let’s talk about that!); or
- monitoring the data as each new person is recruited and stopping recruitment the moment the statistical test becomes significant.
(This is just a list of prominent examples: there are whole books on the subject.)
Statistical simulations of the behaviors described above show that, if they are used routinely and in combination, researchers can find support for a story they are trying to tell—even when it is completely wrong—over 50% of the time.
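To make that figure concrete, here is a minimal simulation sketch in Python. It is my own illustration with made-up parameters, not a reproduction of any published simulation. It combines just two of the behaviors above—measuring several outcomes and peeking at the data with early stopping—and even that is enough to push the false-positive rate well above the nominal 5% that a single, pre-planned test would produce.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_hacked_study(n_start=20, n_max=60, step=10, n_outcomes=3, alpha=0.05):
    """One study where the true effect is zero, but the researcher
    (a) measures several outcomes and will report whichever looks best, and
    (b) peeks at the data every `step` participants per group, stopping as
    soon as any test reaches p < alpha. All parameter values are invented."""
    treat = rng.normal(size=(n_start, n_outcomes))
    control = rng.normal(size=(n_start, n_outcomes))
    while True:
        for k in range(n_outcomes):
            if stats.ttest_ind(treat[:, k], control[:, k]).pvalue < alpha:
                return True  # a "significant" finding despite no real effect
        if len(treat) >= n_max:
            return False
        # Recruit `step` more participants per group and test again.
        treat = np.vstack([treat, rng.normal(size=(step, n_outcomes))])
        control = np.vstack([control, rng.normal(size=(step, n_outcomes))])

n_sims = 2000
hits = sum(one_hacked_study() for _ in range(n_sims))
print(f"False-positive rate with these two behaviors: {hits / n_sims:.0%}")
# An honest version (one outcome, fixed sample size, no peeking) sits near 5%.
```

Layering on the other practices—dropping “failed” studies, hunting for subgroups—drives the rate higher still, which is where figures above 50% come from.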
So one way to get hired into a prestigious position straight out of graduate school was to distort key data as a student, get it published just before you were interviewing for jobs, and then use that exciting new result to get hired. Once you were hired, you might be able to do the same thing to secure a research grant—a key accomplishment increasingly needed to get tenure at a university. The problem, from the perspective of the public, was that this meant some portion of exciting new psychology research was preventably wrong. (That portion is likely over 50%.) In other words, it was wrong because scientists were avoiding best practices in research so that they could advance their careers.
One obvious reply to this is that surely the scientific record corrects itself. After all, Alzheimer’s research moved on from Tessier-Lavigne’s finding within a few years, once the field realized it was a dead end. That’s true, but it’s also true that it can take years—and often millions of dollars in wasted research spending—before these errors are identified. For example, after hundreds of studies were published on a possible gene for depression, a systematic review found that the overall association simply did not hold up. It wasn’t a depression gene, and one large, well-done study would have been enough to figure that out.
A larger, systemic problem with this practice is the set of incentives placed on researchers. Once a researcher is hired by a university, that university becomes invested in them and in promoting and protecting their reputation. If that hiring was based on distorted or falsified data, it becomes hard to fire them. Prominent colleagues interested in protecting the reputation of the institution and the field may even jump in to criticize people who point out the distortion of results. A statistical simulation of this process through an evolutionary lens finds that the set of incentives currently in place tends to “select for” (actively promote) poorer scientific practices. (That paper is, fittingly enough, titled “The Natural Selection of Bad Science.”)
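For readers curious what such a simulation looks like, here is a toy sketch in the spirit of that idea—my own illustration with invented numbers, not the model from that paper. Labs differ only in how carefully they work; cutting corners yields more “positive,” publishable results, and new labs copy the labs that publish the most.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_field(n_labs=100, generations=50, true_effect_rate=0.1):
    """Toy model: each lab has an 'effort' level in [0, 1]. Lower effort means
    more corner-cutting, which inflates the chance of a publishable (often
    false) positive result. Labs are copied in proportion to how much they
    publish. All numbers here are invented for illustration."""
    effort = rng.uniform(0.2, 1.0, n_labs)
    for _ in range(generations):
        # Publishable results: genuine effects plus false positives that
        # grow as effort drops.
        false_pos = 0.05 + 0.6 * (1 - effort)
        p_publish = true_effect_rate + (1 - true_effect_rate) * false_pos
        publications = rng.binomial(5, p_publish)  # 5 projects per generation
        # The next generation of labs imitates current labs in proportion to
        # their output, with a little noise ("mutation") in their methods.
        weights = publications + 1e-9
        parents = rng.choice(n_labs, size=n_labs, p=weights / weights.sum())
        effort = np.clip(effort[parents] + rng.normal(0, 0.02, n_labs), 0, 1)
    return effort.mean()

print(f"Average methodological effort after selection: {simulate_field():.2f}")
# Labs start at about 0.6 effort on average; selection on publication counts
# alone steadily pushes average effort down.
```

The point is not the specific numbers but the direction: when hiring and promotion reward output alone, sloppier methods spread, even if no individual sets out to do bad science.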
Seeing a prominent case of potential “falsifying up” lead all the way to the presidency of Stanford strikes a nerve in me as a researcher, and I hope it does in all readers interested in finding a real treatment for Alzheimer’s. Experts in detecting falsified research, like Elisabeth Bik, have alleged that Tessier-Lavigne was falsifying results as far back as 1999. (Bik said she would “testify in court” over this.) Yet he was still moving up as late as 2016, when he was hired as President of Stanford. The key to making mental health research truly self-correcting is to set up incentives so that honest, transparent practices are best for the careers of scientists. Science only self-corrects when scientists are allowed to correct it.