When I started my scientific training, I thought I was going to learn the right way to do psychological science. A couple years later, I was convinced there were big problems with the way psychology research was being done, and was on my way to being a committed reformer. I’ve now been writing about reform for about as long, and I’m starting to ask again: what am I getting wrong?
On April 15, 2020, I launched my blog “How Do You Know?” on Psychology Today with a first post ambitiously titled “The Most Important Question in Psychology Research.” In it, I described how reading popular accounts of psychology research from writers like Malcolm Gladwell led me to study psychology. I wrote:
When I began graduate school, I was vaguely aware that some scientists criticized Gladwell’s work as being not quite right, simplifying and getting some details wrong. At the time, this didn’t bother me, because I thought that Gladwell got the big picture right, and that I could get the details nailed down by reading the original articles.
The most important lesson from my eight years of professional development is that this isn’t true. Psychology research has serious methodological problems. It’s not just popular science writers like Gladwell missing some details; it’s many of the most prominent and successful psychological scientists making big mistakes due to serious misunderstandings about how to collect and think about scientific evidence.
The most important finding in psychology in the last generation was a 2015 paper attempting to replicate 100 findings from prominent journals in the subdisciplines of social and cognitive psychology. Just over one-third of these findings were successfully replicated.
Most of the findings chosen from these top journals were not robust or general enough that a new set of scientists, following the same procedures laid out in the original articles, could get the same results.
The lesson for many psychologists like me was clear: The way we as psychological scientists have been conducting research isn’t good enough. The methods we use to vet published research let through too many “false positives.” Our current methods allow us to claim that an effect is real when it isn’t.
…
The most important research question in psychology is “how do we know what’s true?”
At the broadest level, I still stand by my assertion. Psychology is still grappling with methodological issues, and I do think we need to grapple with the question “how do we know what’s true?” before we can live up to the promise of the field.
And yet, new research has led me to rethink how I view the central piece of evidence in this story. Many studies did not replicate. Is that the central problem with psychological science?
This depends on what you take psychology studies to be doing. The standard story is something like this:
A researcher comes up with a new idea for how something works.
The researcher designs a study that will definitively test whether that idea is right.
The results of the test tell us whether the idea was right, and so let us know whether we should move forward with the theory or abandon it.
Implicit in that story is that the test you set up should be something that consistently works. Say you have a theory that “conflicting beliefs cause mental stress that makes people change their attitudes,” and you have a study that’s been carefully designed so that it can test whether the theory is right. If the results come out in support of the theory, great! You can move on to developing the theory. But if they don’t, you should question whether the theory was right.
Using this logic, of course, the results shouldn’t just come out right once. They should reliably come out the same way (nearly) every time you perform the study. If they don’t, then the theory doesn’t really hold.
To the extent that researchers believe they’re following this method, they should take replication seriously. Every single replication attempt has the potential to falsify the theory. Finding that lots of psychology studies don’t replicate implies that lots of theories are false.
The thing is, philosophers of science have known for generations that this model of science doesn’t work. It’s also not the way that our most successful, advanced sciences work. There are always open questions in science, facts that don’t quite fit together. Far from disconfirming popular theories, these present working scientists with puzzles that help them better refine and develop their theories.
More to the point, recent research by Berna Devezer and her colleagues has found that replication does not always mean that results are becoming more accurate. In certain situations, we can set up experiments that replicate more often, but are actually less accurate in estimating how things are related to each other in the real world. (This is an idea I’ve been digesting for some time. I’ll discuss it further in a future post.)
In other words, replication is important, but it’s not the defining quality of good research. In fact, all of the steps in the naive model presented above can be objects of inquiry in their own right. For example:
Where did this research idea come from? Is it internally consistent? How specifically is it meant to work? Is it consistent with results from other studies?
In the study testing the theory, are there any other possible explanations for the results? Do we know for sure that the way the psychological process was measured really captures that idea? Or does it just sound plausible?
If the results don’t support the theory, are there other interpretations we should consider? Could a modified version of the theory explain the results we got?
As Olivia Guest and Andrea Martin explain, another way to look at science is as making descriptions at different levels of abstraction line up with one another. If you move from a hypothesis to data and see that they don’t line up, that should cause you to move back up from the level of data to question what the hypothesis should have been. A perfectly valid result of a study, then, is not “accept” or “reject” but “rethink.”
Guest and Martin’s thinking is inspired by and in line with the work of Imre Lakatos, a philosopher of science who succeeded Karl Popper. It also opens up a wealth of new questions for scientists to ask at a number of different levels of abstraction, and offers a way to organize these questions.
I still think the Open Science Collaboration’s 2015 paper, and the low rate of replication it demonstrated, was important. It served as a wake-up call that the standard story we were telling ourselves about our research wasn’t right. Psychological theories haven’t been standing up to severe critical tests. Psychologists need to improve the way we work.
The debates it stirred up, though, should not end with discussions of how to make sure all our research replicates. They should lead us to deeper, better questions about what we want from psychology research. “How do we know if research replicates?” gives way to “how do we know why—and when—replication matters?” To make sense of psychology, we need to question better.