Personality Replicates. So What?
If replication isn't all it was cracked up to be, what does it offer?
On my Psychology Today blog, I highlighted research from 2019 showing that a large number of effects from personality psychology replicate when the studies are conducted on a new sample. Given the way I’ve been digging into the literature critiquing the Credibility Revolution, this might seem at odds with the developing consensus. Recent social media posts by scholars I admire (often a good place to find up-to-date thinking) suggest that replication doesn’t much matter--and that it certainly shouldn’t be taken as The Marker of Good Science. So why even mention it, much less tout it as an indicator of Personality Psychology’s success?
I’ll lay out a couple of reasons that I think replication matters for mainstream personality and social psychology, and hopefully make these arguments in a way that is (largely) compatible with the broader discussion of increasing replication as a goal of scientific reform. The arguments I’m presenting aren’t necessarily new ideas, but rather ways of thinking through how the “standard” reform arguments apply in this context. These arguments are also not meant to dismiss critiques of replication. Rather, I intend them in the spirit of acknowledging that replication is not The Marker, but potentially One Marker of Good Science (among many). It may be neither necessary nor sufficient for doing good research, but it certainly seems useful.
Replication for Description
The Big Five personality traits are not a theory. There may be articles that talk about Five Factor Theory, but I would submit that the Big Five personality traits do not do the things we typically want from a theory. There are no (consensus, canonical) explanations of:
Why we have these traits.
How these traits, specifically, should be related to life outcomes.
Why these traits are related to life outcomes.
What these traits mean for day-to-day life (beyond questionnaires).
You could go on with lists like this, but the point is basically to show that personality psychology, like much of psychology, falls into the “pile of effects” style of science. We can identify regularities using statistics, which provide a way of characterizing relationships between variables. We cannot, however, predict in any rigorous or systematic way (beyond preregistering our common-sense hunches) what personality trait will predict what outcome. We also have no principled way of predicting which factors will alter (or moderate) our results, other than common-sense hunches.
Given that paradigm, the best-case scenario is that personality psychology establishes robust regularities. You found that extraverts report being more satisfied with their lives? Interesting. When other people repeat the study, they find the same thing? That might really start to pique our interest. The next step would be to start trying to figure out why. In this thinking, the robust regularity is important because it’s a valuable first input into genuine theory building.
By genuine theory building, I mean getting very specific (mathematically specific!) about how and why these things are connected. Is extraversion connected to life satisfaction because it pushes us to have stronger relationships, and these lead to more life satisfaction? Is it connected because speaking out increases social status, and having higher status in a group increases life satisfaction? If that’s the case, does the effect of extraversion on life satisfaction vary according to how much people also value achieving high status? Does it really help achievement-motivated people more than others?
Based on Soto’s work, I believe personality psychology is ready for this type of theory-building. We may not know (at a deep, theoretical level) what extraversion is beyond a score on a test. But at least we know that test gives us reliable results!
Compare that with ego depletion. In that case, we have hundreds of studies showing some relationship between doing one hard task and working less on a second hard task, but whenever we try to pin down the relationship by forcing ourselves to use a preselected method, no effect appears. This suggests that ego depletion isn’t even establishing a reliable regularity. There is almost certainly something out there called willpower or self-control. We just don’t know how to measure it reliably or whether it will have consistent relationships with anything else we care about. Ego depletion, therefore, isn’t ready for prime time. I wouldn’t want to spend a lot of time and energy building a theoretical model of any ego depletion effect yet, because the possibility exists that the core regularities we’d build our theory on aren’t actually core or regular!
I’ve headed this section “Replication for Description” because I think it connects with a point from a previous post I made about Iris van Rooij’s work. In that post, I argued that descriptive work was potentially in tension with model-building; van Rooij herself, however, may not see these approaches as being in tension. First finding the important regularities in the topic you’re studying, and then building theory on top of them, is one way the two can fit together. Using statistics and replication to identify those regularities remains one of the statistical approaches that makes the most sense to me.
Replication as Area Audit
The second reason I think replication is important is almost certainly going to be less popular than the first, but I still think it is important. Psychologists do not always (perhaps not even typically) do statistical analyses rigorously or correctly. Researchers routinely report significance tests that have not been corrected for the number of tests. They routinely test whether effects appear only in one subgroup or another, and then write the manuscript up as if (common-sense) theory implied that “of course” the effects should hold only in that subgroup. They routinely run many studies in a semester, and choose only those that turn out with interesting, significant results to write up and report in the literature. All these patterns of behavior mean that many (most?) reported results in the research literature may be false positives, or Type I errors. Given that we have good reason to suspect a high rate of false positives in the literature, we may want to do a sort of audit of a research area to check this statistical validity.
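The arithmetic behind the uncorrected-testing problem is easy to see in a quick simulation. Under the null hypothesis a p-value is uniformly distributed, so if a researcher runs 20 independent tests of true-null effects and reports whichever comes out "significant," the chance of at least one false positive is 1 - 0.95^20, roughly 64 percent. A minimal sketch (the 20-test figure is my illustrative assumption, not a claim about any particular study):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_tests = 100_000, 20  # hypothetical: 20 uncorrected tests per "study"

# Under the null hypothesis, each p-value is uniform on [0, 1]
p = rng.uniform(size=(n_sims, n_tests))

# A study "finds something" if any of its tests dips below .05
family_wise = (p.min(axis=1) < 0.05).mean()
print(f"P(at least one 'significant' result): {family_wise:.2f}")  # ≈ 0.64
```

The same logic applies to unplanned subgroup analyses: each subgroup tested is another draw from the uniform distribution, and the family-wise error rate climbs accordingly.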
This mentality is based purely on statistical, not theoretical, considerations. Scientists publishing in social and personality psychology use statistical tests in virtually every new study. Statistical tests are known to be vulnerable to certain types of errors. If we are basing decisions on these tests, we should want to do periodic checks on how they’re working.
Replication for error detection does not need to be about setting up Science Cops or punishing people who have done sloppy research. There are certainly arguments to be made that people who “cut corners” in their statistics by not reporting all of the conditions or not reporting all of the studies are being dishonest. Yet simulations of these research practices show that even if everyone is being completely honest and transparent about the research they’ve done, structural issues in the publication of papers and the promotion of scientists can lead to a research literature that is not reliable.
Conducting the exact same study a second time seems like a very reasonable approach to getting a higher-level estimate of the amount of unreliability in a research area. This can then be used to help guide the norms and policies set in place by communities of scholars. If many (or most) of the results reported in a general area do not come out the same way when the study is replicated, this is reason to stop and reflect on the overall standards and practices being used.
The Replication Crisis in social and cognitive psychology (and in many other areas of science) forced this type of critical reflection. Given the results of field-wide audits, perhaps researchers should rethink their standard practices. A couple years ago, I would have argued that psychologists should (primarily) focus on trying to alter our practices to increase replication. Today, I believe the “Replication Crisis” is more of a wake-up call to notice that best scientific practices don’t do what we thought they did--and that, in fact, we ought to re-examine what we thought we wanted.
So does the higher rate of replication in personality psychology mean that this field doesn’t need re-examining, while social psychology does? No, I think there are important discussions around the broader goals of science raised by the Replication Crisis that are not addressed by replication, and that apply directly to personality psychology (see the section above on theory). However, I do think this type of project, which replicates many effects together, tells us something about the reliability of effects in personality psychology in general. If you were to pull out a random correlation between a personality trait and a self-reported life outcome in the literature, there’s a pretty good chance it would be reliable. This generalization is, I think, (cautiously) warranted, and it allows us to think about effects beyond those specifically addressed here. In comparison to other areas of the literature, I still think this counts as a success.