The ability to repeat a study and get the same results is a prerequisite for building scientific knowledge. Replication allows us to confirm that empirical findings are reliable and to refine our understanding of when a finding occurs. It may surprise you to learn, then, that scientists rarely conduct, much less publish, attempted replications of existing studies.
Journals prefer to publish novel, cutting-edge research. And professional advancement is determined by making new discoveries, not painstakingly confirming claims that are already on the books. As one of our colleagues recently put it, “Running replications is fine for other people, but I have better ways to spend my precious time.”
Once a paper appears in a peer-reviewed journal, it acquires a kind of magical, unassailable authority. News outlets, and sometimes even scientists themselves, will cite these findings without a trace of skepticism. Such unquestioning confidence in new studies is likely undeserved, or at least premature.
A small but vocal contingent of researchers – addressing fields ranging from physics to medicine to economics – has maintained that many, perhaps most, published studies are wrong. But how bad is this problem, exactly? And what features make a study more or less likely to turn out to be true?
We are two of the 270 researchers who have just published, in the journal Science, the first-ever large-scale effort to answer these questions: an attempt to reproduce 100 previously published findings in psychological science.
Attempting to re-find psychology findings
Publishing together as the Open Science Collaboration, and coordinated by social psychologist Brian Nosek of the Center for Open Science, research teams from around the world each replicated a study from one of three top psychology journals (Psychological Science; Journal of Personality and Social Psychology; and Journal of Experimental Psychology: Learning, Memory, and Cognition). To make each replication as exact as possible, the teams obtained study materials from the original authors and worked closely with those authors whenever they could.
Almost all of the original published studies (97%) had statistically significant results. This is as you’d expect – while many experiments fail to uncover meaningful results, scientists tend only to publish the ones that do.
When these 100 studies were rerun by other researchers, however, only 36% reached statistical significance. Put another way, only about one-third of the replications produced the same results that were found the first time around. That rate is alarmingly low, especially given that published findings tend to be held as gospel.
The bad news doesn’t end there. Even when the new study found evidence for the existence of the original finding, the magnitude of the effect was much smaller: on average, half the size of the original.
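Why might replication effects come out smaller? One contributing mechanism, the publication bias described above, is easy to see in a toy simulation. The sketch below is purely illustrative and is not the analysis used in the Science paper. It assumes a hypothetical true effect of 0.3 standard deviations, 30 participants per group, a known standard deviation of 1, and a rule that only results significant in the predicted direction get “published.” The function names `run_study` and `simulate` are made up for this example.

```python
import random
import statistics


def run_study(true_effect, n, rng):
    """Simulate one two-group experiment where both groups have SD = 1.

    Returns the observed effect (difference in group means) and its z statistic.
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    effect = statistics.fmean(treatment) - statistics.fmean(control)
    standard_error = (2.0 / n) ** 0.5  # exact here because the SD is known to be 1
    return effect, effect / standard_error


def simulate(true_effect=0.3, n=30, n_studies=10000, seed=1):
    """Hypothetical model: publish only significant originals, then replicate each once."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided alpha = .05
    published, replications = [], []
    for _ in range(n_studies):
        effect, z = run_study(true_effect, n, rng)
        if z > z_crit:  # only significant results in the predicted direction get "published"
            published.append(effect)
            # An exact replication: same design, same sample size, fresh data.
            replications.append(run_study(true_effect, n, rng))
    rep_effects = [e for e, _ in replications]
    rep_rate = sum(z > z_crit for _, z in replications) / len(replications)
    return {
        "published_share": len(published) / n_studies,
        "mean_published_effect": statistics.fmean(published),
        "mean_replication_effect": statistics.fmean(rep_effects),
        "replication_success_rate": rep_rate,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

Under these made-up settings, roughly a fifth of the simulated studies clear the significance bar, their published estimates average roughly twice the true effect, and exact replications land back near the true value while succeeding only about a fifth of the time, even though every effect in the simulation is real. Real research differs in many ways this toy model ignores.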
One caveat: just because something fails to replicate doesn’t mean it isn’t true. Some of these failures could be due to luck, poor execution, or an incomplete understanding of the circumstances needed to show the effect (scientists call these “moderators” or “boundary conditions”). For example, having someone practice a task repeatedly might improve their memory, but only if they didn’t know the task well to begin with. In a way, these replications (and failed replications) highlight the inherent uncertainty of any single study, original or new.
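The role of luck can be made concrete with a standard statistical power calculation. The sketch below assumes, hypothetically, a real effect of 0.3 standard deviations, a two-group design with a known standard deviation of 1, and a two-sided z-test at alpha = .05; `replication_power` is an illustrative helper, not something taken from the study. It shows how often a replication of a genuinely true effect would still come up non-significant at different sample sizes.

```python
from statistics import NormalDist


def replication_power(true_effect, n_per_group, alpha=0.05):
    """Probability that a two-group study (known SD = 1) detects a real effect.

    Uses a two-sided z-test and counts only significance in the true direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = (2.0 / n_per_group) ** 0.5
    return 1 - NormalDist().cdf(z_crit - true_effect / standard_error)


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        p = replication_power(0.3, n)
        print(f"n = {n:>3} per group: detects the effect {p:.0%} of the time, misses it {1 - p:.0%}")
```

With 20 participants per group, this hypothetical real effect is missed most of the time; even with 200 per group it is missed roughly 15% of the time. A single failed replication is therefore weak evidence, on its own, that an effect is absent.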