In a paper in Science, researchers attempted to replicate 100 recently published psychology studies. They report that they were able to repeat the original experimental procedures in most cases, but reproduced the original results in fewer than half.
Prof. Dorothy Bishop, Professor of Developmental Neuropsychology, University of Oxford, said:
“So what do we conclude from this study? There are three practices that are known to distort the scientific literature: publication bias (a tendency to only publish positive results), selection bias (also known as p-hacking – selectively reporting only that subset of data that is ‘significant’), and small studies (lack of statistical power, which leads to an increase in false negatives and false positives). Nosek et al. are careful to point out that there are other reasons too why a study may not replicate: there may be aspects of the sample, the experimental setting, or the quality of the experimentation that make a difference. For instance, if I do a study showing dyslexics have abnormal brainwaves, and someone else fails to replicate it, I may ask whether (a) their dyslexic sample, or control sample, differed from mine in some key respect (maybe they were less severe, or the controls also had reading problems), (b) the recording of brainwaves was done in the same way (were electrodes in the same positions; was the room electrically shielded; was there background noise); and (c) the replication experimenters were competent (they may have been inexperienced at fitting electrodes to heads; data analysis in this field is complex – they may have messed up the computations).
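As a concrete illustration of the p-hacking and low-power mechanisms described above, the following is a minimal simulation sketch; the numbers of studies, outcome measures and participants per group are arbitrary assumptions, not drawn from the Nosek et al. data. It shows that if a study with no true effect measures several outcomes and is reported as positive whenever any of them reaches p < 0.05, the false-positive rate rises well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_studies = 10_000   # simulated studies, all with NO true effect (assumed)
n_per_group = 20     # small groups, i.e. low statistical power (assumed)
n_outcomes = 5       # several outcome measures per study (assumed)

false_positive_studies = 0
for _ in range(n_studies):
    p_values = []
    for _ in range(n_outcomes):
        control = rng.normal(size=n_per_group)
        treatment = rng.normal(size=n_per_group)  # same distribution: no real effect
        p_values.append(stats.ttest_ind(treatment, control).pvalue)
    # Selective reporting: the study counts as "positive" if ANY outcome has p < .05
    if min(p_values) < 0.05:
        false_positive_studies += 1

print("Nominal false-positive rate: 5.0%")
print(f"With selective reporting:    {false_positive_studies / n_studies:.1%}")
# Comes out near 1 - 0.95**5, i.e. roughly 23% of null studies reported as positive.
```

Combined with publication bias, such inflated positives are exactly the results most likely to reach the literature and then fail to replicate.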
“My own take on this is that points (a) and (b) are not all that plausible here, given that the study was set up to minimise such differences, with replicators liaising with original authors and being totally transparent about methods. It’s possible that there are other, unreported, variations that affect results, but one does then wonder how important a result is if it can come and go depending on some minor changes in methods. In the case of (c), it might be possible to detect oddities in the data that could suggest a study was not done properly – e.g. if the results of the replicators deviated substantially from those of the original study in terms of the distribution of scores: this is the kind of argument advanced by Simone Schnall, an author of an original study who had a very public dispute about the failure to replicate results from one of her studies.
“Overall, though, I see this study as illustrating that we have a problem, one that could be tackled by taking steps to minimise the impact of the three systematic factors known to create bias in the field. Here are three ways we could improve matters substantially:
(a) Requiring public preregistration of research protocols. This has become common practice in the field of clinical trials, where it became clear that selective reporting of positive studies, and of positive results within studies, was distorting the literature. Simply put, if you are required to specify in advance what your hypothesis is and how you plan to test it, there is no wiggle room for cherry-picking the most eye-catching results after you have done the study.
(b) Requiring studies to have adequate sample sizes. In many fields it is difficult and expensive to collect large samples: this is true whether one is testing humans or other animals. The solution in fields such as genetics, where it became clear that many early results were not reproducible, was to get researchers to join forces to do multi-site studies. Other research fields need to adopt this approach [see the illustrative power simulation after point (c)].
(c) Publishing studies that have null results. Null results are only interesting if they come from well-designed studies with adequate statistical power, but in that case they are informative and important. So this goes hand in hand with (a) and (b).
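The sample-size point in (b) can be illustrated with a short power simulation; the true effect size (Cohen's d = 0.4) and the per-group sample sizes are assumptions chosen purely for illustration. It shows why a small single-lab study is likely to miss a modest real effect, and why pooling participants across sites, as in multi-site genetics consortia, brings power up to conventional levels.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

effect_size = 0.4     # assumed true effect (Cohen's d), modest by psychology standards
n_simulations = 5_000

for n_per_group in (20, 50, 100, 200):   # e.g. single lab vs. pooled multi-site samples
    detections = 0
    for _ in range(n_simulations):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(effect_size, 1.0, n_per_group)
        if stats.ttest_ind(treatment, control).pvalue < 0.05:
            detections += 1
    print(f"n = {n_per_group:>3} per group: simulated power = {detections / n_simulations:.0%}")
# Under these assumptions, 20 per group detects the effect only about a quarter of the time;
# roughly 100 per group is needed to reach the conventional 80% power.
```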
The additional issues around the impact of setting and experimental competence could also be addressed by taking steps such as these:
(d) Ensuring researchers document their methods in sufficient detail that others can replicate the study (possibly by posting a video of the experiment on the web).
(e) Having open data, materials and analysis scripts available alongside the published paper.
“Finally, as Nosek et al. noted, this study focuses on psychology, but this does not mean that lack of replicability is specific to, or particularly bad in, psychology. In any case, adoption of practices such as (a) to (e) would be beneficial across all fields of science. They are simply ways of ensuring that we are doing science as well as we can. Nosek et al. are to be commended for doing a study that demonstrates the need for action; we now need to ensure that funders, journals and the scientists themselves come on board to take action.”
‘Estimating the reproducibility of psychological science’ by Nosek et al., published in Science on Thursday 27th August.
Declared interests
Prof. Dorothy Bishop: No conflicts of interest