The brain has hit the big time. Barack Obama has just announced $100 million of funding for the BRAIN Initiative, an ambitious attempt, apparently, to map the activity of every neuron in the brain. On the other side of the Atlantic, the Human Brain Project will try to simulate those neurons with a billion euros of funding from the European Commission. And news about neuroscience, from dream-decoding to mind-melding to memory-building, regularly dominates the headlines.
But while the field’s star seems to be rising, a new study casts a disquieting shadow upon the reliability of its results. A team of scientists led by Marcus Munafo from the University of Bristol analysed a broad range of neuroscience studies and found them plagued by low statistical power.
Statistical power is the probability that a study will find an effect (say, whether antipsychotic drugs affect schizophrenia symptoms, or whether impulsivity is linked to addiction), assuming that effect exists. Most scientists regard a power of 80 percent as adequate: that gives you a 4 in 5 chance of finding an effect if there’s one to be found. But the studies that Munafo’s team examined tended to be so small that they had an average (median) power of just 21 percent. At that level, if you ran the same experiment five times, you’d expect to find the effect only about once. The other four tries would be wasted.
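To make the arithmetic concrete, here is a minimal sketch in Python. It assumes a simple comparison of two equal-sized groups, analysed with a two-sided z-test at a 5 percent significance threshold. The function names, the normal approximation and the example numbers are illustrative choices, not anything taken from the team's paper.

```python
from statistics import NormalDist


def power_two_group(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test.

    effect_size is the true difference between the group means, measured
    in standard deviations (an assumed, illustrative parameterisation).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the true effect shifts the test statistic away from zero.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands beyond either critical value.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


def expected_hits(power, attempts):
    """Expected number of attempts that detect a real effect."""
    return power * attempts


def chance_all_miss(power, attempts):
    """Probability that every independent attempt misses a real effect."""
    return (1 - power) ** attempts


if __name__ == "__main__":
    print(f"Power, 64 per group, effect of 0.5 SD: {power_two_group(0.5, 64):.2f}")
    print(f"Expected hits in 5 runs at 21% power: {expected_hits(0.21, 5):.2f}")
    print(f"Chance all 5 runs miss at 21% power: {chance_all_miss(0.21, 5):.2f}")
```

Under these assumptions, 64 people per group gives roughly the conventional 80 percent power for an effect of half a standard deviation. At 21 percent power, five independent attempts yield about one detection on average, and there is roughly a 31 percent chance that all five come up empty.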
But if studies are generally underpowered, there are more worrying implications beyond missed opportunities. It means that when scientists do claim to have found effects (that is, when experiments seem to “work”), the results are less likely to be real. And it means that even when the effects are real, the reported sizes are probably bigger than the true ones. As the team writes, this so-called “winner’s curse” means that “a ‘lucky’ scientist who makes the discovery in a small study is cursed by finding an inflated effect.”
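Both consequences can be sketched in a few lines of Python. The first function uses a standard formula for the positive predictive value: the share of “significant” findings that are real, given a study's power, its significance threshold and the prior odds that a tested effect exists. The second simulates many small studies of one true effect and averages the estimates from those that reach significance. The prior odds of 1 to 4, the true effect of 0.3 standard deviations and the 20 participants per group are assumptions chosen for illustration, not figures from the team's analysis.

```python
import random
from statistics import NormalDist, mean


def positive_predictive_value(power, prior_odds, alpha=0.05):
    """Share of significant results that reflect a real effect.

    prior_odds is the assumed ratio of real effects to null effects
    among the hypotheses being tested (hypothetical input).
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


def mean_significant_estimate(true_effect, n_per_group,
                              n_studies=20000, alpha=0.05, seed=1):
    """Average effect estimate among simulated studies that reach significance.

    Simplification: each study's estimate is drawn directly from its
    sampling distribution, assuming a known standard deviation of 1.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means (SD = 1).
    std_error = (2 / n_per_group) ** 0.5
    significant = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) / std_error > z_crit:
            significant.append(estimate)
    return mean(significant)


if __name__ == "__main__":
    for power in (0.80, 0.21):
        ppv = positive_predictive_value(power, prior_odds=0.25)
        print(f"Power {power:.2f}: PPV = {ppv:.2f}")
    inflated = mean_significant_estimate(true_effect=0.3, n_per_group=20)
    print(f"True effect 0.30, mean significant estimate = {inflated:.2f}")
```

With these illustrative inputs, dropping power from 80 to 21 percent cuts the share of real findings among significant results from 80 percent to about 51 percent. In the simulation, the studies that happen to cross the significance threshold report an effect more than double the true one, because only unusually large estimates can reach significance in so small a sample.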
So, a field that is rife with low power is one that is also rife with wasted effort, false alarms and exaggerated results.
Across the sciences
These problems are far from unique to neuroscience. They exist in medicine, where corporate teams have only managed to reproduce a minority of basic studies in cancer, heart disease and other conditions. They exist in psychology—a field that I have written about extensively, and that is now taking a lead in wrestling with issues of replicability. They exist in genetics, which used to be flooded with tiny studies purporting to show links between some genetic variant and a trait or disease—links that were later disproved in larger studies. Now, geneticists are increasingly working with larger samples by pooling their recruits in big collaborations, and verifying their results in an independent group of people before publishing. “Different fields have learned from similar lessons in the recent past,” says Munafo.
Munafo himself, who studies addictive behaviour, works in the intersection between genetics and brain sciences. Over the past decade, he has published several meta-analyses—overviews of existing studies—looking at links between genetic variants and mental health, attention and drug cravings, brain activity and depression, and more. And he kept on seeing the same thing. “These studies were all coming up with the same average power of around 20%,” he says. “The convergence was really striking given the diversity of fields that we studied.”
He decided to take a more thorough look at neuroscience’s power, and enlisted a team of scientists from a wide range of fields. They included psychologist Kate Button, a postdoc in Munafo’s team, and Claire Mokrysz, a former Masters student now studying mental health at UCL. John Ioannidis also signed up; his now-classic paper “Why Most Published Research Findings Are False” has made him a figurehead among scientists looking at the reliability of their discoveries. Another partner, psychologist Brian Nosek, is at the forefront of efforts to make science more open and reliable and leads the newly opened Center for Open Science. “We were trying to present a range of perspectives rather than come across as one particular interest group trying to criticise one another,” says Munafo. “We want to be constructive rather than critical.”
Together, the team looked at every neuroscience meta-analysis published in 2011: 49 in total, covering over 730 individual studies between them. The median power of those studies was just 21 percent. The team’s analysis is published in Nature Reviews Neuroscience.
“I think this is a really important paper for the field,” says Jon Simons, a neuroscientist from the University of Cambridge. “Much of neuroscience is still relatively young, so the best and most robust methods are still being established. I think it’s a sign of a healthy, thriving scientific discipline that these developments are being published in such a prominent flagship journal.”
Ioannidis agrees. “As the neuroscience community is expanding its reach towards more ambitious projects, I think it will be essential to ensure that not only more sophisticated technologies are used, but also larger sample sizes are involved in these studies,” he says.
If there’s a problem with the team’s approach, it’s that most of the meta-analyses the team considered looked at genetic associations with mental traits, or the effect of drugs and treatments on mental health. One could argue that these only reflect a small proportion of neuroscience studies, and are already “covered” by fields like genetics and medicine where issues of replicability have been discussed.
“This is the main limitation and a fair criticism,” says Munafo. To address it, his team looked at two other types of experiment. In an earlier study, Ioannidis showed that brain-scanning studies, which looked at brain volume in people with mental health conditions, reported a surfeit of positive results—a sign that negative studies were not being published. By analysing these studies again—all 461 of them—he showed that they have a median statistical power of just 8 percent.
The team also looked at 40 studies where rats were put in mazes to test their learning and memory. Again, these were typically so small that they only had a median power of 18 to 31 percent.
“Neuroscience is so broad that it’s hard to generalise,” says Munafo, “but across a diverse range of research questions and methods—genetics, imaging, animal studies, human studies—a consistent picture emerges that the studies are endemically underpowered.”
Fixing the problem
Unfortunately, raising power is easier said than done. It costs time and money, and there are many reasons why studies are currently underpowered. Partly, it’s just the nature of science. Power depends not just on sample size but on the strength of the effect you’re looking at—subtler effects demand larger samples to get the same power. But when you’re the first to study a phenomenon, you don’t know the size of the effect you’re looking for. You’re working off educated guesses or, perhaps more likely, what you have the time and money to do. “We’ve all done this where we’ve got a little bit of resource available to test a novel question and we might find some results,” says Munafo. Without such forays, science would grind to a halt.
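The trade-off between effect size and sample size can be sketched with a standard normal-approximation formula for comparing two equal-sized groups. The function name and the effect sizes below (0.8, 0.5 and 0.2 standard deviations, often labelled large, medium and small) are illustrative assumptions, not values from the team's paper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate participants needed per group for a two-sided test.

    effect_size is the assumed true difference between group means,
    measured in standard deviations.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    # Required n grows with the inverse square of the effect size.
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for label, d in (("large", 0.8), ("medium", 0.5), ("small", 0.2)):
        print(f"{label} effect ({d} SD): {n_per_group(d)} per group for 80% power")
```

Because the required sample grows with the inverse square of the effect size, halving the effect roughly quadruples the number of participants needed. That is why a guess about effect size that turns out to be too optimistic can leave a study badly underpowered.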
The problem isn’t the existence of such exploratory studies, but how poorly they are described in papers. “We need to be clear that when we’re having a punt and running a study that’s only as big as we can afford, it’ll probably be underpowered,” says Munafo.
Transparency is vital. In their paper, the team outlines several ways of addressing the problems of poorly powered studies, which all revolve around this theme. They include: pre-registering experimental plans before the results are in to reduce the odds of tweaking or selectively reporting data; making methods and raw data openly available so they can be easily checked by other scientists and pooled together in large samples; and working together to boost sample sizes.
But ultimately, the problem of underpowered studies ties into a recurring lament—that scientists face incentives that aren’t geared towards producing reliable results. Small, underpowered studies are great at producing what individuals and journals need—lots of new, interesting, significant and publishable results—but poor at producing what science as a whole needs—lots of true results. As long as these incentives continue to be poorly aligned, underpowered studies will remain a regular presence. “It would take a brave soul to do a tenth of the studies they were planning to do and just do a really big adequately powered one unless they’re secure enough in their career,” says Munafo.
This is why the team is especially keen that people who make decisions about funding in science pay attention to its analysis. “If you have lots of people running studies that are too small to get a clear answer, that’s more wasteful in the long-term,” Munafo says. And if those studies involve animals, there is a clear ethical problem. “You end up sacrificing more animals than if you’d just run a single, large authoritative study in the first place. Paradoxically, I know people who’ve submitted animal grants that are powered to 95 percent but been told: ‘This is too much. You’re using too many animals.’”
“I’m thrilled they’ve written this review,” says David Eagleman, a neuroscientist at Baylor College of Medicine. “Hopefully, this sort of exposure can build towards a reduction of wastefulness in research, not only in terms of taxpayer dollars but in terms of scientific man-hours.”
Reference: Button, Ioannidis, Mokrysz, Nosek, Flint, Robinson & Munafo. 2013. Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience http://dx.doi.org/10.1038/nrn3475