Why unreliable tests are flooding the coronavirus conversation

Flawed methods. Faulty materials. Here's how to make sense of the wildly different results from current antibody testing.

As disagreements pile up over easing COVID-19 restrictions and reopening businesses, scientists are racing to gauge how many people have truly been infected. Among other reasons, measuring the pandemic’s spread can clarify how deadly the disease is and how many people are presumably immune to it, which is crucial for anticipating future waves of the outbreak.

The cornerstone of this process is antibody testing, which looks for proteins made by the body’s immune system to combat invading pathogens. Antibodies can identify cases where people have caught COVID-19 even if they didn’t show any symptoms. Early antibody results suggest a substantial portion of cases have been missed by genetic tests for COVID-19, the first line of diagnostics for the disease.

The reported numbers vary widely—21 percent in hard-hit New York City versus roughly 6 percent in mostly spared Geneva, Switzerland—which in part is expected. But some of the differences are a consequence of rushing to publish results, and in some cases, troubles with the tests themselves.

Experts say that COVID-19 testing has now entered a “Wild West” phase. Inaccurate, unauthorized tests for the coronavirus are flooding the market due to emergency deregulation of oversight. Compounding that, flaws in collecting data and analyzing results can produce misleading estimates. Many of these flaws would normally be caught by the peer-review process, a safeguard built into academic publishing. But in the mad dash to make sense of this pandemic, many research teams are instead leaning on press releases or unreviewed articles called preprints to spread the word.

“It’s very difficult to really figure out what’s going on,” says Samuel Scarpino, who heads Northeastern University’s Emerging Epidemics Lab. “What we’re facing here are some numbers that are being reported without the context necessary to interpret them.”

One notable example is a recent survey for COVID-19 antibodies among residents of Santa Clara County, California. On April 17, Stanford University researchers reported in a preprint that, of the 3,300 people they’d screened, 1.5 percent were positive. Ultimately, the team claimed that the county actually had 48,000 to 81,000 cases of COVID-19, which seemed implausibly high given that only 50 deaths and about 1,500 cases had been recorded at the time.

This funky math has been widely criticized, and because the study entered the public sphere without peer review, it launched a heated debate over whether COVID-19 is less deadly than epidemiologists have said. Since then, the Stanford team has updated its report, dropping its estimate to a range seen by other studies.

In general, this first wave of antibody testing “ends up being a balance between a quick and dirty approach, and something that’s more rigorous and representative, but will take longer,” says Natalie Dean, a biostatistician at the University of Florida who has spotted problems among other antibody surveys. “So we’re seeing the fast ones coming out the earliest.”

False negative, false positive

A survey’s results are fundamentally limited by the quality of the test used to detect COVID-19.

The problem is, many tests on the market are demonstrably not up to snuff. In the U.K., officials paid $20 million for antibody testing kits from China that didn’t work. The Spanish government had to trash 640,000 antibody testing kits that ended up having an accurate detection rate of just 30 percent. And a recent evaluation of 12 widely used antibody test kits, which declared upfront its funding by Anthem Blue Cross Blue Shield, the Chan Zuckerberg Biohub, and others, concluded that only two had reported accurate results 99 percent of the time—the benchmark for reliability set by the U.S. Food and Drug Administration on May 4.

Prior to this date, the U.S. was flooded with more than 150 unvalidated tests because the FDA had allowed companies to sell these products without seeking Emergency Use Authorization. The FDA shifted its policy after Congress opened an investigation into this runaway testing market.

“Testing is important, but how it’s done is really concerning to me,” says Michael Osterholm, an epidemiologist and director of the University of Minnesota’s Center for Infectious Disease Research and Policy. “People who are saying they’re going to use antibody testing as a way to reopen—that’s kind of like the person that looks down the barrel of a gun and pulls the trigger to see if it works.”

Osterholm worries that mediocre diagnostic tests will miss true COVID-19 infections, delivering what’s called a “false negative” and potentially prompting an infectious person to unknowingly spread the disease. Faulty antibody tests, he says, might also deliver a “false positive” result, in which unrelated antibodies are flagged in a person who hasn’t yet been infected. COVID-19 antibodies should be present only if a person has been exposed to the virus. If cities or states fall for shoddy tests with unreliable, misleading results, Osterholm says, it could endanger good decision-making on both personal and governmental levels.
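To see why false positives matter so much when few people have been infected, consider a rough sketch of a standard adjustment, often called the Rogan-Gladen estimator. It is not the method any team in this story necessarily used. The 1.5 percent raw positive rate comes from the Santa Clara preprint; the sensitivity (the share of true infections a test catches) and the specificity values (the share of uninfected people it correctly clears) are hypothetical numbers chosen for illustration, as is the function name.

```python
def corrected_prevalence(raw_positive_rate, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from a raw positive rate.

    sensitivity: share of truly infected people the test flags as positive.
    specificity: share of never-infected people the test flags as negative.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("test must perform better than a coin flip")
    estimate = (raw_positive_rate + specificity - 1.0) / denominator
    # A prevalence cannot fall outside 0..1, so clamp the estimate.
    return min(1.0, max(0.0, estimate))


raw_rate = 0.015  # 1.5 percent raw positives, as reported in the preprint

# Hypothetical test characteristics, for illustration only.
for specificity in (0.995, 0.990, 0.985):
    estimate = corrected_prevalence(raw_rate, sensitivity=0.80, specificity=specificity)
    print(f"specificity {specificity:.3f}: estimated prevalence {estimate:.4f}")
```

Under these assumed numbers, shaving a single percentage point off the specificity is enough to push the estimate from above 1 percent to roughly zero, because the false positives alone could account for every positive result.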

“It’s the Wild West out there,” he says. “The FDA could do their job, and they should do their job. It’s that simple: Do your damn job.”

When bias creeps in

Even when the tests are reliable and accurate, it’s crucial to gather representative participants. In theory, this is easy—but in practice, it’s almost impossible to eliminate bias without testing 100 percent of a population.

For example, screening folks at a drive-through testing center preferentially captures data from the portion of a population with cars. The same goes for randomly dialing phone numbers and asking people to show up for testing at a local hospital—an approach that relies on transportation, or the ability to leave work. And in New York City, where scientists are randomly selecting people for testing in grocery stores, they are surveying a percentage of the population that’s at higher risk of infection simply by being out and about.

Although some experts questioned the tests used by the Santa Clara study, the team drew stronger criticism because of a sample population that skewed white, wealthy, and female. One argument suggests that this demographic might have a lower incidence of COVID-19, given how the disease hits harder among impoverished communities.

Highlighting the study’s confusing design, an opposing argument notes that participants volunteered for the study in response to a Facebook ad. Critics argue that such recruitment is more likely to attract folks who already thought they’d been sick, potentially inflating the percentage of people with COVID-19 antibodies.

What’s more, the team did not disclose all their funders in the preprint version of the study, which would have normally been required under peer review. One of the funders is the founder of JetBlue Airways, who has voiced a strong desire to reopen the economy.

The team tried to correct for their biased sample, but their initial statistical analyses were flawed, says Andrew Gelman, a statistician at Columbia University who wasn’t involved with the study.

“They made some mistakes, and it was annoying to me, and I think it annoyed other people too,” Gelman says, noting that he probably went overboard in his public criticism. “I don’t think there are bad guys in this system, it’s just difficult.”

Eran Bendavid, who led the Santa Clara County investigation, says he would make different choices if he could do the study again, and he has corrected some of the noted statistical flaws. On April 30, the team released a revised version of the preprint that reduced the estimate of true COVID-19 infections, suggesting it is closer to 20 or 50 times what’s measured by genetic tests, depending on whether the numbers are scaled to match the county’s demographics. Bendavid adds that the criticisms about a biased sample population are legitimate—but that’s not something that can be changed at this point.

“I would do a much more careful job selecting a sample … and get something that’s as close to representative of our county as I possibly could,” he says. “We’ve done a host of exercises to try to understand the direction of the bias.”
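Scaling survey numbers to match a population's demographics is commonly done with a technique called post-stratification, sketched below. This is a minimal illustration rather than the Stanford team's actual analysis; the group labels, counts, and population shares are all invented, and the function name is hypothetical.

```python
def poststratified_rate(sample, population_shares):
    """Weight each group's positive rate by its share of the real population.

    sample: {group: (number_tested, number_positive)}
    population_shares: {group: share_of_population}, summing to 1.
    """
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    rate = 0.0
    for group, share in population_shares.items():
        tested, positive = sample[group]
        if tested == 0:
            raise ValueError(f"no one tested in group {group!r}")
        rate += share * positive / tested
    return rate


# Invented data: group_a is over-represented among the volunteers.
sample = {"group_a": (800, 8), "group_b": (200, 6)}
population_shares = {"group_a": 0.5, "group_b": 0.5}

total_tested = sum(tested for tested, _ in sample.values())
total_positive = sum(positive for _, positive in sample.values())
raw_rate = total_positive / total_tested
weighted_rate = poststratified_rate(sample, population_shares)

print(f"raw rate {raw_rate:.3f}, weighted rate {weighted_rate:.3f}")
```

In this made-up case the weighted rate is higher than the raw one because the under-sampled group has more positives. Reweighting can only fix imbalances along the variables that were measured; it cannot undo a bias such as people volunteering because they suspect they were sick.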

Still, the Stanford group’s takeaway—that 1.5 to 2.8 percent of Santa Clara’s population had COVID-19 antibodies—is pretty much in line with surveys done where infections aren’t ballooning.

“It all seems to be giving a consistent message, which is that unless the area has been pretty hard hit, we’re seeing numbers in the single digits,” Dean says.

Recently, she notes, Swedish scientists retracted a study claiming that 11 percent of the Stockholm population had been exposed to COVID-19. That team tested samples at a blood bank for antibodies—and then discovered that their sample contained blood from donors who knew they’d been infected and were hoping their antibodies could be therapeutic.

“That can quickly cause a big bias—you don’t even need that many people for it to throw off the result,” Dean says. “There’s all these hidden layers that can change how representative the sample is.”

How to spot a solid study

Truly nailing down the scope of COVID-19 prevalence is possible, although experts have various opinions about how to proceed.

Osterholm suggests really focusing on tests that detect the virus’s genetic material—because they tell you where the germ is now, which is crucial for contact tracing and isolation. “Every person who has any symptoms suggestive of COVID-19 should be able to get a test today,” he says.

Another tactic relies on accurately sampling a target population. The National Institute of Allergy and Infectious Diseases kicked off a new antibody survey in April, and gathering a representative sample of the population is at the forefront of its effort. By June, the team hopes to have tested 10,000 people across the United States and captured a snapshot of COVID-19 prevalence from shore to shore. To do that, the team is screening volunteers for various demographic variables.

“We have a computer program by which, on a daily basis, we’re assessing the [demographic] targets we’ve set based on U.S. census data,” says study principal investigator Matthew Memoli. “This allows us, in real time, to see how close we’re getting [to being representative] and make adjustments on a daily basis in recruitment.”
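The kind of daily check Memoli describes might look something like the sketch below, which compares the makeup of enrolled volunteers against target shares and flags the group recruitment should prioritize. It is an assumption-laden illustration, not the NIAID program: the age brackets, target shares, enrollment counts, and function name are all hypothetical.

```python
from collections import Counter


def recruitment_gaps(enrolled_groups, target_shares):
    """Return target share minus enrolled share for each group.

    A positive gap means the group is under-recruited so far.
    """
    counts = Counter(enrolled_groups)
    total = sum(counts[group] for group in target_shares)
    if total == 0:
        # Nobody enrolled yet, so every group is short by its full target.
        return dict(target_shares)
    return {
        group: target - counts[group] / total
        for group, target in target_shares.items()
    }


# Hypothetical targets and enrollment; real targets would come from census data.
target_shares = {"18-39": 0.40, "40-64": 0.40, "65+": 0.20}
enrolled = ["18-39"] * 50 + ["40-64"] * 40 + ["65+"] * 10

gaps = recruitment_gaps(enrolled, target_shares)
priority = max(gaps, key=gaps.get)

for group, gap in gaps.items():
    print(f"{group}: gap {gap:+.2f}")
print(f"recruit next from: {priority}")
```

Rerunning a check like this as volunteers enroll lets recruiters steer toward under-represented groups before the sample is locked in, rather than relying only on statistical corrections afterward.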

From there, the NIAID team is mailing home blood collection kits to survey participants, who then return their samples for laboratory testing. Rather than using a single test to identify COVID-19 antibodies, researchers are verifying results using multiple assays.

“There’s never going to be a perfect sample, and the analysis is never going to be perfect,” Memoli says. “But we’d like to be as accurate as possible, and have as good a representation of the country as we can.”
