How virus variants get their confusing names—and why that’s changing
Right now we're stuck with jumbles of letters and numbers, or country names that stigmatize people from that region. Experts announced a new plan to fix that.
Coronavirus variant names are strange and complicated. Sure, B.1.1.7 or P.1 might be perfectly fine names when virologists and microbiologists need to keep track of them—but they’re not so useful for the public trying to make sense of the variants driving new COVID-19 surges.
Take it from Salim Abdool Karim, an epidemiologist and former chair of South Africa’s COVID-19 advisory committee. He helped name the variant that was first discovered in the country: 501Y.V2, which, confusingly, is also known as B.1.351 and 20H/501Y.V2.
“Who wants to keep saying 501Y.V2?” Abdool Karim says. “501Y.V2 is such a mouthful to say. It’s a terrible name. You wouldn’t want to call your child 501Y.V2.”
Abdool Karim says it’s understandable that so many people have instead begun referring to the variant as “the South African variant.” But he is also one of many scientists who have criticized this practice, arguing that it is both stigmatizing and just plain inaccurate.
That’s why the World Health Organization has announced a new naming system for coronavirus variants that it hopes will make it easier for non-scientists to keep track of them. The international body has assigned letters of the Greek alphabet to each of the major variants that are driving surges around the world. The variant first documented in the United Kingdom, B.1.1.7, will now also be known simply as Alpha, while the variant that Abdool Karim named in South Africa will be called Beta. And so on.
But why was it necessary to rename these variants? Here’s a look at how viruses and their variants typically get their names, the chaotic ad hoc naming system that emerged during the pandemic, and the historic pitfalls of naming viruses after the place where they were identified.
Why names matter
Many viruses have been named for the geographic regions where they were first identified, such as the Zika Forest in Uganda or the Ebola River in the Democratic Republic of the Congo. But this has historically also been stigmatizing to the communities from which the viruses derive their names.
“We know from past outbreaks, epidemics, and naming scandals that these things can have a real impact because that might be the only thing someone knows about that country, that this bad thing is coming from there,” says Emma Hodcroft, a molecular epidemiologist at the University of Bern in Switzerland. “So there’s a real effort in the scientific community to try to avoid using geographical names.”
In 2015, the WHO even issued guidance for naming infectious diseases that discouraged using geographic locations, human names, or animal species. Last year, the body also deliberately avoided any reference to China or Wuhan when it named COVID-19, which stands for coronavirus disease 2019. (More on how SARS-CoV-2, the virus that causes COVID-19, got its name in a bit.)
But Alexandre “Sasha” White, assistant professor of the history of medicine and sociology at Johns Hopkins University, points out that this hasn’t stopped anti-Asian sentiment from rising in the last year—with some help from prominent figures like former United States President Donald Trump, who insisted on referring to SARS-CoV-2 as the “China virus” or “Wuhan virus.”
“I have no doubt that the associations between COVID-19 and China and the stigma around that has been unfortunately critical to the rise in anti-Asian hate crime around the world,” he says. This is not exactly a new phenomenon. The spread of infectious disease has been a powerful force for justifying racism and xenophobia for centuries.
But there’s also a scientific argument for staying away from geographical names: Scientists point out that the names are misleading at best and totally inaccurate at worst.
The truth is that scientists don’t know where the so-called South African variant actually originated. Sure, the variant was first identified in South Africa, but researchers haven’t yet found patient zero. It’s possible that South Africa was just the first country to find the variant because it was doing more genetic sequencing than other countries.
Abdool Karim also says the label is misleading because the variant has spread throughout the world and is now more prevalent in places like the United States than it is in South Africa. “So you can see how crazy it is to call it the South African variant,” he says.
There are real consequences of using an inaccurate name, such as the U.S. ban on travel from South Africa, Brazil, and the United Kingdom earlier this year. The effects can also be long-lasting. It’s been more than a century since the 1918 influenza pandemic devastated the globe and, even though the first cases were recorded in the U.S., Hodcroft points out that many people still believe it originated in Spain because it became widely known as the Spanish flu.
How a virus gets its name
Although the WHO is responsible for naming diseases, viruses are named by a group of virologists and phylogeneticists who serve on the International Committee on Taxonomy of Viruses (ICTV).
In February 2020, the ICTV re-christened what was then called the 2019 novel coronavirus as SARS-CoV-2, which stands for severe acute respiratory syndrome coronavirus 2. Stanley Perlman, a microbiologist at the University of Iowa and a member of the ICTV study group for coronaviruses, says the group chose the new name because the virus’s genetic make-up was “clearly close” to the one that caused the SARS outbreak in 2003, which is called SARS-CoV.
But given all the pathogens in the world, the ICTV only names viruses at the species level and higher. So the process for naming variants begins much more informally among scientists—and will vary from pathogen to pathogen, says Hodcroft.
“There’s no rulebook for how you name your pathogen,” she says. Scientists essentially come up with a name and see if it gets adopted by the scientific community or if another name takes root instead.
One typical way to classify a virus is by its antigens, the parts of the virus that provoke an immune response and whose mutations are particularly important.
Influenza A, for example, has two prominent antigens, known as H (which stands for hemagglutinin) and N (which stands for neuraminidase). Each distinct subtype of those antigens is assigned a number, hence the name H1N1 for the most infamous pandemic influenza subtype. The virus has 18 different H subtypes and 11 different N subtypes that can be mixed and matched to form 198 potential combinations, although only 131 subtypes have been identified in nature.
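The 198 figure is simple multiplication: 18 H subtypes times 11 N subtypes. Here is a minimal Python sketch that generates every possible label. The `all_subtype_names` helper is hypothetical, and it says nothing about which of those combinations actually turn up in nature.

```python
from itertools import product

# Counts taken from the text: 18 hemagglutinin (H) and 11 neuraminidase (N) subtypes.
H_SUBTYPES = range(1, 19)  # H1 .. H18
N_SUBTYPES = range(1, 12)  # N1 .. N11

def all_subtype_names():
    """Return every possible HxNy label, pairing each H subtype with each N subtype."""
    return [f"H{h}N{n}" for h, n in product(H_SUBTYPES, N_SUBTYPES)]

names = all_subtype_names()
print(len(names))  # 198
print(names[0])    # H1N1
```

Only about two-thirds of these theoretical labels correspond to subtypes that have actually been found.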
“All these viruses mutate all the time so we can’t be calling everything new names,” Abdool Karim says. “It’s only when they change an antigen that’s meaningful that we give it a new name.”
SARS-CoV-2, the virus that causes COVID-19, is mutating particularly rapidly and in so many ways both benign and dangerous—which Perlman says requires “a really intricate system of naming.” The trouble is that scientists have essentially had to do that on the fly—and have come up with several different systems, each with a different use.
The chaos of SARS-CoV-2 variants
In November 2020, researchers in South Africa sequenced a new and more transmissible SARS-CoV-2 variant, which included an N501Y mutation that allowed the spike protein to bind more tightly to human cells. This mutation replaces the asparagine (N) amino acid, typically found at position 501 of the spike protein, with tyrosine (Y). But before they could announce it to the public, the researchers first needed to figure out a name.
“We just sat down over a cup of tea and called it 501Y.V2,” Abdool Karim says. The first part of the name represents the most meaningful mutation of the virus, while V2 simply signifies that it is the second variant identified with that particular mutation. (The variant that was discovered in the U.K. is 501Y.V1 and the variant discovered in Brazil is 501Y.V3.)
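The notation in both names can be unpacked mechanically. The sketch below is an illustration only: `parse_mutation` and `variant_label` are hypothetical helpers, and the amino acid table covers just the two residues mentioned above. It splits a substitution like N501Y into the original residue, its position, and the replacement residue, then rebuilds a 501Y.Vn-style label from the defining mutation and the order in which variants carrying it were identified.

```python
import re

# Partial single-letter amino acid table: only the two residues named in the text.
AMINO_ACIDS = {"N": "asparagine", "Y": "tyrosine"}

def parse_mutation(label):
    """Split a substitution such as 'N501Y' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single amino acid substitution: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement

def variant_label(mutation, variant_number):
    """Build a label like '501Y.V2': the defining mutation's position and new residue,
    then the order in which variants carrying that mutation were identified."""
    _, position, replacement = parse_mutation(mutation)
    return f"{position}{replacement}.V{variant_number}"

original, position, replacement = parse_mutation("N501Y")
print(AMINO_ACIDS[original], position, AMINO_ACIDS[replacement])  # asparagine 501 tyrosine
for number, place in [(1, "U.K."), (2, "South Africa"), (3, "Brazil")]:
    print(place, variant_label("N501Y", number))
```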
But that’s not the variant’s only name. Several naming systems have arisen since the beginning of the pandemic—the two most prominent being Nextstrain and Pango. Although having more than one variant classification system might seem like overkill, these offer scientists different ways of analyzing the SARS-CoV-2 family tree.
Hodcroft says that the Nextstrain system, which she helped develop, is intended for scientists who want to look at the broader patterns on the virus’s family tree by assigning names to major genetic groupings, or clades, of the virus. It uses simple names that are based on the year the clade was identified, followed by a letter that’s assigned in alphabetical order. The root clade in the system is 19A, representing the viruses that were prevalent in China at the beginning of the outbreak.
However, Hodcroft says the limitations of the Nextstrain naming system became apparent when variants like 501Y.V2 began to drive regional outbreaks. Although they were technically not yet widespread enough to merit their own clade, she says these variants clearly needed to be identifiable. As a result, in this system, the variant of concern identified in South Africa is now named 20H/501Y.V2.
“It’s just because there’s no system for this,” Abdool Karim says. “It’s made as we go along. As we learn more, we change it."
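The year-plus-letter convention behind Nextstrain clade names can be sketched in a few lines of Python. This is a hypothetical helper, not Nextstrain’s own code. It assumes letters restart at A each year and are handed out in alphabetical order, and it shows how an informal variant label can be attached to a clade name, as in 20H/501Y.V2.

```python
from string import ascii_uppercase

def next_clade_name(existing, year):
    """Return the next clade name for a two-digit year, e.g. '20A' after '19B'.

    Letters restart at A for each year and are assigned alphabetically,
    skipping any letter already used for that year.
    """
    prefix = f"{year:02d}"
    # Names may carry a suffix such as '/501Y.V2'; only year + letter matter here.
    used = {name[2] for name in existing if name.startswith(prefix)}
    for letter in ascii_uppercase:
        if letter not in used:
            return prefix + letter
    raise ValueError(f"no letters left for year {prefix}")

def with_variant_label(clade, label):
    """Attach an informal variant label to a clade name."""
    return f"{clade}/{label}"

clades = ["19A", "19B"]  # 19A is the root clade
for _ in range(8):       # name eight 2020 clades: 20A through 20H
    clades.append(next_clade_name(clades, 20))
print(clades[-1])                                 # 20H
print(with_variant_label(clades[-1], "501Y.V2"))  # 20H/501Y.V2
```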
Pango, meanwhile, takes a fine-grained approach to the SARS-CoV-2 family tree and has become the most commonly used system because it’s useful for tracking local outbreaks. There are hundreds of lineages in this system, which is designed to reflect how the virus has evolved amid each new outbreak. It assigns new lineages based not only on significant mutations but also on other epidemiological events, such as the virus jumping from one location to another.
“The fundamental principle is that the lineage names represent ancestry and descent,” says Oliver Pybus, an evolutionary biologist at the University of Oxford who helped design Pango.
Pybus says that every Pango lineage can be read essentially as a family tree. The earliest viruses that first circulated in China are denoted as lineages A or B. As they evolved and spread across the globe, their descendants are marked by a series of numbers. For example, B.1 includes the outbreak in northern Italy in early 2020 and is the first descendant of the B lineage to be named. Meanwhile, the variant of concern identified in South Africa, named B.1.351, is the 351st lineage to be named as a direct descendant of B.1, the lineage that includes that Italian outbreak.
To keep these names from becoming too unwieldy, each Pango lineage can only have up to three dots in it. When a descendant would need a fourth, it is given a new letter of the alphabet as shorthand instead. That’s why the variant that was first identified in Brazil is called P.1 even though it is a descendant of the B.1.1.28 lineage: P.1 is an alias for B.1.1.28.1.
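Because Pango names encode ancestry, tracing a lineage back to its root is mostly string manipulation. The sketch below assumes an alias table containing only the P → B.1.1.28 entry described above. The helpers `expand`, `parent`, `ancestors`, and `name_child` are hypothetical, and in practice new alias letters are chosen by the people who maintain Pango rather than passed in by hand.

```python
# Alias table: only the entry described in the text (P stands for B.1.1.28).
ALIASES = {"P": "B.1.1.28"}
MAX_DOTS = 3  # a Pango name may contain at most three dots

def expand(lineage):
    """Replace an alias prefix with its full ancestry, e.g. 'P.1' -> 'B.1.1.28.1'."""
    prefix, _, rest = lineage.partition(".")
    if prefix in ALIASES:
        full = ALIASES[prefix]
        return expand(f"{full}.{rest}" if rest else full)
    return lineage

def parent(lineage):
    """Return the immediate ancestor (full name), or None for a root lineage like 'B'."""
    full = expand(lineage)
    if "." not in full:
        return None
    return full.rsplit(".", 1)[0]

def ancestors(lineage):
    """List ancestors from the nearest one back to the root."""
    chain = []
    current = parent(lineage)
    while current is not None:
        chain.append(current)
        current = parent(current)
    return chain

def name_child(parent_name, number, new_letter):
    """Name the `number`-th child of a lineage. If the result would have more than
    MAX_DOTS dots, register `new_letter` as an alias for the parent and use it instead."""
    candidate = f"{parent_name}.{number}"
    if candidate.count(".") <= MAX_DOTS:
        return candidate
    ALIASES[new_letter] = expand(parent_name)
    return f"{new_letter}.{number}"

print(expand("P.1"))         # B.1.1.28.1
print(ancestors("B.1.351"))  # ['B.1', 'B']
```

Storing the full, unaliased ancestry in the alias table keeps every short name reversible, which is what lets a scientist read P.1 as a branch of the B family.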
Still confused? That’s because these naming systems are designed not to be easy to recall but to give scientists a common language in which they can discuss and investigate the evolution of SARS-CoV-2.
“As scientists we’re pretty used to these kinds of complicated names,” Hodcroft says. “We love to divide things up and name them.”
Virus variants typically don’t make national news. But now that some of these variants are driving the pandemic and dominating headlines, Hodcroft says there needs to be a way for non-scientists to keep track of them, too—and, ideally, not by using their geographic names.
Developing a new naming system
For all of these reasons, the WHO stepped in to develop yet another naming system for the most worrisome virus variants. It convened a panel of virologists and scientists with expertise in naming microbes, tasking them to come up with what it describes as “easy-to-pronounce and non-stigmatizing labels.” The group recommended using letters of the Greek alphabet.
Abdool Karim, who was consulted on the new system, said in an interview prior to the announcement that it was a welcome departure from the practice of using a jumble of letters and numbers. “I thought it was quite good,” he says.
Rather than renaming every mutation of the virus, the WHO system applies to the four variants of concern: variants that scientists have deemed more virulent or transmissible, or that are associated with a decrease in the effectiveness of vaccines or therapeutics. In addition to the Alpha and Beta labels for the variants first discovered in the U.K. and South Africa, respectively, the variant first discovered in Brazil is now also known as Gamma and the variant first found in India has been labeled Delta.
The new system has also assigned letters of the Greek alphabet to six variants of interest, which have been associated with local case clusters or have been detected in multiple countries. These variants are labeled Epsilon, Zeta, Eta, Theta, Iota, and Kappa.
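In practice, the WHO labels work as a lookup table sitting on top of the scientific names. A minimal sketch, assuming only the ten labels assigned at the time of writing; Delta’s Pango name, B.1.617.2, comes from WHO’s published list rather than from this article, and the `public_name` helper is hypothetical.

```python
# Greek labels assigned by the WHO at the time of writing.
VARIANTS_OF_CONCERN = {
    "Alpha": "B.1.1.7",    # first documented in the U.K.
    "Beta": "B.1.351",     # first identified in South Africa
    "Gamma": "P.1",        # first identified in Brazil
    "Delta": "B.1.617.2",  # first found in India
}
VARIANTS_OF_INTEREST = ["Epsilon", "Zeta", "Eta", "Theta", "Iota", "Kappa"]

# Reverse lookup: Pango lineage -> public label.
PUBLIC_LABEL = {lineage: label for label, lineage in VARIANTS_OF_CONCERN.items()}

def public_name(lineage):
    """Return the Greek label for a variant of concern, or the lineage name unchanged."""
    return PUBLIC_LABEL.get(lineage, lineage)

print(public_name("B.1.351"))   # Beta
print(public_name("B.1.1.28"))  # B.1.1.28 (no Greek label)
```

The scientific names stay the keys; the Greek letters are only a public-facing layer, which matches the WHO’s intent that Nextstrain and Pango remain in use.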
Although scientists will go on using their naming systems like Nextstrain and Pango, the WHO hopes its new system will make it easier for the public to keep track of the virus mutations that are threatening their communities. In a statement, it encouraged governments, media outlets, and others to adopt the new labels.
The challenge now, however, will be to get the public to actually take it up in place of the geographic variant names. In an interview conducted before the announcement of the naming system, Hodcroft said that the way the WHO came up with the new variant naming system might help: If the body could bring a group of virologists together and get them to agree to use these names whenever they speak to the public, there’s a much better chance that the scientific community and the rest of the world will adopt the new system.
Either way, Abdool Karim says scientists have learned an important lesson for the inevitable next pandemic. “We’re learning that we need to have in place a name system early,” he says. “I think we’ll be proactive next time.”
Editor's note: This story has been updated with the announcement of the new WHO virus variant naming system. It was originally published on April 20, 2021.