When COVID-19 began tearing around the globe, Francesca Dominici suspected air pollution was increasing the death toll. It was the logical conclusion of everything scientists knew about dirty air and everything they were learning about the novel coronavirus. People in polluted places are more likely to have chronic illnesses, and such patients are the most vulnerable to COVID-19. What’s more, air pollution can weaken the immune system and inflame the airways, leaving the body less able to fight off a respiratory virus.
Many experts saw the possible connection, but Dominici, a biostatistics professor at the Harvard T.H. Chan School of Public Health, was especially well equipped to test it. She and her colleagues have spent years creating an extraordinary data platform, one that aligns information on the health of tens of millions of Americans with a day-by-day summary of the air they’ve been breathing since 2000. Dominici explained it to me last summer on a video call from her home in Cambridge, Massachusetts. Her pandemic puppy, a black Lab, squirmed on her lap. In London, where I sat in my home office, the brief respite in traffic provided by the initial lockdown had ended, and diesel fumes once again clouded the air.
Every year, Dominici told me, she purchases granular (but anonymized) information on each of the roughly 60 million older Americans enrolled in Medicare: age, gender, race, zip code, and the dates and diagnostic codes for all deaths and hospitalizations. That’s half the data platform. The other half is an achievement in itself. Led by Dominici and Harvard epidemiologist Joel Schwartz, dozens of scientists first divided the United States into a grid of one-kilometer-wide (0.62-mile) squares. Then they trained a machine learning program to calculate daily pollutant levels, over 17 years, in each square, even if it didn’t have a pollution monitor in it.