It could be a parent, sibling, family member or friend. It could be you. Almost everyone knows someone who has or has had cancer. The statistics are stark: One in three American males is likely to develop the disease. Cancer is impenetrably complex and notoriously difficult to treat; it is one of the leading causes of death in the world. Through persistent global efforts, cancer fatality rates in the US have fallen by 23 percent, but the number of new cases is predicted to rise by 62 percent by 2040. We’re in a running battle with cancer, but we may be edging ahead thanks to a powerful new weapon: big data.
Big data really is big. It describes data sets that are so impossibly vast and complex that conventional computing cannot cope and whole new technologies have had to be developed to collect, manage, and analyze them. It’s fast becoming a $200 billion industry that is making a difference, because buried within the 2.5 quintillion bytes of data we generate every day is valuable information. Meticulously analyzing this can reveal hidden patterns, connections, and insights that could be used to improve everything from a company’s profit margins to a person’s chances of surviving disease—including cancer.
One way that big data is helping to fight the big C is through prompt diagnosis, as catching cancer early can greatly increase the prospects of a positive outcome. With human genome sequencing available for as little as $2,000, we are increasingly able to analyze an individual’s DNA for some genetic biomarkers of cancer. Similarly, as more and more medical records are digitized, they can be scanned for symptoms suggestive of cancer, prompting diagnostic tests. Big data can also reveal global trends, identifying groups at particular risk or finding hidden links that can be investigated as a cause or a cure. Analysis of a vast range of variables led to the revelation that desipramine, a commonly used antidepressant, could potentially be an effective treatment for small-cell lung cancer.
Identifying the genetic mutations responsible for a type of cancer can guide us toward more effective treatments. Because the disease is constantly changing, however, sequencing a cancer genome generates huge amounts of information. Analyzing this can reveal the mutations most responsible for producing tumor cells, enabling us to develop drugs that target the cells carrying these critical mutations. With collaboration, such research could be developed into a global database with the power to identify particular tumors and predict their behavior, further increasing the speed and efficacy of treatment.
Millions of people have been treated for cancer; some treatments have worked better than others. Each cancer patient can generate at least a terabyte of digital data. The vast variables of demographic, lifestyle, and medical history, coupled with the specifics of an individual’s cancer and care, make up a big data treasure trove that is ripe for analysis. There are various projects actively collating all the fragmented information we have on cancer patients, making it available for study. Connecting doctors with data will enable them to compare treatment plans and recovery rates for similar cases, tailoring a patient’s treatment to those with the greatest success—and that can help save lives.
Big data is poised to fight disease on another front: helping to manage and understand outbreaks of infectious disease from seasonal flu epidemics to the spread of dengue fever. One idea is to analyze information gathered from an assortment of digital sources to spot behavior patterns that might indicate the start of an epidemic. A local spike in online searches for a particular disease or its symptoms could indicate an outbreak, while social media can be monitored for references to an illness (helping to track its movement through a city, country or continent). With nearly four billion people using the internet, it offers a rich reservoir of intelligence with huge potential.
To date, the method has had varied success. Google Flu Trends’ groundbreaking achievements in tracking seasonal flu were stymied when the H1N1 virus prompted a global surge in searches that skewed local data and produced false results, contributing to the service’s closure. However, in Brazil, data from Twitter has been successfully used to map the spread of dengue fever and has even helped predict its passage to certain cities. Similarly, search info from Google and Twitter predicted the spread of Zika virus in Latin America weeks ahead of formal declarations. Such sources still need refinement, but they could provide all-important leads for experts on the ground to investigate.
Here, big data also adds value. As more countries digitize their health records, these can be monitored for trends ranging from outbreaks of disease to their resistance to antibiotics. Already many health care professionals report infectious diseases to national databases, and an international network of volunteers provides regular updates on illnesses. Even more valuable could be the six billion smartphones expected to be in use by 2020, allowing us to ask more people in more countries direct questions about their health. Analyzing all of this can provide the timely warnings needed to not only keep up with an epidemic, but perhaps get a step ahead.
Big data has proven enormously beneficial for health care. Not only can it reduce spiraling costs through driving efficiencies and decision making, it is also helping us to avoid preventable diseases, improve quality of life, manage epidemics, and even, perhaps, win the fight against cancer. As we move rapidly toward a digital universe of 40 zettabytes of data, it’s possible that the beginning of the solution to some of the world’s worst diseases may lie at our fingertips; we just have to claw our way through the intelligence.