Within a drop of blood, you can find all the information you need to reasonably guess where a person came from, without ever having to look at their face, name or passport. Small variations in our DNA are enough for the task. They can be used to pinpoint someone’s place of origin to a remarkable degree of accuracy, often to within a few hundred kilometres.
The new discovery comes from a team of Swiss and American researchers led by John Novembre at UCLA, who wanted to understand how the human genome varies on a continental scale. To that end, they looked at the genomes of over 1,300 people sampled from almost three dozen countries across Europe. The sample was originally collected by GlaxoSmithKline to hunt out genetic variations that influence the effectiveness of drugs and their side effects, but Novembre’s team put it to use in understanding the links between genes and geography instead.
They looked at single-letter differences in DNA (“single nucleotide polymorphisms” or SNPs) at about 200,000 places in each of the genomes. They compared this data to each person’s country of origin and, where possible, to that of their grandparents.
To work with this massive collection of information, Novembre applied a mathematical technique called principal component analysis (PCA) to transform the unwieldy set of data into a more manageable form. The technique looked for underlying patterns in the SNPs and summarised the largest of them in just two variables, the first two principal components. The upshot is that each person could be plotted as a point on a simple two-dimensional graph, whose axes correspond to those two components. It collapsed a complicated cloud of data into a simple sheet.
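For readers who want to see the idea in miniature, here is a rough sketch of PCA on a toy table of genotypes. It is not the team's actual pipeline: the 0/1/2 coding (copies of one variant per person per SNP), the made-up `TOY_GENOTYPES` table and all function names are assumptions for illustration, and the components are found with simple power iteration rather than whatever software the researchers used.

```python
import math
import random


def center_columns(matrix):
    """Subtract each column's mean so every SNP has mean zero."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def gram_matrix(rows):
    """Person-by-person matrix of dot products between centred genotype rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def mat_vec(sym, vec):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in sym]


def top_eigenpair(sym, iterations=500, seed=0):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    vec = [rng.uniform(-1.0, 1.0) for _ in sym]
    for _ in range(iterations):
        product = mat_vec(sym, vec)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
    value = sum(v * p for v, p in zip(vec, mat_vec(sym, vec)))
    return value, vec


def first_two_components(genotypes):
    """Return one (PC1, PC2) pair per person."""
    gram = gram_matrix(center_columns(genotypes))
    size = len(gram)
    columns = []
    for _ in range(2):
        value, vec = top_eigenpair(gram)
        scale = math.sqrt(max(value, 0.0))
        columns.append([scale * v for v in vec])
        # Deflate: remove the component just found so the next pass
        # picks up the following one.
        gram = [[gram[i][j] - value * vec[i] * vec[j] for j in range(size)]
                for i in range(size)]
    return list(zip(*columns))


# Hypothetical data: six people, six SNPs, two loosely distinct groups.
TOY_GENOTYPES = [
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 1],
    [2, 1, 2, 0, 1, 0],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 1, 2, 2, 1],
    [0, 1, 0, 2, 1, 2],
]

for person, (pc1, pc2) in enumerate(first_two_components(TOY_GENOTYPES)):
    print(f"person {person}: PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```

In this toy table the first three people and the last three land on opposite sides of the first axis, which is the same effect, at a vastly smaller scale, that lets a map emerge from the real data.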
The result was startling: the genetic and geopolitical maps of Europe overlap to a remarkable degree. On the two-dimensional genetic map, you can make out Italy’s boot and the Iberian peninsula where Spain and Portugal sit. The Scandinavian countries appear in the right order and, in the south-east, Cyprus sits distinctly off the “coast” of Greece.
Zoom in closer, and the map even reveals distinct genetic clusters within Switzerland based on the language people speak. German-speaking Swiss cluster to the east, Italian speakers to the south and Francophones to the west. Even so, the clusters overlap and in general, the data reveals a genetic continuum between Europeans, where the borders of the genetic map are fuzzier than those of its geographical counterpart. As far as genes are concerned, the closer together two people live, the more similar their DNA is.
There were a few exceptions to the genetic map’s accuracy, with a few countries appearing in odd positions. Slovakia, for example, turns up in the middle of Italy rather than next to the Czech Republic where it belongs. Russia too is further west than its actual position and appears to be hugging Poland (which I find ironically unsettling in the light of recent political events). But Novembre says that both exceptions are probably due to small sample sizes – “Russia” in this case was only represented by six people, and just one poor individual was waving the flag for Slovakia.
Exceptions aside, the overlay between the two maps is startlingly accurate. Using only genetic information, Novembre’s team can place over 90% of people within 700km of their place of origin, and over 50% of people within 310km. The graph below shows the different degrees of accuracy for different countries.
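To make figures like "within 700km" concrete, here is a small sketch of how such a summary could be computed once each person has a true and a predicted location. The haversine great-circle formula and the mean Earth radius are standard, but the function names and the four `HYPOTHETICAL_PAIRS` are invented for illustration and are not data or code from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(pairs, threshold_km):
    """Share of (true, predicted) location pairs no further apart than the threshold."""
    hits = sum(
        1 for true, predicted in pairs
        if haversine_km(*true, *predicted) <= threshold_km
    )
    return hits / len(pairs)


# Hypothetical (true, predicted) coordinates as (latitude, longitude).
HYPOTHETICAL_PAIRS = [
    ((48.8566, 2.3522), (48.8566, 2.3522)),    # Paris placed in Paris
    ((51.5074, -0.1278), (48.8566, 2.3522)),   # London placed in Paris
    ((52.5200, 13.4050), (50.0755, 14.4378)),  # Berlin placed in Prague
    ((40.4168, -3.7038), (41.9028, 12.4964)),  # Madrid placed in Rome
]

for threshold in (310, 700):
    share = fraction_within(HYPOTHETICAL_PAIRS, threshold)
    print(f"within {threshold} km: {share:.0%}")
```

The same two functions would work on any list of coordinate pairs, so the thresholds can be swapped for whatever distances are of interest.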
The results have implications for a lot of biomedical research. Many scientists are scanning entire genomes on a hunt for SNPs that affect a person’s risk of diseases like cancer or their reaction to drugs. Novembre says that researchers who are running these “whole-genome studies” need to bear in mind where their sample has come from. Even if a study looks at small and seemingly related parts of Europe, it would have to adjust for any geographical influences in the genetic variations it uncovers.
This study is just the beginning. At the moment, the analysis is too crude to detect rare genetic variants that are the result of new mutations. These tend to cluster around the place where the mutation first sprang into being, and as such, they can give us more information about the structure of populations on an even finer scale. As more and more genomes are sequenced and statistical methods improve, the genetic map will become clearer and clearer.
Reference: John Novembre, Toby Johnson, Katarzyna Bryc, Zoltán Kutalik, Adam R. Boyko, Adam Auton, Amit Indap, Karen S. King, Sven Bergmann, Matthew R. Nelson, Matthew Stephens, Carlos D. Bustamante (2008). Genes mirror geography within Europe. Nature. DOI: 10.1038/nature07331