Posted by Charlie, November 2016

One more step towards reducing the ‘European Bias’

A News and Views article^1 in Nature this month discussed three publications (Mallick et al.^2, Pagani et al.^3, and Malaspinas et al.^4), which together describe 787 new, high-quality genomes from individuals in a highly diverse set of more than 280 global populations. On Repositive, we are now listing 698 of these genomes.

One of these datasets is the Simons Genome Diversity Project (SGDP), of which we are listing 271 genomes. I went into detail about this project and the implications of its data in a previous blog post, which can be found here. Here, I also want to take a quick look at a second study, the Estonian Biocentre Human Genome Diversity Panel (EGDP), whose data we are also listing on Repositive.

Estonian Biocentre Human Genome Diversity Panel

Of the 483 complete, high-coverage human genomes generated by Complete Genomics and described in Pagani et al., we are listing 402. We are also listing a further 25 genomes from Clemente et al.^5

Like the SGDP, the EGDP has gone to great lengths to sample populations from regions that are often underrepresented, and it presents data from 148 populations worldwide.

There is much debate over the path by which early humans dispersed from Africa. The first model proposes a single ‘great migration’ that occurred 40,000 – 80,000 years ago. The second model proposes multiple migrations, the first of which occurred 120,000 – 130,000 years ago. The main findings of Pagani et al. support the second model. Evidence from modern Papuan genomes shows that, although they predominantly derive from the main expansion out of Africa (about 75,000 years ago), about 2% of their genetic signature comes from an earlier migration. This ‘early migration’ signature is likely to be from a largely extinct expansion out of Africa.

The value of this data

As with the SGDP, a large proportion of this data is freely available for download, so the potential for research applications is very broad. Although this dataset was used here to answer questions about human migratory events tens of thousands of years ago, the genotyping of such diverse modern human populations is a valuable endeavor in its own right.

As discussed in the same journal^6, and supported by our findings published by Frontline Genomics^7, most human genomes currently available to the research community are from individuals of European descent, with other ethnicities hugely underrepresented. These two datasets are a step towards bringing other populations into the research environment. This data can go on to be ‘recycled’ by researchers across many different avenues of genomic research to answer fundamental questions, with reduced bias towards Europeans.


^1: Population genetics: A map of human wanderlust. Serena Tucci and Joshua M. Akey. Nature 538, 179–180 (13 October 2016) | doi:10.1038/nature19472

^2: The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Mallick et al., Nature 538, 201–206 (13 October 2016) | doi:10.1038/nature18964

^3: Genomic analyses inform on migration events during the peopling of Eurasia. Pagani et al., Nature 538, 238–242 (13 October 2016) | doi:10.1038/nature19792

^4: A genomic history of Aboriginal Australia. Malaspinas et al., Nature 538, 207–214 (13 October 2016) | doi:10.1038/nature18299

^5: A Selective Sweep on a Deleterious Mutation in CPT1A in Arctic Populations. Clemente et al. AJHG. Volume 95, Issue 5, p584–589, 6 November 2014 | doi:10.1016/j.ajhg.2014.09.016

^6: Genomics is failing on diversity. Alice B. Popejoy and Stephanie M. Fullerton. Nature 538, 161–164 (13 October 2016)

^7: Personal Genomics Open Access Datasets Even More European-Biased Than Scientific Literature? Richard Shaw and Manuel Corpas. Frontline Genomics.

Posted by
Charlie Whicher

Product Manager