A News & Views article^1 in Nature this month discussed three publications (Mallick et al.^2, Pagani et al.^3 and Malaspinas et al.^4), which together describe 787 new, high-quality genomes from individuals belonging to a highly diverse set of over 280 populations worldwide. On Repositive, we are now listing 698 of these genomes.
One of these datasets is the Simons Genome Diversity Project (SGDP), of which we are listing 271 genomes. I discussed this project and the implications of its data in detail in a previous blog post. Here, I want to take a quick look at another of the studies, the Estonian Biocentre Human Genome Diversity Panel (EGDP), whose data we are also listing on Repositive.
Estonian Biocentre Human Genome Diversity Panel
As with the SGDP, the EGDP has gone to great lengths to sample populations from regions that are often underrepresented. Pagani et al. present data from 148 populations worldwide.
There is much debate over how early humans dispersed out of Africa. The first model proposes a single ‘great migration’ occurring 40,000–80,000 years ago. The second model proposes multiple migrations, the first of which occurred 120,000–130,000 years ago. The main findings of Pagani et al. support the second model. Evidence from modern Papuan genomes shows that, although these genomes derive predominantly from the main expansion out of Africa around 75,000 years ago, about 2% of their genetic signature comes from an earlier migration. This ‘early migration’ signature most likely reflects an expansion out of Africa whose lineages have since largely died out.
The value of this data
As with the SGDP, much of this data is freely available to download, so it can be applied to a wide range of research questions. Although the dataset has so far been used to study human migrations that took place tens of thousands of years ago, characterising the genomes of such diverse modern human populations is valuable in its own right.
As discussed in the same journal^6, and supported by our findings published by Frontline Genomics^7, most human genomes currently available to the research community come from individuals of European descent, while other ethnicities are heavily underrepresented. These two datasets are a step towards bringing other populations into the research environment. Researchers can go on to ‘recycle’ this data across many areas of genomic research to answer fundamental questions with less bias towards Europeans.