Posted by Charlie, November 2016

The Value of Personal Data

The next collection of human genomic data that we are launching onto the Repositive platform is 'Personal genome data'. This is data from individuals like you and me who have decided to share their genomic data with the public.

What is this 'personal genome' thing?

These people are predominantly healthy individuals who have had their DNA tested by consumer genetics companies such as 23andMe and Ancestry.com out of personal interest. However, some of the samples come from people who have a specific disease or hereditary trait that led them to get their genome sequenced.

These individuals have then chosen to upload their raw data online so that others can use it. People get their genomes sequenced and share their personal data for various reasons, which I will go into in more depth in a future blog post. However, the overarching reason is that they feel their data could help further research and therefore benefit humanity as a whole.

"The few individuals who have bought a 23andMe test are the futurists, the enthusiasts, the errant genetic genealogists, and those seeking a sense of ‘control’ in their lives." - Craig Macpherson, editor and founder of DNA Testing Choice

Read Craig's guest post on the Repositive blog: Personal genomic data for the masses

How will it further research and benefit humanity?

Genetics researchers need not only samples from pathological sources but also healthy control data against which to compare those diseased samples. This is where personal genomes come in.

"Open data is a critical component of the scientific method, but genomes are both identifiable and predictive. As a result, many studies choose to withhold data from participants and restrict access to researchers.^1"

Therefore, there is a huge amount of value in the relatively untapped resource of 'personal genomes', especially because this data comes from ethnically and demographically diverse populations and is often accompanied by a lot of interesting phenotypic data (such as age, allergies and eye colour) and lifestyle data (such as smoking, coffee consumption and general interests). See OpenSNP Phenotypes for more examples.

This means that these samples can be used by many different researchers, in many different fields to answer many different types of questions.

It is so valuable, in fact, that companies are now starting to talk about paying people for their genetic data. Genos recently released a plan to compensate people for their DNA so they can "give researchers a crowdsourced genetic map to help with disease discovery."

Personal genome data on Repositive

On Repositive, we have a feature that enables users to register data on the platform. We had envisaged that this would be unpublished or siloed data sitting within labs. By registering data, users increase its visibility and help link people to it. We built this feature because we wanted to help the community share and 'advertise' the data they have and find collaborators. However, one unexpected use case of 'Data Registration' is that people have started to register their own personal genome data.

For example, KT Pickard registered his Free Illumina 30x WGS dataset, and Albert Vilella registered his Personal Genome. To see more, check out our Registered Data on Repositive.

Over 1K personal genomes

Repositive is now indexing data from over 1000 personal genomes on our data discovery platform.

Aside from the registered data mentioned above, this includes:

I have already discussed Steven Keating's data and The Corpasome in a previous blog post, so I won't bore you again.

View the Personal Genome Data collection

Mike Lin's Genome

Mike Lin is a software engineer based in Silicon Valley who works for DNAnexus. In 2013, Mike joined Illumina's Understand Your Genome project to get his personal genome sequenced. You can read more about his journey in a series of blog posts he has published. His data is stored on DNAnexus, so you will need to create a free account to access it.

The Personal Genome Project


The PGP is based in four countries around the world (USA, Canada, UK and Austria).

"Working to generate, aggregate and interpret human biological and trait data on an unprecedented scale using open-source, open-access and open-consent frameworks."

They do this by collaborating with individuals who are willing to share their data publicly online, and by encouraging widespread use of this public data resource as a platform for scientific research, education and the improvement of public health.

Genomes Unzipped

Genomes Unzipped is primarily a collaborative online project about personal genomics:

"Our goal is to provide genetic testing consumers with independent and informed analysis of developments in the field of genetics and the genetic testing industry."

Alongside providing this analysis, the members of this community (16 so far) have also taken commercial genetic tests and made their raw data publicly available for others to download, analyse and reuse.

Where next

This is only the beginning of a rapidly expanding field. Theoretically, personal genomics will only end when every individual on the planet has been sequenced. But even then, there are more ways in which personal genomics can benefit science, such as getting one's microbiome (https://blog.repositive.io/moving-to-human-microbiome-data/) sequenced.

As more and more people get their DNA sequenced, more and more opportunities present themselves to entrepreneurs and researchers. Repositive will continue to index more data in the near future, including data from sources such as OpenSNP - keep your eyes open!


^1: Sharing Personal Genomes - from the PGP project


Posted by Charlie Whicher, Product Manager