In 2003, the Human Genome Project (HGP) successfully mapped a large portion of the human genome. Since then, the HGP’s genomic map, a linear sequence of the four DNA bases, has served as the single reference genome against which new sequencing data are compared. But while immensely valuable, this reference genome does not account for the full range of genomic variation. That makes it inadequate for representing a human population that carries many complex genetic variants.
“In the decade since the HGP announced the completion of a major portion of their work, the vast improvement in our understanding of the complexity of the genome, the rapid improvement of technology for sequencing genomes and the increasingly broad application of this technology have created a need to rethink how scientists describe to one another the rich patterns of genomic variations uncovered by cutting-edge experiments,” says Nick Carriero, group leader for software development at the Simons Center for Data Analysis. “Given that study of variation is at the heart of most medical and life sciences genome-based research, addressing this challenge is critical to advancing these fields.”
To address this problem, the Simons Foundation awarded a grant to a team of researchers led by David Haussler and Benedict Paten at the University of California, Santa Cruz (UCSC) to develop a graph-based human reference genome structure built from a diverse sampling of the human population. Called the Human Genome Variation Map, this new resource will build on the HGP’s reference genome, extending it by representing genetic variants as alternate paths through the genome. The goal is a map that captures all forms of human genomic variation and offers a more complete reference for interpreting new sequencing data.
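The article does not describe the map’s actual data model. The sketch below is a minimal Python illustration of the general "variants as alternate paths" idea, assuming a simple sequence graph in which nodes hold DNA segments and edges join adjacent segments. The class `VariationGraph`, its methods, the node names, and the sequences are all hypothetical and are not the UCSC team’s design. A reference allele and a single-base alternate allele appear as parallel nodes, and a deletion appears as an edge that skips a segment. Each path through the graph spells out one possible sequence.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Hypothetical toy sequence graph: nodes hold DNA segments, edges link adjacent segments."""
    nodes: dict[str, str] = field(default_factory=dict)
    edges: dict[str, list[str]] = field(default_factory=dict)

    def add_node(self, node_id: str, seq: str) -> None:
        self.nodes[node_id] = seq
        self.edges.setdefault(node_id, [])

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def path_sequence(self, path: list[str]) -> str:
        # Verify every step follows an existing edge, then concatenate node sequences.
        for a, b in zip(path, path[1:]):
            if b not in self.edges[a]:
                raise ValueError(f"no edge {a} -> {b}")
        return "".join(self.nodes[n] for n in path)

    def all_paths(self, start: str, end: str) -> list[list[str]]:
        # Depth-first enumeration of every route from start to end (assumes no cycles).
        if start == end:
            return [[end]]
        return [[start] + rest
                for nxt in self.edges[start]
                for rest in self.all_paths(nxt, end)]


g = VariationGraph()
for node_id, seq in [("n1", "ACGT"), ("ref", "C"), ("alt", "T"),
                     ("n3", "GGA"), ("n4", "TTT"), ("n5", "CA")]:
    g.add_node(node_id, seq)
# The SNP: reference allele C and hypothetical alternate allele T are parallel nodes.
for src, dst in [("n1", "ref"), ("n1", "alt"), ("ref", "n3"), ("alt", "n3"),
                 ("n3", "n4"), ("n4", "n5"),
                 ("n3", "n5")]:  # this edge skips n4, encoding a deletion of "TTT"
    g.add_edge(src, dst)

reference = g.path_sequence(["n1", "ref", "n3", "n4", "n5"])
paths = g.all_paths("n1", "n5")
seqs = [g.path_sequence(p) for p in paths]
print(reference)  # ACGTCGGATTTCA
print(seqs)
```

In this toy model the linear reference is simply one path among several, which mirrors how the article describes the new map building on, rather than replacing, the HGP’s reference genome.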
The Human Genome Variation Map will be based on some 300 complete human genome sequences from a diverse array of ethnic groups, a dataset amassed by David Reich and Nick Patterson at the Massachusetts Institute of Technology and Harvard University with funding from the Simons Foundation. Combining these genomes with the HGP’s reference genome will allow Haussler and Paten to create a pilot map of human genome variation. From there, the researchers plan to add many more genomes to capture still more variation.
“Building on their extensive bioinformatics experience — starting with their core role in the HGP itself — the team at UCSC is well positioned to ensure compatibility of this framework with tools that are part of the everyday work environment of almost all bioinformaticists,” says Carriero.
Initial work on a standard data model for the map has begun, and by the end of the year the researchers hope to produce a draft of the pilot Human Genome Variation Map based on the full set of diverse genomes from Reich and Patterson. Haussler and Paten have also outlined the next steps: expanding the pilot map to include more genomes and producing a final, comprehensive Human Genome Variation Map.
Eventually, Haussler and Paten aim to release the Human Genome Variation Map together with open-source software tools that let researchers search and analyze the data. They hope these capabilities will shed light on genome variants that contribute to a range of conditions, from autism to diabetes, and that the new resource, which represents information at the core of medical and life sciences research, will lead to significant scientific and medical breakthroughs.