Featured
University of California
Santa Cruz, US
Population assisted genome inference
The human reference genome has transformed human genetics by providing a proxy for a universal coordinate system. However, the reference is but one genome, and as such cannot contain all the variation present in the population. Analysis relative to it creates a so-called reference allele bias: when identifying the variants in a new sample by mapping against the reference, it is easy to find alleles contained in the reference but hard, sometimes nearly impossible, to find alleles absent from it. Adding further variation to the reference genome naturally defines a graph structure, a genome graph, in which the intersections between the added sequences define vertices that connect myriad possible human genomes. This subtle extension opens numerous possibilities and forces us to redefine many basic concepts that the field has taken for granted. I will lay out our theoretical and empirical investigations of these issues, and show our progress towards a holy grail: comprehensive genome inference conditioned not on a single genome but on a population.
Benedict Paten is assistant director of the Center for Big Data in Translational Genomics at UC Santa Cruz.
Benedict Paten directs the UC Santa Cruz Center for Big Data in Translational Genomics, established to develop standard, globally accepted Internet protocols for efficiently handling genomic data and ultimately to extend them to clinical practice. Benedict focuses on all aspects of genome comparison, both between and within species. He is passionate about the reconstruction of our shared evolutionary history, and in particular the evolutionary history of vertebrates. He was a principal organizer of the Assemblathon and Alignathon competitions, designed to improve the state of the art in genome assembly and alignment. He co-chairs a task team of the data working group of the Global Alliance for Genomics and Health that is establishing new standards for the representation of genome variation.