Enhancing genetic discoveries with population-specific reference panels

Serena Sanna
2016

Abstract

The genome-wide association study (GWAS) approach ushered in a new era in genetics research around ten years ago, shedding light on the previously largely unknown genetic components underlying complex traits and diseases. Statistical inference methods were key to this success, allowing researchers to incorporate external data into their studies and thereby maximize information at no additional experimental cost. Technology has continued to improve: whereas initially fewer than 1 million points of the DNA (genetic variants) could be assessed in a person, the entire genome (3 billion points) can now be characterized with next-generation sequencing machines. Sequencing nevertheless remains too costly for GWASs, because several thousand individuals are needed to ensure reproducible findings. With statistical methods, however, full genomes can be inferred if a reduced number of genetic variants is characterized in the study's volunteers and a reference set of independent genomes is available. An international effort, the 1000 Genomes Project, has generated public reference sets by sequencing ~2,500 representatives of the world's populations. In this thesis, we evaluate the benefits of a population-specific reference set for Sardinians by sequencing 2,120 volunteers and subsequently incorporating the resulting set into GWASs. We show that the accuracy of inferred genomes improves compared with using the 1000 Genomes set, and we identify novel genetic components for several complex traits that could not have been discovered otherwise. Similar efforts are ongoing in other populations, including the Dutch, and we discuss their design and results in this thesis.
Istituto di Ricerca Genetica e Biomedica - IRGB
978-90-367-8821-2
genome-wide association study
genotype imputation
isolated populations
genetics of complex traits

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/423803