Whole Genome Sequencing of 2100 Individuals in the Founder Sardinian Population
C Sidore;S Sanna;M Pitzalis;M Zoledziewska;A Maschio;F Cucca;F Busonero
2012
Abstract
Genome-wide association studies have increasingly furthered our understanding of the molecular basis of complex traits by identifying, through genotyping and imputation, loci associated with many different traits. However, studies based on the variants present in common chip platforms and imputation panels may not capture variation that is geographically restricted and unique to specific populations. To advance our understanding of the genetics of a variety of traits in the Sardinian population, we are studying a sample of >6,000 individuals recruited from a cluster of 4 small towns in Sardinia. We whole-genome sequenced DNA from 2,120 Sardinian individuals, enrolled either in this project or in a parallel project on autoimmune diseases, at an average depth of coverage of ~4X. We successfully identified and genotyped >17M single nucleotide polymorphisms (30.6% of them not in dbSNP135) with an estimated error rate of 0.2%, which is expected to decrease further as the sample size increases (the estimated error rate was 0.5% and 0.3% in previous analyses of 505 and 1,146 individuals, respectively). To increase the power to detect association, we are using the haplotypes generated by sequencing these individuals to impute missing genotypes in the remaining ~6,000 individuals already genotyped with Immunochip and Metabochip. Strikingly, imputation using our Sardinian reference panel is considerably more accurate than imputation using an equal-size reference panel of European haplotypes generated by the 1000 Genomes Project (average imputation accuracy, rsqr = 0.9 compared to 0.75 for alleles with frequency 1-3%). With a larger reference panel, the imputation accuracy for variants with frequency 1-3% reaches 0.94, making it possible to analyze the rare frequency domain in the Sardinian population.
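The abstract does not state how rsqr was computed. A common convention, assumed here, is the squared Pearson correlation between imputed allele dosages and genotypes observed directly at the same sites. The sketch below illustrates that convention only; the function name `imputation_r2` and the toy values are hypothetical and are not taken from the study.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between observed allele counts (0, 1, 2)
    and imputed dosages (real values in [0, 2]) for one variant.

    Assumption: this is one common definition of imputation accuracy;
    the abstract does not specify the estimator actually used.
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n

    # Sums of squared deviations and cross-products (unnormalized variances/covariance).
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)

    # A monomorphic variant (no variance) has an undefined correlation.
    if var_t == 0 or var_d == 0:
        return float("nan")
    return (cov * cov) / (var_t * var_d)


# Toy data (hypothetical): a rare variant carried by 2 of 10 individuals.
truth = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
dosage = [0.0, 0.1, 0.0, 0.8, 0.0, 0.0, 0.2, 0.0, 0.9, 0.0]
r2 = imputation_r2(truth, dosage)
```

Averaging this per-variant value over all variants in a frequency bin (for example 1-3%) would give a summary comparable in spirit to the figures quoted above.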
As an example of the advantages of analyzing population-specific rare variation, we will discuss the Q39X mutation in the HBB gene, which is common in Sardinia (MAF ~5%) but very rare elsewhere. The variant is associated with a variety of blood phenotypes. For LDL cholesterol, the variance explained by this variant in Sardinia is higher than the variance explained by any of the variants previously found with standard GWAS analysis. Our approach thus increases the power to detect population-specific associations.
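To make the variance-explained comparison concrete: under an additive model with Hardy-Weinberg genotype frequencies, a variant with allele frequency p and per-allele effect beta explains roughly 2p(1-p)beta^2 of the trait variance. The sketch below assumes that textbook approximation. The effect sizes are hypothetical placeholders chosen for illustration and are not estimates from this study.

```python
def variance_explained(maf, beta, trait_variance=1.0):
    """Fraction of trait variance explained by a biallelic variant.

    Assumptions: additive effect, Hardy-Weinberg proportions, and beta
    expressed in the same units as the trait (in standard-deviation units
    when trait_variance is 1.0).
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if trait_variance <= 0:
        raise ValueError("trait variance must be positive")
    # Genotype variance under Hardy-Weinberg is 2p(1-p).
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / trait_variance


# Hypothetical comparison: a 5%-frequency variant with a large effect
# versus a common variant with a small effect.
founder_variant = variance_explained(maf=0.05, beta=0.5)
common_variant = variance_explained(maf=0.30, beta=0.1)
```

With these illustrative numbers the lower-frequency variant explains more variance than the common one, which is the kind of situation the abstract describes for a variant that is rare elsewhere but reaches a MAF of ~5% in Sardinia.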