Analysis of an isolated population multidisciplinary database through interactive informatics tools

G Biino; A Angius; M Pirastu
2005

Abstract

Our research focuses on the isolated population of several villages of Ogliastra, a region of east-central Sardinia, because it offers highly advantageous characteristics for complex-trait studies. Our approach is family-based and centered on the individual, but we had to confront some daunting problems in our effort to make full use of the data. The existence and accessibility of both ancient and recent sources (municipality registers, church archives, personal interviews) lets us reconstruct, in principle, the complete genealogy of the entire population of each village. This is an important prerequisite for building large genealogies that connect many individuals selected according to their phenotypes. However, reconstruction was difficult because the archives are often handwritten, contain incomplete data, and are written in different languages, such as Latin, Spanish, and Italian. We needed something intuitive yet powerful to make full use of this wealth of genealogical data. We therefore created PedNavigator, which represents deep-rooted, complex pedigrees that the researcher can interact with to explore links between individuals and find even distant relationships. This tool allowed us to reconstruct, with a high degree of accuracy, the genealogies of the villages back to the early 1600s. In addition, our genealogies have been cross-validated through the study of mitochondrial DNA and the Y chromosome, which also identified ancient founders and their progeny. Thanks to the completeness of the genealogies, we can calculate the kinship and find the common ancestors of virtually any pair of people in the database.
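The kinship and common-ancestor queries mentioned above can be illustrated with a minimal sketch. It uses the standard recursive definition of the kinship coefficient on a small, entirely hypothetical pedigree; the names `PEDIGREE`, `depth`, `kinship`, `ancestors` and `common_ancestors` are assumptions made for illustration and do not describe how PedNavigator or the database actually implement these computations.

```python
from functools import lru_cache

# Hypothetical toy pedigree: id -> (father, mother); None marks an unknown parent.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "x": (None, None),
    "y": (None, None),
    "a": ("gf", "gm"),
    "b": ("gf", "gm"),
    "c": ("a", "x"),
    "d": ("b", "y"),
}


@lru_cache(maxsize=None)
def depth(person):
    """Length of the longest path from a founder down to this person."""
    parents = [p for p in PEDIGREE[person] if p is not None]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient: probability that two alleles, one drawn at random
    from each person at the same locus, are identical by descent."""
    if i is None or j is None:
        return 0.0
    if i == j:
        father, mother = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(father, mother))
    # Recurse through the parents of the deeper person, who cannot be
    # an ancestor of the other one.
    if depth(i) < depth(j):
        i, j = j, i
    father, mother = PEDIGREE[i]
    return 0.5 * (kinship(father, j) + kinship(mother, j))


def ancestors(person):
    """All known ancestors of a person, found by walking up the pedigree."""
    found = set()
    stack = [p for p in PEDIGREE[person] if p is not None]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(p for p in PEDIGREE[current] if p is not None)
    return found


def common_ancestors(i, j):
    """Ancestors shared by two people."""
    return ancestors(i) & ancestors(j)
```

In this toy pedigree, `a` and `b` are full siblings and `c` and `d` are first cousins, so `kinship("a", "b")` gives 0.25 and `kinship("c", "d")` gives 0.0625, while `common_ancestors("c", "d")` returns the shared grandparents. For pedigrees spanning four centuries, an iterative or database-backed formulation would likely be preferable to plain recursion.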
For the study of complex traits, a database must be more than just a collection of family links: each individual's personal data are associated with other useful information, such as medical and medication history, lifestyle, risk factors, genetic data, and quantitative and qualitative phenotypes. To collect phenotypic information, we created a dedicated electronic clinical record based on the structured interview drawn up by the study group. The record is composed of several modules and can hold many different kinds of data, measurements and biological samples: serological and hemogram parameters, anthropometric measurements, socio-demographic data, living habits, exposure to the most common risk factors, personal history of pathologies (coded with the ICD-9 classification) and prescriptions (coded with the ATC classification). As for genetic data, the information recorded for genotyping sessions includes microsatellite markers and SNPs, letting us perform statistical analyses using single markers, entire genotypes, or haplotypes. We have also developed a set of software tools to optimize laboratory activities and data flow, such as direct data import from DNA sequencers and automatic procedures to identify and remove Mendelian errors in genotypes. To consult this large amount of heterogeneous information, we created a web-based framework with a common user interface that centralizes access to data-entry programs, query builders and data analysis applications. Furthermore, we have extended PedNavigator so that genotypes and phenotypes taken directly from the database can be viewed simultaneously. Currently, our technology platform holds genealogical information on more than 72,000 persons, with more than 65,000 family links, and more than 4,500 clinical and genetic samples, from the three villages of Ogliastra studied so far.
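As an illustration of what an automatic Mendelian-error check can look like, the sketch below tests each father-mother-child trio at a single autosomal marker: the child must carry one allele found in the father and one found in the mother. The function names, the data layout and the allele sizes are hypothetical, and the abstract does not say how the authors' procedures detect or remove inconsistent genotypes.

```python
def is_mendelian_consistent(child, father, mother):
    """Check one trio at one autosomal marker.

    Each genotype is an unordered pair of alleles, or None if untyped.
    """
    if child is None or father is None or mother is None:
        return True  # a trio with a missing genotype cannot be tested
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def find_mendelian_errors(pedigree, genotypes):
    """Return the ids of children whose genotype conflicts with their parents.

    pedigree:  id -> (father_id, mother_id), with None for unknown parents
    genotypes: id -> (allele1, allele2) at a single marker
    """
    errors = []
    for child, (father, mother) in pedigree.items():
        if father is None or mother is None:
            continue  # founders have no parents to check against
        if not is_mendelian_consistent(
            genotypes.get(child), genotypes.get(father), genotypes.get(mother)
        ):
            errors.append(child)
    return errors


# Hypothetical microsatellite allele sizes for one nuclear family.
demo_pedigree = {
    "dad": (None, None),
    "mom": (None, None),
    "kid1": ("dad", "mom"),
    "kid2": ("dad", "mom"),
}
demo_genotypes = {
    "dad": (120, 124),
    "mom": (122, 122),
    "kid1": (120, 122),
    "kid2": (126, 122),  # allele 126 is carried by neither parent
}
```

Here `find_mendelian_errors(demo_pedigree, demo_genotypes)` flags only `kid2`. A trio check like this cannot tell which of the three genotypes is wrong, so a real pipeline needs an explicit policy for what to blank or re-type.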
Starting from an individual or a pedigree, researchers can interactively query the database, consulting patient records or laboratory results, and eventually collect the data needed to create input files for subsequent statistical analysis. To further improve usability, even for novices, we developed a web application called Boomerang to manage jobs on a Linux cluster, such as runs of linkage analysis programs like Simwalk, Merlin, and GeneHunter. The framework and its tools were developed in the Java programming language and run as web applications in Apache Tomcat, using the Oracle Database Server.
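To give a rough idea of the kind of input file involved, the sketch below writes pedigree records in the widely used LINKAGE-style column layout (family, individual, father, mother, sex, affection status, then two alleles per marker, with 0 for missing values). The function name and the record layout are hypothetical, and the exact files each linkage program expects should be checked against that program's documentation; this is not a description of the files the platform actually generates.

```python
SEX_CODE = {"M": 1, "F": 2}  # LINKAGE convention: 1 = male, 2 = female


def linkage_ped_lines(family_id, people, n_markers):
    """Format pedigree records as LINKAGE-style text lines.

    people: list of dicts with keys "id" and "sex", and optional keys
            "father", "mother", "affection" (0 unknown, 1 unaffected,
            2 affected) and "genotypes" (one allele pair per marker).
    """
    lines = []
    for person in people:
        fields = [
            family_id,
            person["id"],
            person.get("father") or "0",  # 0 = parent unknown (founder)
            person.get("mother") or "0",
            str(SEX_CODE[person["sex"]]),
            str(person.get("affection", 0)),
        ]
        # Untyped individuals get the missing genotype 0 0 at every marker.
        genotypes = person.get("genotypes") or [(0, 0)] * n_markers
        for allele1, allele2 in genotypes:
            fields += [str(allele1), str(allele2)]
        lines.append(" ".join(fields))
    return lines


# Hypothetical trio with one microsatellite marker.
demo_family = [
    {"id": "1", "sex": "M"},
    {"id": "2", "sex": "F", "genotypes": [(122, 122)]},
    {"id": "3", "father": "1", "mother": "2", "sex": "F",
     "affection": 2, "genotypes": [(120, 122)]},
]
```

Calling `linkage_ped_lines("F1", demo_family, n_markers=1)` yields one line per person, for example `F1 3 1 2 2 2 120 122` for the affected daughter.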
Population genetics
Istituto di Ricerca Genetica e Biomedica - IRGB

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/107409