MitoNuc: a specialized collection of nuclear metazoan genes encoding mitochondrial protein
F. Licciulli, G. Grillo, D. D'Elia
2004
Abstract
INTRODUCTION: Mitochondria are subcellular organelles essential for maintaining cellular physiology in most eukaryotic organisms. Besides their central role in the energy metabolism of respiring cells, they are also involved in key steps of other important metabolic pathways, such as heme and hormone biosynthesis. Furthermore, numerous studies have demonstrated a key role of mitochondria in apoptosis, aging and a number of human diseases, including Parkinson's disease, diabetes mellitus and Alzheimer's disease. Although mitochondria have their own genome, about 95% of the proteins contributing to mitochondrial biogenesis and function are nuclear-encoded; hence all mitochondrial functions depend on the interaction between the nuclear and organelle genomes. To provide a specialized resource for functional and comparative genomics studies supporting both basic research on mitochondria and research on mitochondrial pathogenesis, we developed MitoNuc [1], a comprehensive collection of nuclear genes encoding mitochondrial proteins in metazoan species, which consolidates information from the most authoritative public databases.

MATERIALS AND METHODS: MitoNuc is a relational database implemented in the MySQL database management system (DBMS). It has been designed to provide comprehensive data on nuclear genes whose products are targeted to the mitochondrion, and on the encoded proteins. Data are extracted from external databases and include: gene sequences, structures and annotations from ENSEMBL [2]; protein sequences and annotations from SWISSPROT [3]; transcript sequences and structures from RefSeq [4] and UTRdb [5]; and disease information from OMIM [6]. Annotation is automated by BioPerl scripts that extract the data from these external databases.
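As a rough illustration of how a relational store of this kind might hold one entry per nuclear gene together with its cross-references to the source databases listed above, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table and column names (`entry`, `xref`, `mitonuc_id`, `source_db`, `accession`) and the example values are hypothetical; they are not MitoNuc's actual schema or data.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical, simplified schema (NOT MitoNuc's real one): one row per
# nuclear gene in a given species, plus a table of cross-references to
# the external databases the data were extracted from.
SCHEMA = """
CREATE TABLE entry (
    mitonuc_id   TEXT PRIMARY KEY,  -- hypothetical entry identifier
    species      TEXT NOT NULL,
    gene_name    TEXT NOT NULL,
    product      TEXT,
    localization TEXT               -- sub-mitochondrial localization
);
CREATE TABLE xref (
    mitonuc_id TEXT NOT NULL REFERENCES entry (mitonuc_id),
    source_db  TEXT NOT NULL,       -- e.g. ENSEMBL, SWISSPROT, RefSeq, UTRdb, OMIM
    accession  TEXT NOT NULL,
    PRIMARY KEY (mitonuc_id, source_db, accession)
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database (stand-in for MySQL) with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn: sqlite3.Connection, mitonuc_id: str, species: str,
              gene_name: str, product: str, localization: str,
              xrefs: Iterable[Tuple[str, str]]) -> None:
    """Insert one entry together with its (source_db, accession) cross-references."""
    with conn:  # commit both inserts together, or roll back on error
        conn.execute(
            "INSERT INTO entry VALUES (?, ?, ?, ?, ?)",
            (mitonuc_id, species, gene_name, product, localization),
        )
        conn.executemany(
            "INSERT INTO xref VALUES (?, ?, ?)",
            [(mitonuc_id, db, acc) for db, acc in xrefs],
        )


def xrefs_for(conn: sqlite3.Connection, mitonuc_id: str) -> List[Tuple[str, str]]:
    """Return the cross-references of one entry, sorted by source database."""
    cur = conn.execute(
        "SELECT source_db, accession FROM xref WHERE mitonuc_id = ? "
        "ORDER BY source_db, accession",
        (mitonuc_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_db()
    # Placeholder values only; these are not real database records.
    add_entry(db, "MN_HS_0001", "Homo sapiens", "GENE1", "example product",
              "inner membrane",
              [("SWISSPROT", "P_PLACEHOLDER"), ("ENSEMBL", "ENSG_PLACEHOLDER")])
    print(xrefs_for(db, "MN_HS_0001"))
    # [('ENSEMBL', 'ENSG_PLACEHOLDER'), ('SWISSPROT', 'P_PLACEHOLDER')]
```

Keeping cross-references in their own table, rather than as fixed columns, lets one entry carry any number of accessions per source database.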
Control procedures, including BLAST checks against the protein sequences reported in the linked public resources (SWISSPROT and ENSEMBL), have been implemented to avoid data inconsistencies. The database is available in an EMBL-like flat-file format and can be retrieved through the SRS (Sequence Retrieval System) server at http://www.ba.itb.cnr.it/srs/. MitoNuc can also be queried through a more user-friendly interface, developed in PHP and BioPerl, at http://www.ba.itb.cnr.it/MitoNuc/.

RESULTS: Each database entry corresponds to a nuclear gene encoding a mitochondrial protein in a given species and reports: species name and taxonomic classification; gene name; functional product; sub-mitochondrial localization; tissue specificity of the protein; Enzyme Commission (EC) number for enzymes; and disease data related to protein dysfunction. The Gene Ontology (GO) classification of each gene and gene product, covering molecular function, biological process and cellular component, is also reported, together with links to external database resources. Protein sequences from all species in MitoNuc are pairwise aligned against the corresponding human protein sequence using the Needleman-Wunsch global alignment algorithm. Proteins whose sequence similarity exceeds a fixed threshold of 60% and which belong to the same functional class are multiply aligned with CLUSTAL, manually checked for consistency and grouped into Clusters. Each Cluster is named after the human SWISSPROT identifier and groups the homologous proteins from all species in MitoNuc. These data can be queried from the database home page. The query interface also allows users to combine different search criteria and/or run multi-record queries.
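The pairwise step described above, a Needleman-Wunsch global alignment against the human sequence followed by a 60% cut-off, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MitoNuc's pipeline: the scoring scheme (match +1, mismatch -1, linear gap -2) is hypothetical, "similarity" is approximated here by percent identity over alignment columns, and the helper names (`needleman_wunsch`, `percent_identity`, `candidates_for_cluster`) are illustrative. The functional-class check, the CLUSTAL multiple alignment and the manual consistency control are not reproduced.

```python
from typing import Dict, List, Tuple

# Assumed scoring scheme (not taken from MitoNuc): simple match/mismatch
# scores with a linear gap penalty.
MATCH, MISMATCH, GAP = 1, -1, -2
THRESHOLD = 60.0  # the 60% cut-off quoted in the text


def _pair(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> Tuple[str, str]:
    """Globally align a and b; return one optimal pair of gapped sequences."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Traceback from the bottom-right cell to the origin.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical non-gap columns as a percentage of the alignment length."""
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def candidates_for_cluster(human: str, others: Dict[str, str]) -> List[str]:
    """Ids of sequences whose identity to the human protein exceeds THRESHOLD."""
    keep = []
    for seq_id, seq in others.items():
        aln_h, aln_o = needleman_wunsch(human, seq)
        if percent_identity(aln_h, aln_o) > THRESHOLD:
            keep.append(seq_id)
    return keep


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))   # ('ACGT', 'A-GT')
    print(percent_identity("ACGT", "A-GT"))  # 75.0
```

A production version would more likely score with a substitution matrix such as BLOSUM62 and affine gap penalties; the explicit dynamic-programming table is kept here only to show the recurrence.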
A multi-record query takes a list of reference values, such as MitoNuc entry identifiers (IDs), entry IDs of the linked external public databases, gene names, etc. Sequence data, such as protein, gene or transcript sequences and their sub-sequences (exons, introns, UTR regions), can be extracted and saved locally in different file formats. The current release of the database contains 1344 entries.

1. Attimonelli, M. et al. (2002) Nucleic Acids Res. 30, 172-173
2. Birney, E. et al. (2004) Nucleic Acids Res. 32, 468-470
3. Bairoch, A. et al. (2000) Nucleic Acids Res. 28, 45-48
4. Pruitt, K.D. et al. (2001) Nucleic Acids Res. 29, 137-140
5. Pesole, G. et al. (2002) Nucleic Acids Res. 30, 335-340
6. Hamosh, A. et al. (2002) Nucleic Acids Res. 30, 52-55
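The multi-record query and sub-sequence extraction described in the Results (exons, introns and UTR regions, returned for a list of entry identifiers) can be illustrated with a small Python sketch. It assumes a hypothetical in-memory record holding a plus-strand gene sequence, 0-based half-open exon coordinates and a CDS interval; the names `GeneRecord`, `subsequences` and `to_fasta` are illustrative only, and MitoNuc's own PHP/BioPerl interface is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open, gene-relative coordinates


@dataclass
class GeneRecord:
    """Hypothetical gene record; only the plus strand is handled in this sketch."""
    entry_id: str
    sequence: str
    exons: List[Interval]  # sorted, non-overlapping
    cds: Interval          # coding region; both ends assumed to lie in exons


def _exonic(rec: GeneRecord, lo: int, hi: int) -> str:
    """Concatenate the exonic bases of rec that fall inside [lo, hi)."""
    parts = []
    for s, e in rec.exons:
        a, b = max(s, lo), min(e, hi)
        if a < b:
            parts.append(rec.sequence[a:b])
    return "".join(parts)


def subsequences(rec: GeneRecord) -> Dict[str, str]:
    """Split a gene record into exons, introns, 5'/3' UTRs and CDS."""
    out: Dict[str, str] = {}
    for k, (s, e) in enumerate(rec.exons, start=1):
        out[f"exon{k}"] = rec.sequence[s:e]
    for k in range(len(rec.exons) - 1):
        # Introns are the gaps between consecutive exons.
        out[f"intron{k + 1}"] = rec.sequence[rec.exons[k][1]:rec.exons[k + 1][0]]
    cds_start, cds_end = rec.cds
    out["5UTR"] = _exonic(rec, 0, cds_start)
    out["CDS"] = _exonic(rec, cds_start, cds_end)
    out["3UTR"] = _exonic(rec, cds_end, len(rec.sequence))
    return out


def to_fasta(records: Dict[str, GeneRecord], ids: List[str], kind: str,
             width: int = 60) -> str:
    """Multi-record extraction: FASTA text for sub-sequence `kind` of each id."""
    lines: List[str] = []
    for entry_id in ids:
        rec = records.get(entry_id)
        if rec is None:
            continue  # unknown ids are silently skipped in this sketch
        seq = subsequences(rec).get(kind, "")
        lines.append(f">{entry_id}|{kind}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    demo = GeneRecord("G1", "aaaCCCgggTTTcc", [(0, 6), (9, 14)], (3, 12))
    print(to_fasta({"G1": demo}, ["G1", "unknown"], "CDS"), end="")
    # >G1|CDS
    # CCCTTT
```

Minus-strand genes would additionally need reverse-complementing and reversed exon order, which this sketch deliberately leaves out.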