NonCode aReNA DB: a non-redundant and integrated collection of non-coding RNAs
Giorgio De Caro;Arianna Consiglio;Domenica D'Elia;Andreas Gisel;Giorgio Grillo;Sabino Liuni;Angelica Tulipano;Flavio Licciulli
2015
Abstract
MOTIVATION: The recent availability of next generation sequencing (NGS) technologies has provided the scientific community with an unprecedented opportunity for large-scale analysis of genomes in a large number of organisms. One of the most challenging tasks for bioinformaticians is to develop tools that give biologists easy access to curated and non-redundant collections of sequence data. Non-coding RNAs (ncRNAs), long believed to be non-functional, are emerging as the largest and most important family of gene regulators.

METHODS: NonCode aReNA DataBase is a comprehensive and non-redundant source of manually curated and automatically annotated ncRNA transcripts collected from major public resources. The database is built through a set of automated ETL (Extraction, Transformation, Loading) processes that extract and collect data from VEGA, ENSEMBL, RefSeq, miRBase, GtRNAdb and piRNABank. The automated process also guarantees recurring updates. Redundant sequences are identified by analyzing both cross-link references and sequence similarity. Furthermore, non-coding RNA sequences have been classified into diverse biotypes and associated with Sequence Ontology terms. NonCode aReNA DataBase was originally developed as a component of a larger project, comprising a data warehouse and an analysis workflow, for the functional annotation of ncRNAs from NGS data.

RESULTS: NonCode aReNA DataBase is currently available as a web resource at http://ncrnadb.ba.itb.cnr.it/. The database can be queried using multi-criteria and ontological search through an easy-to-use web interface. Query results can be exported as non-redundant collections of ncRNA transcripts. Currently, NonCode aReNA DataBase contains 134,908 human ncRNAs classified into 24 biotypes, and upcoming updates will include transcripts of Mus musculus and Arabidopsis thaliana.
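The redundancy step described under METHODS (combining cross-link references with sequence comparison) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Transcript` record shape, the function names, and the accessions in the usage example are hypothetical, and exact identity of normalized sequences is used here as a simplified stand-in for the sequence-similarity analysis mentioned in the abstract. Records that share an accession, a cross-reference, or an identical sequence are merged into one group with a union-find structure.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Hypothetical record shape; the actual schema is not given in the abstract."""
    accession: str
    source: str
    sequence: str
    xrefs: set[str] = field(default_factory=set)


def normalize(seq: str) -> str:
    # Ignore case and treat the RNA and DNA alphabets alike (U -> T).
    return seq.upper().replace("U", "T")


def group_redundant(records: list[Transcript]) -> list[list[str]]:
    """Group accessions that are linked by a shared identifier or an identical sequence.

    Assumption: exact sequence identity replaces a real similarity search.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the group root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smallest index as root so group order follows input order.
            parent[max(ri, rj)] = min(ri, rj)

    seen: dict[tuple[str, str], int] = {}
    for i, rec in enumerate(records):
        keys = [("seq", normalize(rec.sequence)), ("id", rec.accession)]
        keys += [("id", xref) for xref in rec.xrefs]
        for key in keys:
            if key in seen:
                union(i, seen[key])
            else:
                seen[key] = i

    groups: dict[int, list[str]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec.accession)
    return list(groups.values())


if __name__ == "__main__":
    # Made-up accessions and sequences, for illustration only.
    demo = [
        Transcript("ENST_demo_1", "ENSEMBL", "ACGU", {"NR_demo_1"}),
        Transcript("NR_demo_1", "RefSeq", "ACGTT"),
        Transcript("mir_demo_1", "miRBase", "acgu"),
        Transcript("trna_demo_1", "GtRNAdb", "GGGG"),
    ]
    print(group_redundant(demo))
```

In the demo, the first three records end up in one group: the second is cross-referenced by the first, and the third has the same normalized sequence as the first. A production pipeline would replace the identity check with an alignment- or clustering-based similarity measure.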