Integrating bioinformatics resources for modelling Human non-coding RNA networks
Giorgio De Caro;Sabino Liuni;Domenica D'Elia;Flavio Licciulli
2016
Abstract
Introduction
Non-coding RNAs (ncRNAs) serve as regulatory molecules in a variety of biological processes. They are roughly classified by size into two major categories: small non-coding RNAs (sncRNAs), such as microRNAs (miRNAs), and long non-coding RNAs (lncRNAs). The lncRNAs have a broader spectrum of functions and are therefore a potential new class of cancer therapeutic targets [1,2]. In addition, there are other types of ncRNAs whose roles are not yet clear: circular-RNA, lincRNA, scRNA, sense-intronic and vault-RNA. Further advances in translational research will require an accurate understanding of the functional relationships between protein-coding and ncRNA categories, as well as of sponge regulatory networks [3,4]. To this end, we have built an integrated bioinformatics knowledge base that collects non-redundant annotations of human ncRNAs, their sequences and their interactors, and that provides comprehensive access to the available knowledge on ncRNAs, their interactions with other molecules and their associated diseases. As its key features, the database overcomes the problem of the different nomenclatures used by different sources and provides new clues about ncRNA functions through interactions inferred by network reconstruction [5].

Methods
ncRNA interactions include physical relationships (i.e., molecular binding between ncRNAs and DNA, RNAs or proteins) and functional relationships (i.e., co-expression, regulation, associated diseases, statistical and functional associations). Interactions are stored in the database in the form 'ncRNA-mate', where the mate entity belongs to one of the following types: ncRNA, protein-coding RNA (pcRNA), gene, protein, pseudogene or phenotype.
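The 'ncRNA-mate' form described above can be pictured as a small record type. The following is only a minimal sketch and not the schema of the database: the names MateType, Kind and Interaction are hypothetical, and the physical/functional label attached to the example record is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class MateType(Enum):
    """The six mate entity types listed in the text."""
    NCRNA = "ncRNA"
    PCRNA = "pcRNA"
    GENE = "gene"
    PROTEIN = "protein"
    PSEUDOGENE = "pseudogene"
    PHENOTYPE = "phenotype"


class Kind(Enum):
    """The two broad relationship classes named in the text."""
    PHYSICAL = "physical"
    FUNCTIONAL = "functional"


@dataclass(frozen=True)
class Interaction:
    """One 'ncRNA-mate' record (hypothetical layout, for illustration)."""
    ncrna: str
    mate: str
    mate_type: MateType
    kind: Kind

    def label(self) -> str:
        # Interaction class in the 'ncRNA-<mate type>' naming used in the text.
        return f"ncRNA-{self.mate_type.value}"


# Example record: the kind is chosen for illustration only.
example = Interaction("miR-148b-3p", "ARAF", MateType.GENE, Kind.FUNCTIONAL)
print(example.label())
```

Keeping the mate type as an enumeration makes it straightforward to count interactions per class, as in the 'ncRNA-gene' or 'ncRNA-protein' totals reported in the Results.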
To ensure the data quality of our interaction database, we have developed a series of Extraction, Transformation and Loading (ETL) modules that extract, collect and integrate primary annotations, sequences and interactions from different public biological resources. The extracted biological entities and their relations are modelled as a network, a mathematical object composed of nodes (entities) and edges (relations) [5]. Redundant entities have been identified through cross-link references and sequence similarity, using the Cleanup software [6]. Non-coding RNAs are classified into biotypes, associated with Sequence Ontology terms [7] and integrated with data on protein-coding RNAs (pcRNAs), genes, proteins, pseudogenes and phenotypes. Furthermore, we extended the cross-reference network with data provided by Ensembl [8], using the biomaRt library of Bioconductor [9].

Results
The entities collected in our interaction database comprise 168,058 ncRNAs, 5,009 pcRNAs, 52,811 genes, 1,999 proteins, 15,940 pseudogenes and 849 phenotypes. The interactions, counted by mate type, comprise 130,383 ncRNA-ncRNA, 55,048 ncRNA-pcRNA, 1,458,925 ncRNA-gene, 99,653 ncRNA-protein, 70,482 ncRNA-phenotype and 17,217 ncRNA-pseudogene interactions.

Conclusions
An ever-increasing amount of information is spread across existing scope-specific resources and, to date, the integration of knowledge about relatively newly discovered types of biological molecules suffers from a lack of nomenclature standards and unified classifications. To show the potential of our interaction database, we related a subnetwork of a known tumour gene circuit (E2F6, EZH1, EZH2 and ARAF) [10] to ncRNAs by means of the ncRNA-gene interactions in the database. Among the retrieved interactions, we analyzed those involving one long non-coding RNA (HOTAIR) and one miRNA (miR-148b-3p).
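The lookup described above, retrieving the ncRNAs linked to a set of genes, can be sketched as a filter over ncRNA-gene edges. This is a hedged illustration rather than the actual query interface of the database: the function names are hypothetical, and the edge list is toy data. Only the miR-148b-3p/ARAF pair is named in the text; the other edges are assumptions added to make the example runnable.

```python
from collections import defaultdict

# Toy ncRNA-gene edges for illustration only; not records from the database.
EDGES = [
    ("miR-148b-3p", "ARAF"),  # target relation mentioned in the text
    ("HOTAIR", "EZH2"),       # assumed edge, for illustration
    ("ncRNA_X", "GENE_Y"),    # placeholder pair outside the circuit
]


def build_gene_index(edges):
    """Map each gene to the set of ncRNAs connected to it."""
    index = defaultdict(set)
    for ncrna, gene in edges:
        index[gene].add(ncrna)
    return index


def ncrnas_for_genes(index, genes):
    """Return {ncRNA: sorted list of query genes it is linked to}."""
    hits = defaultdict(list)
    for gene in genes:
        for ncrna in index.get(gene, ()):
            hits[ncrna].append(gene)
    return {ncrna: sorted(linked) for ncrna, linked in hits.items()}


# Gene circuit taken from the text.
circuit = ["E2F6", "EZH1", "EZH2", "ARAF"]
result = ncrnas_for_genes(build_gene_index(EDGES), circuit)
print(result)
```

Indexing the edges by gene first keeps each query proportional to the size of the gene set rather than to the full edge list, which matters when the ncRNA-gene class holds more than a million interactions.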
HOTAIR up-regulation may be a critical element in metastatic progression [11], whereas over-expression of miR-148b-3p could inhibit cell proliferation in vitro and suppress tumorigenicity in vivo [12]. A possible mechanism of tumorigenesis, in colorectal cancer and other cancers, could operate through a circuit that involves the up-regulation of the aforementioned proteins and the down-regulation of miR-148b-3p, mediated by HOTAIR. Indeed, HOTAIR may function as a competing endogenous RNA (ceRNA) that sponges miR-148b-3p, thus modulating the de-repression of its targets, such as ARAF, a proto-oncogene that may be involved in cell proliferation. This example demonstrates the utility of our interaction database for the discovery of ncRNA regulatory networks.

References
1. Qureshi et al. (2010) Long non-coding RNAs in nervous system function and diseases. Brain Res. 1338, 20-35.
2. Prensner, J.R. et al. (2011) The emergence of lncRNAs in cancer biology. Cancer Discov. 1, 391-407.
3. Ebert, M.S. et al. (2010) Emerging roles for natural microRNA sponges. Curr. Biol. 20, R858-R861.
4. Ebert, M.S. et al. (2007) MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nat. Methods 4, 721-726.
5. Bonnici, V., Russo, F., Bombieri, N., Pulvirenti, A. and Giugno, R. (2014) Comprehensive reconstruction and visualization of non-coding regulatory networks in human. Front. Bioeng. Biotechnol. 2, 69. doi: 10.3389/fbioe.2014.00069
6. Grillo, G., Attimonelli, M., Liuni, S. and Pesole, G. (1996) CLEANUP: a fast computer program for removing redundancies from nucleotide sequence databases. Comput. Appl. Biosci. 12, 1-8.
7. Eilbeck, K., Lewis, S.E., Mungall, C.J. et al. (2005) The Sequence Ontology: a tool for the unification of genome annotations. Genome Biology 6, R44.
8. Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., ... & Durbin, R. (2002) The Ensembl genome database project. Nucleic Acids Research 30(1), 38-41.
9. Durinck, S., Moreau, Y., Kasprzyk, A., Davis, S., De Moor, B., Brazma, A. & Huber, W. (2005) BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21(16), 3439-3440.
10. Attwooll, C. et al. (2005) A novel repressive E2F6 complex containing the polycomb group protein, EPC1, that interacts with EZH2 in a proliferation-specific manner. Journal of Biological Chemistry 280(2), 1199-1208.
11. Gupta, R.A. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464(7291), 1071-1076.
12. Song, Y. et al. (2012) MicroRNA-148b suppresses cell growth by targeting cholecystokinin-2 receptor in colorectal cancer. International Journal of Cancer 131(5), 1042-1051.