ETLs for importing NCBI Entrez Gene, miRBase, mirCancer and microRNA into a bioinformatics graph database

A Messina
2015

Abstract

This work is the first in a series of technical reports documenting the activities performed to build a large bioinformatics database. Currently available bioinformatics databases provide huge amounts of data on different biological entities, such as genes, proteins, diseases, microRNAs, annotations, and literature references. In many case studies, however, a bioinformatician needs more than one type of resource in order to fully analyze the data. The bioinformatics database that is the subject of this work will allow the integration of different types of data sources, so that bioinformatics analyses can be performed using a single comprehensive system. The integrated database will be structured as a NoSQL graph database based on the OrientDB platform, thereby exploiting the advantages of that technology in terms of scalability and efficiency with respect to traditional SQL databases. The technical report is organized as follows: Section 2 presents a brief overview of the NoSQL engine OrientDB; Section 3 presents the general structure of the developed ETLs; Sections 4 to 7 report the specific ETL implementations for the bioinformatics databases imported so far.
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Keywords: Integrated database; Graph database; GraphDB; NoSQL; Bioinformatics database

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/323045