Genomic Data Integration: A Case Study on Next Generation Sequencing of Cancer

Weitschek E; Cumbo F; Felici G
2016-01-01

Abstract

Advances in Next Generation Sequencing (NGS) techniques have left bioinformaticians with large and exponentially growing amounts of genomic and clinical data. A striking example is The Cancer Genome Atlas (TCGA), which aims to provide a comprehensive archive of biomedical data about tumors. TCGA contains more than 15 TB of genomic and clinical data, whose analysis and interpretation pose great challenges to the bioinformatics community. In this work, we focus on the integration and analysis of NGS data extracted from TCGA. In particular, we integrate RNA-seq and DNA-methylation experiments and perform a supervised classification analysis. Thanks to this integration, we successfully distinguish tumoral samples from normal ones and extract reliable rule-based classification models that contain salient features (i.e., genes and methylated sites). These features, which are related to the investigated tumor, can be studied by domain experts to obtain new knowledge about cancer. Finally, the proposed integration and analysis method can be adopted in further studies on different data sources and NGS experiments.
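The two steps named in the abstract, integrating RNA-seq with DNA-methylation data and learning a rule-based classifier, can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the integration scheme or the classifier, so the sketch assumes an inner join on sample identifiers and swaps in an exhaustive search for a single-threshold rule. All sample IDs, gene names, CpG site names, and values are synthetic and hypothetical, not taken from TCGA.

```python
from typing import Dict, Tuple

Matrix = Dict[str, Dict[str, float]]  # sample ID -> {feature name: value}

# Synthetic toy values (NOT TCGA data); every identifier here is hypothetical.
rna_seq: Matrix = {
    "S1": {"GENE_A": 8.1, "GENE_B": 2.0},
    "S2": {"GENE_A": 7.9, "GENE_B": 2.2},
    "S3": {"GENE_A": 3.0, "GENE_B": 2.1},
    "S4": {"GENE_A": 2.7, "GENE_B": 1.9},
}
methylation: Matrix = {
    "S1": {"cg_site_1": 0.82},
    "S2": {"cg_site_1": 0.40},
    "S3": {"cg_site_1": 0.35},
    "S4": {"cg_site_1": 0.75},
    "S5": {"cg_site_1": 0.50},  # no RNA-seq counterpart, dropped by the join
}
labels = {"S1": "tumoral", "S2": "tumoral", "S3": "normal", "S4": "normal"}


def integrate(a: Matrix, b: Matrix) -> Matrix:
    """Inner join on sample ID: keep only samples present in both experiments."""
    return {s: {**a[s], **b[s]} for s in sorted(a.keys() & b.keys())}


def best_rule(data: Matrix, y: Dict[str, str]) -> Tuple[float, str, str, float]:
    """Search rules of the form 'feature OP threshold => tumoral, else normal'.

    Returns (accuracy, feature, operator, threshold) for the first rule that
    reaches the highest training accuracy.
    """
    samples = sorted(data)
    features = sorted(data[samples[0]])
    best = (-1.0, "", "", 0.0)
    for f in features:
        values = sorted({data[s][f] for s in samples})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for op in (">", "<="):
                correct = 0
                for s in samples:
                    fires = data[s][f] > t if op == ">" else data[s][f] <= t
                    predicted = "tumoral" if fires else "normal"
                    correct += predicted == y[s]
                accuracy = correct / len(samples)
                if accuracy > best[0]:
                    best = (accuracy, f, op, t)
    return best


merged = integrate(rna_seq, methylation)
accuracy, feature, op, threshold = best_rule(merged, labels)
print(f"IF {feature} {op} {threshold:.2f} THEN tumoral ELSE normal "
      f"(training accuracy {accuracy:.2f})")
```

A rule of this shape is readable by a domain expert, which is the property the abstract emphasizes; a real analysis would evaluate the rules on held-out samples rather than on the training data.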
Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" - IASI
data integration
bio
dna methylation
rna sequencing

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/358490