Managing NGS differential expression uncertainty with fuzzy sets

Consiglio A; Grillo G; Liuni S
2016

Abstract

High-performance Next-Generation Sequencing (NGS) is widely used in case-control studies to compare RNA transcripts such as mRNAs and small non-coding RNAs. The first step of the analysis is mapping NGS reads against a reference database, and a critical issue emerges in this phase: multireads, that is, reads that map to more than one location in the reference. In this paper we present a novel approach that represents and quantifies read mapping ambiguities using fuzzy sets and possibility theory. The aim of this work is to obtain a list of candidate differential expression events together with a description of the uncertainty that multireads introduce into the results. In a preliminary experiment on HeLa cells, the method correctly detected the possibility of false positives. In a case-control study of human endobronchial biopsies, it identified 11 genes with possible differential expression, four of them with an uncertain fold change. This last result was confirmed by an FDR-adjusted Fisher's test, whereas DESeq2 found no significant differences between case and control.
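To make the idea of multiread uncertainty concrete, the following is a minimal sketch and not the authors' method, which the abstract does not specify. It assumes the simplest possible model: each gene count is a crisp interval running from the uniquely mapped reads to the unique reads plus all multireads, which is a rectangular possibility distribution. The class `CountInterval`, the functions, the pseudocount of 1 and the fold-change threshold of 2 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountInterval:
    """Read count for one gene in one sample, bounded by multiread ambiguity.

    Hypothetical model: not taken from the paper.
    """
    unique: int  # reads that map only to this gene
    multi: int   # reads that map to this gene and to at least one other location

    @property
    def low(self) -> int:
        # Pessimistic count: no multiread belongs to this gene.
        return self.unique

    @property
    def high(self) -> int:
        # Optimistic count: every multiread belongs to this gene.
        return self.unique + self.multi


def fold_change_interval(case, control, pseudocount=1.0):
    """Smallest and largest case/control fold change compatible with the counts."""
    lo = (case.low + pseudocount) / (control.high + pseudocount)
    hi = (case.high + pseudocount) / (control.low + pseudocount)
    return lo, hi


def possibility_and_necessity(case, control, threshold=2.0):
    """Possibility and necessity that the fold change is at least `threshold`.

    With a rectangular distribution both measures are 0 or 1:
    possibility is 1 if any admissible fold change reaches the threshold,
    necessity is 1 only if every admissible fold change does.
    """
    lo, hi = fold_change_interval(case, control)
    possibility = 1.0 if hi >= threshold else 0.0
    necessity = 1.0 if lo >= threshold else 0.0
    return possibility, necessity


# No multireads: the over-expression is both possible and certain.
print(possibility_and_necessity(CountInterval(90, 0), CountInterval(30, 0)))
# Many multireads: over-expression is possible, but the fold change is uncertain.
print(possibility_and_necessity(CountInterval(40, 60), CountInterval(30, 10)))
```

Under these assumptions, a gene with possibility 1 and necessity 0 corresponds to the kind of event the abstract describes as possibly differentially expressed with an uncertain fold change. A graded membership function, for example a trapezoid, would give values between 0 and 1 instead of this all-or-nothing result.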
Istituto di Tecnologie Biomediche - ITB
Differential expression
Fuzzy sets
Multireads
Possibility measure
RNA-Seq

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/333568