Managing NGS differential expression uncertainty with fuzzy sets
Consiglio A;Grillo G;Liuni S
2016
Abstract
High-performance Next-Generation Sequencing (NGS) has become a widely used technology for case-control comparison studies of RNA transcripts, such as mRNAs and small non-coding RNAs. The first step of the analysis is mapping NGS reads against a reference database, and a critical issue emerges in this phase: the problem of multireads, that is, reads that map to more than one reference location. In this paper we present a novel approach to represent and quantify read mapping ambiguities through the use of fuzzy sets and possibility theory. The aim of this work is to obtain a list of candidate differential expression events, together with a description of the uncertainty that multireads introduce into the results. In a preliminary experiment on HeLa cells, the method correctly detected the possibility of false positives. In a case-control study of human endobronchial biopsies, the method identified 11 genes with possible differential expression, four of them with an uncertain fold change. This last result was confirmed by an FDR-adjusted Fisher's test, while DESeq2 did not detect significant differences between case and control.