CNVScan: detecting borderline copy number variations in NGS data via scan statistics
D'Aurizio R; Pellegrini M; Leoncini M
2015
Abstract
Background. Next Generation Sequencing (NGS) data have been extensively exploited in the last decade to analyse genome variations and to understand their role in complex diseases. Copy number variations (CNVs) are genomic structural variants estimated to account for about 1.2% of the total variation in humans. CNVs in coding or regulatory regions may affect gene expression, often also at a functional level, and contribute to diseases such as cancer, autism and cardiovascular diseases. Computational methods that detect CNVs from NGS data based on the depth of coverage are limited to the identification of medium/large events and are heavily influenced by the level of coverage.
Result. In this paper we propose CNVScan, a CNV detection method based on scan statistics that overcomes limitations of previous read count (RC) based approaches mainly by being a window-less approach. Scan statistics have been used before mainly in epidemiology and ecology studies but, to the best of our knowledge, have never before been applied to the CNV detection problem. Since we avoid windowing, we do not have to choose an optimal window size, which is a key step in many previous approaches. Extensive simulated experiments with single-read data in extreme situations (low coverage, short reads, homo/heterozygosity) show that this approach is very effective for a range of small CNVs (200-500 bp) for which previous state-of-the-art methods are not suitable.
Conclusion. The scan statistics technique is applied and adapted in this paper for the first time to the CNV detection problem. Comparison with state-of-the-art methods shows that the approach is quite effective in discovering short CNVs in rather extreme situations in which previous methods fail or have degraded performance. CNVScan thus extends the range of CNV sizes and types that can be detected via read count with single-read data.
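To illustrate the general idea of a window-less scan, the following is a minimal sketch of a one-dimensional scan statistic over read start positions. It is not the CNVScan algorithm, whose details are not given in this abstract. It assumes a uniform null distribution of reads, uses a Kulldorff-style Poisson log-likelihood ratio as the score, and searches only for enriched intervals (gain-like events); the names `poisson_llr`, `best_enriched_interval` and `max_span` are hypothetical. Candidate intervals are delimited by the reads themselves, so no fixed window size is chosen.

```python
import math


def poisson_llr(count, expected, total):
    """Log-likelihood ratio for an excess of reads in an interval.

    Returns 0.0 when the interval holds no more reads than expected
    under the (assumed) uniform null.
    """
    if count <= expected:
        return 0.0
    inside = count * math.log(count / expected)
    outside_count = total - count
    outside = 0.0
    if outside_count > 0:
        outside = outside_count * math.log(outside_count / (total - expected))
    return inside + outside


def best_enriched_interval(read_starts, genome_length, max_span=1000):
    """Scan every interval bounded by two reads; return (score, start, end).

    max_span (hypothetical parameter) only caps the search cost; it does
    not define a fixed window.
    """
    positions = sorted(read_starts)
    total = len(positions)
    best = (0.0, None, None)
    for i in range(total):
        for j in range(i + 1, total):
            span = positions[j] - positions[i] + 1
            if span > max_span:
                break
            count = j - i + 1  # reads falling in [positions[i], positions[j]]
            expected = total * span / genome_length
            score = poisson_llr(count, expected, total)
            if score > best[0]:
                best = (score, positions[i], positions[j])
    return best


# Toy data: one read every 100 bp, plus 20 extra reads packed into ~200 bp.
background = list(range(0, 10_000, 100))
extra = list(range(5_005, 5_200, 10))
score, start, end = best_enriched_interval(background + extra, 10_000)
```

The quadratic search is kept for clarity only. A practical method would also need a significance assessment for the top-scoring interval (for example by Monte Carlo simulation under the null) and a separate treatment of read-depleted intervals for losses.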