Variant Calling Algorithms Benchmark Using High Performance Computing

Bachir Balech; Monica Santamaria; Graziano Pesole
2020

Abstract

The application of Next Generation Sequencing to Whole Genome Sequencing continues to show advantages in many research fields, such as medical diagnostics. Here we conducted a comparative assessment of three state-of-the-art variant calling algorithms, GATK3-HaplotypeCaller, FreeBayes and Samtools-mpileup, on a "gold standard" benchmark. The analyses were performed using the high performance computing technology of the ReCaS datacenter. Our results indicated that Samtools-mpileup was the most conservative caller, with the highest precision, followed by FreeBayes and GATK3-HaplotypeCaller, which presented the highest sensitivity. Moreover, the merged call-set showed lower sensitivity and precision, which calls for additional testing with different merging methods, followed by wet-lab validation. Despite some limitations, these results provide important insights for the development and refinement of novel bioinformatics tools and workflows.
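
As a rough illustration of the two metrics named in the abstract, the sketch below scores a call-set against a truth set. It is not the authors' pipeline: the exact-match comparison on (chromosome, position, reference allele, alternate allele), the `evaluate` helper, and the toy variants are all assumptions made for this example. Precision is taken as TP / (TP + FP) and sensitivity as TP / (TP + FN).

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    sensitivity: float


def evaluate(calls: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Score a call-set against a truth set by exact variant match (an assumption)."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # present in the truth set but not called
    precision = tp / (tp + fp) if calls else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    return Metrics(precision, sensitivity)


# Toy data, invented for illustration only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
conservative_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
permissive_calls = conservative_calls | {
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "A", "T"),  # false positive
}

conservative = evaluate(conservative_calls, truth)
permissive = evaluate(permissive_calls, truth)
print(conservative)
print(permissive)
```

In this toy case the conservative call-set makes no false calls but misses half of the true variants, while the permissive one recovers more true variants at the cost of one false positive. This is the kind of trade-off between precision and sensitivity that the abstract describes.
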
Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari (IBIOM)
9788849239171
variant calling
benchmark
human genome
high performance computing

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/379767