Variant Calling Algorithms Benchmark Using High Performance Computing

Bachir Balech; Monica Santamaria; Graziano Pesole
2020

Abstract

The application of Next Generation Sequencing to Whole Genome Sequencing continues to show advantages in many research fields, such as medical diagnostics. Here we conducted a comparative assessment of GATK3-HaplotypeCaller, FreeBayes and Samtools-mpileup, three state-of-the-art variant calling algorithms, on a "gold standard" benchmark. The analyses were performed using the high performance computing technology of the ReCaS datacenter. Our results indicated that Samtools-mpileup was the most conservative, with the highest precision, followed by FreeBayes and GATK3-HaplotypeCaller, which presented the highest sensitivity. Moreover, the merged call-set resulted in lower sensitivity and precision, suggesting that additional testing with different merging methods, followed by wet-lab validation, is needed. Despite some limitations, these results provide important insights for the development and refinement of novel bioinformatics tools and workflows.
Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari (IBIOM)
9788849239171
variant calling
benchmark
human genome
high performance computing
File in this record:
Balech_etal_2020.pdf (authorized users only)

Type: Post-print document
License: NOT PUBLIC - Private/restricted access
Size: 375.49 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/379767