Variant Calling Algorithms Benchmark Using High Performance Computing

Bachir Balech; Monica Santamaria; Graziano Pesole
2020

Abstract

The application of Next Generation Sequencing to Whole Genome Sequencing continues to offer clear advantages in many research fields, such as medical diagnostics. Here we conducted a comparative assessment of GATK3-HaplotypeCaller, FreeBayes and Samtools-mpileup, three state-of-the-art variant calling algorithms, on a "gold standard" benchmark. The analyses were performed using the high performance computing technology of the ReCaS datacenter. Our results indicated that Samtools-mpileup was the most conservative, with the highest precision, followed by FreeBayes and GATK3-HaplotypeCaller, which presented the highest sensitivity. Moreover, the merged call-set resulted in lower sensitivity and precision, suggesting the need for additional testing using different merging methods followed by wet-lab validations. Despite some limitations, these results provide important insights for the development and refinement of novel bioinformatics tools and workflows.
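
The precision and sensitivity figures discussed in the abstract can be illustrated with a minimal sketch. It assumes that each call-set and the benchmark are represented as sets of (chromosome, position, ref, alt) tuples and that a call counts as correct only on an exact match. The function name `evaluate`, the toy variants, and the union/intersection merges are hypothetical and are not taken from the paper; they do not reproduce its merging method or its results.

```python
from typing import NamedTuple, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# This exact-match representation is an assumption made for illustration only.
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    sensitivity: float


def evaluate(calls: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare a call-set with a truth set and return precision and sensitivity."""
    tp = len(calls & truth)   # called and present in the benchmark
    fp = len(calls - truth)   # called but absent from the benchmark
    fn = len(truth - calls)   # present in the benchmark but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(precision, sensitivity)


# Toy benchmark and two hypothetical callers (not real data).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
caller_a = {("chr1", 100, "A", "G"), ("chr2", 75, "T", "C")}  # conservative
caller_b = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "A", "T"),  # false positive
}

# Two simple merging strategies: keep every call, or keep only shared calls.
merged_union = caller_a | caller_b
merged_intersection = caller_a & caller_b

for name, calls in [
    ("caller_a", caller_a),
    ("caller_b", caller_b),
    ("union", merged_union),
    ("intersection", merged_intersection),
]:
    m = evaluate(calls, truth)
    print(f"{name}: precision={m.precision:.2f} sensitivity={m.sensitivity:.2f}")
```

In this toy setting the union trades precision for sensitivity and the intersection does the opposite, which is one reason the choice of merging method can change how a merged call-set compares with the individual callers.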
Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari (IBIOM)
English
Giorgio Pietro Maggi
Atti dell'incontro con gli utenti DATA CENTER ReCaS-BARI
pp. 77-85 (9 pages)
ISBN 9788849239171
Gangemi Editore spa
Roma
Italy
Yes, but type not specified
variant calling
benchmark
human genome
high performance computing
International
Print
No
4
02 Contribution in volume::02.01 Contribution in volume (chapter or essay)
268
restricted
Balech, Bachir; Chiara, Matteo; Santamaria, Monica; Pesole, Graziano
info:eu-repo/semantics/bookPart
Files in this item:
Balech_etal_2020.pdf (authorized users only)

Type: Post-print document
License: NOT PUBLIC - private/restricted access
Size: 375.49 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/379767