We propose a polynomial algorithm computing a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications, as it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work.

Matchtigs: minimum plain text representation of k-mer sets

Pibiri GE;
2023

Abstract

We propose a polynomial algorithm computing a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications, as it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work.
2023
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Inglese
24
32
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02968-z
k-mer sets
Plain text compression
Genomic sequences
Graph algorithm
Sequence analysis
Minimum-cost flow
Chinese postman problem
Article number: 136
1
5
info:eu-repo/semantics/article
262
Schmidt, S; Khan, S; Alanko, Jn; Pibiri, Ge; Tomescu, Ai
01 Contributo su Rivista::01.01 Articolo in rivista
open
   Safe and Complete Algorithms for Bioinformatics
   SAFEBIO
   H2020
   851093

   Labs for prototyping future Mobility Data sharing cloud solutions
   MobiDataLab
   H2020
   101006879
File in questo prodotto:
File Dimensione Formato  
prod_484067-doc_199906.pdf

accesso aperto

Descrizione: Matchtigs: minimum plain text representation of k-mer sets
Tipologia: Versione Editoriale (PDF)
Dimensione 2.75 MB
Formato Adobe PDF
2.75 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/464352
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 14
  • ???jsp.display-item.citation.isi??? 12
social impact