A wide class of approximate pattern matching algorithms are based on a filtration phase in which spaced seeds are used to discard regions where a match is not likely to occur. The problem of determining the "optimal" shape of a spaced seed in a particular setting is known to be a hard one: in practice spaced seeds are chosen using heuristics or considering a restricted family of seeds with "reasonably good" performances. In this paper we consider the family of spaced seeds with a periodic structure. Such seeds have been already proven valuable both as a theoretical tool and in bioinformatics applications. We show that known combinatorial objects, namely Difference Sets and Families and Steiner Systems, naturally lead to the design of lossless periodic seeds for approximate pattern matching with k=2 and k=3 mismatches. We analyze in depth the properties of the resulting seeds obtaining insights also on seeds without a periodic structure. The results of the analysis are then used to guide an experimental evaluation of the effectiveness of periodic seeds for pattern lengths of practical interest. Our results give a complete picture of strengths and limitations of periodic seeds, and can be used by practitioners for the design of effective approximate pattern matching algorithms.

Design and analysis of periodic multiple seeds

Manzini G
2014

Abstract

A wide class of approximate pattern matching algorithms are based on a filtration phase in which spaced seeds are used to discard regions where a match is not likely to occur. The problem of determining the "optimal" shape of a spaced seed in a particular setting is known to be a hard one: in practice spaced seeds are chosen using heuristics or considering a restricted family of seeds with "reasonably good" performances. In this paper we consider the family of spaced seeds with a periodic structure. Such seeds have been already proven valuable both as a theoretical tool and in bioinformatics applications. We show that known combinatorial objects, namely Difference Sets and Families and Steiner Systems, naturally lead to the design of lossless periodic seeds for approximate pattern matching with k=2 and k=3 mismatches. We analyze in depth the properties of the resulting seeds obtaining insights also on seeds without a periodic structure. The results of the analysis are then used to guide an experimental evaluation of the effectiveness of periodic seeds for pattern lengths of practical interest. Our results give a complete picture of strengths and limitations of periodic seeds, and can be used by practitioners for the design of effective approximate pattern matching algorithms.
2014
Istituto di informatica e telematica - IIT
Approximate string matching
Difference Family
Difference Set
Lossless filtration
Spaced seed
Steiner System
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/299661
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? 5
social impact