CNR Institutional Research Information System

Given a set (Formula presented.) of q documents, the Longest Common Substring (LCS) problem asks, for any integer 2 ? k ? q, the longest substring that appears in k documents. LCS is a well-studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences alignments and analysis. This problem has been solved by Hui (2000 International Journal of Computer Science and Engineering 15 73-76) by using a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with the use of suffix trees. In this article, we present a simple method for solving the LCS problem by using suffix trees (STs) and classical union-find data structures. In turn, we show how this simple algorithm can be adapted in order to work with other space efficient data structures such as the enhanced suffix arrays (ESA) and the compressed suffix tree.

The longest common substring problem

CROCHEMORE MAXIME;ILIOPOULOS COSTAS S;LANGIU ALESSIO;MIGNOSI FILIPPO

2017

Abstract

Given a set (Formula presented.) of q documents, the Longest Common Substring (LCS) problem asks, for any integer 2 ? k ? q, the longest substring that appears in k documents. LCS is a well-studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences alignments and analysis. This problem has been solved by Hui (2000 International Journal of Computer Science and Engineering 15 73-76) by using a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with the use of suffix trees. In this article, we present a simple method for solving the LCS problem by using suffix trees (STs) and classical union-find data structures. In turn, we show how this simple algorithm can be adapted in order to work with other space efficient data structures such as the enhanced suffix arrays (ESA) and the compressed suffix tree.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2017
			
	Strutture organizzative
	
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
			
	Parole chiave
	
				algorithms on text
			
	Appare nelle tipologie:
	
				01.01 Articolo in rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/300625

Citazioni

ND

5

ND

social impact