CNR Institutional Research Information System

A Metric Index for Approximate Text Management

Dohnal V;Gennaro C;Zezula P

2002

Abstract

Text collections need search support not only for identical objects; approximate matching is even more important. A suitable metric for this task is the edit distance. However, the quadratic computational complexity of the edit distance rules out naive storage organizations, such as sequential search, so more sophisticated search structures must be applied. We have investigated the properties of the D-index for approximate searching and matching in text databases. The experiments confirm very good performance for retrieving close objects and sub-linear scalability when processing large files. Even similarity joins can be performed efficiently.
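To make the cost argument in the abstract concrete, here is a minimal sketch of the edit distance (Levenshtein variant, unit costs assumed) and of the naive sequential range query that the abstract describes as impractical. The function names `edit_distance` and `range_query_sequential` are hypothetical and are not taken from the paper; the D-index itself is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic program.

    Runs in O(len(a) * len(b)) time, which is the quadratic cost
    the abstract refers to. Unit costs are an assumption here.
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def range_query_sequential(words, query, radius):
    """Naive baseline: one distance computation per stored object."""
    return [w for w in words if edit_distance(w, query) <= radius]
```

With this baseline every query pays one quadratic distance computation per stored string, which is why a metric index that prunes most of those computations pays off on large collections.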

Year: 2002
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 0-88986-362-8
Keywords: Text management; Information retrieval; Metric space
Appears in types: 04.01 Contribution in conference proceedings

Files in this item:

File	Access	Type	Size	Format
prod_91518-doc_56964.pdf	Authorized users only	Publisher's version (PDF)	144.45 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/114004
