A Metric Index for Approximate Text Management
Gennaro C.
2002
Abstract
Text data collections need support not only for searching identical objects; approximate matching is even more important. A suitable metric for such a task is the edit distance. However, the quadratic computational complexity of the edit distance rules out naive storage organizations, such as sequential search, so more sophisticated search structures must be applied. We have investigated the properties of the D-index for approximate searching and matching in text databases. The experiments confirm very good performance in retrieving close objects and sub-linear scalability when processing large files. Even similarity joins can be performed efficiently.
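The abstract makes two technical points: the edit distance costs quadratic time, and a metric index helps by avoiding that computation for most stored strings. The Python sketch below illustrates both under stated assumptions. It is not the D-index: it uses a single pivot and the triangle inequality, the basic pruning principle that metric indexes such as the D-index build on. The names `edit_distance`, `PivotFilter`, `range_query` and `self_join`, and the choice of one pivot, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b.
    Dynamic programming over a (len(a)+1) x (len(b)+1) table, kept as
    two rows, so the time is O(len(a) * len(b)), i.e. quadratic."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)  # distance from a[:i] to ""
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[len(b)]


class PivotFilter:
    """Hypothetical one-pivot metric filter (not the D-index).

    Stores d(x, p) for every string x and a fixed pivot p. Since the edit
    distance is a metric, |d(q, p) - d(x, p)| <= d(q, x), so any x with
    |d(q, p) - d(x, p)| > r cannot lie within distance r of q and is
    discarded without computing the expensive d(q, x)."""

    def __init__(self, words: Sequence[str], pivot: str) -> None:
        self.pivot = pivot
        self.entries = [(w, edit_distance(w, pivot)) for w in words]
        self.distance_computations = 0  # d() calls made by the last query

    def range_query(self, q: str, r: int) -> List[str]:
        """All stored strings x with d(q, x) <= r, in insertion order."""
        dqp = edit_distance(q, self.pivot)
        self.distance_computations = 1
        result = []
        for w, dwp in self.entries:
            if abs(dqp - dwp) > r:
                continue  # pruned by the triangle inequality
            self.distance_computations += 1
            if edit_distance(q, w) <= r:
                result.append(w)
        return result

    def self_join(self, r: int) -> List[Tuple[str, str]]:
        """All pairs (x, y), x stored before y, with d(x, y) <= r. The
        stored pivot distances skip pairs that cannot qualify."""
        pairs = []
        for i, (x, dxp) in enumerate(self.entries):
            for y, dyp in self.entries[i + 1:]:
                if abs(dxp - dyp) <= r and edit_distance(x, y) <= r:
                    pairs.append((x, y))
        return pairs
```

For example, `edit_distance("kitten", "sitting")` is 3. A sequential scan computes the quadratic-time distance once per stored string; the filter above computes it only for strings whose pivot distance is compatible with the query radius, while returning exactly the same answers. Real metric indexes use many pivots and partition the data so that whole groups of strings are skipped at once.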