An incremental prefix filtering approach for the all pairs similarity search problem
Perego R; Silvestri F
2010
Abstract
Given a set of records, a threshold value t, and a similarity function, we investigate the problem of finding all pairs of records whose similarity is above t. We propose several optimizations to existing approaches for solving this problem. Our algorithm outperforms state-of-the-art algorithms on large, high-dimensional datasets. The speedup we achieved ranged from 30% to 4x, depending on the similarity threshold and the dataset properties.
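To make the problem statement concrete, the following is a minimal sketch of the naive baseline that compares every pair of records directly. It is not the algorithm proposed in the paper. The choice of cosine similarity, the sparse `{feature: weight}` record representation, and the names `cosine` and `all_pairs_naive` are all assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts.

    Assumption: cosine is used only as an example similarity function.
    """
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def all_pairs_naive(records, t, sim=cosine):
    """Return every (i, j, score) with i < j and sim(records[i], records[j]) > t.

    Hypothetical baseline: it evaluates all n * (n - 1) / 2 pairs.
    """
    result = []
    for (i, x), (j, y) in combinations(enumerate(records), 2):
        score = sim(x, y)
        if score > t:  # "above t" is read here as strictly greater
            result.append((i, j, score))
    return result


# Toy data, invented for this example.
records = [
    {"a": 1.0, "b": 1.0},
    {"a": 1.0, "b": 1.0, "c": 1.0},
    {"d": 1.0},
]
pairs = all_pairs_naive(records, t=0.5)
```

On this toy input only records 0 and 1 share features, so they form the single pair returned. The number of comparisons grows quadratically with the number of records, which is the cost that filtering techniques such as the prefix filtering named in the title aim to reduce.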