We propose a new efficient and accurate technique for generic approximate similarity searching, based on the use of inverted files. We represent each object of a dataset by the ordering of a number of reference objects according to their distance from the object itself. In order to compare two objects in the dataset, we compare the two corresponding orderings of the reference objects. We show that this representation enables us to use inverted files to obtain very efficiently a very small set of good candidates for the query result. The candidate set is then reordered using the original similarity function to obtain the approximate similarity search result. The proposed technique performs several orders of magnitude better than exact similarity searches, still guaranteeing high accuracy. To also demonstrate the scalability of the proposed approach, tests were executed with various dataset sizes, ranging from 200,000 to 100 million objects.

MI-File: using inverted files for scalable approximate similarity search

Amato G;Gennaro C;Savino P
2014

Abstract

We propose a new efficient and accurate technique for generic approximate similarity searching, based on the use of inverted files. We represent each object of a dataset by the ordering of a number of reference objects according to their distance from the object itself. In order to compare two objects in the dataset, we compare the two corresponding orderings of the reference objects. We show that this representation enables us to use inverted files to obtain very efficiently a very small set of good candidates for the query result. The candidate set is then reordered using the original similarity function to obtain the approximate similarity search result. The proposed technique performs several orders of magnitude better than exact similarity searches, still guaranteeing high accuracy. To also demonstrate the scalability of the proposed approach, tests were executed with various dataset sizes, ranging from 200,000 to 100 million objects.
2014
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Access methods
Multimedia information retrieval
Similarity searching
File in questo prodotto:
File Dimensione Formato  
prod_315309-doc_91602.pdf

solo utenti autorizzati

Descrizione: MI-File: using inverted files for scalable approximate similarity search
Tipologia: Versione Editoriale (PDF)
Dimensione 2.2 MB
Formato Adobe PDF
2.2 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/283226
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 49
  • ???jsp.display-item.citation.isi??? 39
social impact