The great success of visual features learned from deep neural networks has led to a significant effort to develop efficient and scalable technologies for image retrieval. Nevertheless, its usage in large-scale Web applications of content-based retrieval is still challenged by their high dimensionality. To overcome this issue, some image retrieval systems employ the product quantization method to learn a large-scale visual dictionary from a training set of global neural network features. These approaches are implemented in main memory, preventing their usage in big-data applications. The contribution of the work is mainly devoted to investigating some approaches to transform neural network features into text forms suitable for being indexed by a standard full-text retrieval engine such as Elasticsearch. The basic idea of our approaches relies on a transformation of neural network features with the twofold aim of promoting the sparsity without the need of unsupervised pre-training. We validate our approach on a recent convolutional neural network feature, namely Regional Maximum Activations of Convolutions (R-MAC), which is a state-of-art descriptor for image retrieval. Its effectiveness has been proved through several instance-level retrieval benchmarks. An extensive experimental evaluation conducted on the standard benchmarks shows the effectiveness and efficiency of the proposed approach and how it compares to state-of-the-art main-memory indexes.

Large-scale instance-level image retrieval

Amato G;Carrara F;Falchi F;Gennaro C;Vadicamo L
2019

Abstract

The great success of visual features learned from deep neural networks has led to a significant effort to develop efficient and scalable technologies for image retrieval. Nevertheless, its usage in large-scale Web applications of content-based retrieval is still challenged by their high dimensionality. To overcome this issue, some image retrieval systems employ the product quantization method to learn a large-scale visual dictionary from a training set of global neural network features. These approaches are implemented in main memory, preventing their usage in big-data applications. The contribution of the work is mainly devoted to investigating some approaches to transform neural network features into text forms suitable for being indexed by a standard full-text retrieval engine such as Elasticsearch. The basic idea of our approaches relies on a transformation of neural network features with the twofold aim of promoting the sparsity without the need of unsupervised pre-training. We validate our approach on a recent convolutional neural network feature, namely Regional Maximum Activations of Convolutions (R-MAC), which is a state-of-art descriptor for image retrieval. Its effectiveness has been proved through several instance-level retrieval benchmarks. An extensive experimental evaluation conducted on the standard benchmarks shows the effectiveness and efficiency of the proposed approach and how it compares to state-of-the-art main-memory indexes.
2019
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Deep features
Image retrieval
Inverted index
Surrogate text representation
File in questo prodotto:
File Dimensione Formato  
prod_411385-doc_144858.pdf

accesso aperto

Descrizione: Preprint - Large-scale instance-level image retrieval
Tipologia: Versione Editoriale (PDF)
Dimensione 2.31 MB
Formato Adobe PDF
2.31 MB Adobe PDF Visualizza/Apri
prod_411385-doc_157078.pdf

non disponibili

Descrizione: Large-scale instance-level image retrieval
Tipologia: Versione Editoriale (PDF)
Dimensione 2.41 MB
Formato Adobe PDF
2.41 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/366663
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 36
  • ???jsp.display-item.citation.isi??? 25
social impact