CNR Institutional Research Information System

Similarity caching in large-scale image retrieval

Falchi F; Lucchese C; Orlando S; Perego R; Rabitti F

2012

Abstract

Feature-rich data, such as audio-video recordings, digital images, and results of scientific experiments, nowadays constitute the largest fraction of the massive data sets produced daily in the e-society. Content-based similarity search systems working on such data collections are rapidly growing in importance. Unfortunately, similarity search is in general very expensive and scales poorly. In this paper we study the case of content-based image retrieval (CBIR) systems, and focus on the problem of increasing the throughput of a large-scale CBIR system that indexes a very large collection of digital images. By analyzing the query log of a real CBIR system available on the Web, we characterize the behavior of users who experience a novel search paradigm, in which content-based similarity queries and text-based ones can easily be interleaved. We show that locality and self-similarity are present even in the stream of queries submitted to such a CBIR system. Based on these results, we propose an effective way to exploit this locality by means of a similarity caching system, which stores recently or frequently submitted queries together with their results. Unlike traditional caching, the proposed cache can manage not only exact hits but also approximate ones, which are answered by similarity with respect to the result sets of past queries present in the cache. We extensively evaluate the proposed solution using the real query stream recorded in the log and a collection of 100 million digital photographs. The high hit ratios and small average approximation errors obtained demonstrate the effectiveness of the approach.
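The distinction the abstract draws between exact and approximate hits can be illustrated with a small sketch. This is not the authors' algorithm: it assumes queries are fixed-length numeric feature tuples, uses Euclidean distance, evicts in least-recently-used order, and treats any cached query within a hypothetical `radius` as an approximate hit whose stored results are reused unchanged. The names `SimilarityCache`, `radius`, `lookup`, and `insert` are illustrative only.

```python
import math
from collections import OrderedDict


class SimilarityCache:
    """Toy LRU cache that answers exact and approximate (nearby) queries."""

    def __init__(self, capacity, radius):
        self.capacity = capacity      # max number of cached queries (assumed LRU policy)
        self.radius = radius          # hypothetical distance threshold for approximate hits
        self.entries = OrderedDict()  # query feature tuple -> cached result list

    def lookup(self, query):
        """Return (kind, results); kind is 'exact', 'approximate', or 'miss'."""
        query = tuple(query)
        # Exact hit: the very same query was cached before.
        if query in self.entries:
            self.entries.move_to_end(query)
            return "exact", self.entries[query]
        # Otherwise scan for the closest cached query (linear scan, for clarity).
        best_key, best_dist = None, math.inf
        for key in self.entries:
            d = math.dist(query, key)
            if d < best_dist:
                best_key, best_dist = key, d
        # Approximate hit: reuse the neighbour's results if it is close enough.
        if best_key is not None and best_dist <= self.radius:
            self.entries.move_to_end(best_key)
            return "approximate", self.entries[best_key]
        return "miss", None

    def insert(self, query, results):
        """Store results for a query, evicting the least recently used entry."""
        query = tuple(query)
        self.entries[query] = results
        self.entries.move_to_end(query)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
```

On a miss, a caller would run the query against the full index and pass the answer to `insert`. A realistic system would likely replace the linear scan with a metric index and bound the approximation error rather than reuse a neighbour's results as-is; those choices are left out of this sketch.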

Year: 2012
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Language(s): English
Journal: INFORMATION PROCESSING & MANAGEMENT
Web of Science code: WOS:000307682100001
Volume: 48
Issue: 5
Start page: 803
End page: 818
Number of pages: 16
DOI: https://dx.doi.org/10.1016/j.ipm.2010.12.006
Scopus code: 2-s2.0-84864286791
URL: http://www.sciencedirect.com/science/article/pii/S030645731000107X
Peer reviewed: Yes, but type not specified
Keywords: Caching; Multimedia search; Content-based search; Large scale; H.5.1 Multimedia Information Systems
Other information: Project type EU_FP7
Number of authors: 5
Type: info:eu-repo/semantics/article
MIUR login type: 262
All authors: Falchi, F; Lucchese, C; Orlando, S; Perego, R; Rabitti, F
Type: 01 Journal contribution::01.01 Journal article
Full text: restricted
Project title: Software Services and Systems Network (S-Cube)
Acronym: S-CUBE
Funding: FP7
Contract no. (grant agreement): 215483
Appears in types: 01.01 Journal article

Files in this record:

File	Description	Type	Size	Format	Access
prod_199497-doc_43714.pdf	Similarity caching in large-scale image retrieval	Publisher's version (PDF)	4.83 MB	Adobe PDF	Authorized users only
prod_199497-doc_90292.pdf	Similarity caching in large-scale image retrieval	Publisher's version (PDF)	837.29 kB	Adobe PDF	Authorized users only

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/21682
