CNR Institutional Research Information System

Virtually every modern search application, either desktop, web, or mobile, features some kind of query auto-completion. In its basic form, the problem consists in retrieving from a string set a small number of completions, i.e. strings be- ginning with a given prefix, that have the highest scores according to some static ranking. In this paper, we focus on the case where the string set is so large that compres- sion is needed to fit the data structure in memory. This is a compelling case for web search engines and social networks, where it is necessary to index hundreds of millions of distinct queries to guarantee a reasonable coverage; and for mobile devices, where the amount of memory is limited. We present three different trie-based data structures to address this problem, each one with different space/time/ complexity trade-offs. Experiments on large-scale datasets show that it is possible to compress the string sets, including the scores, down to spaces competitive with the gzip'ed data, while supporting efficient retrieval of completions at about a microsecond per completion.

Space-efficient data structures for top-k completion

Hsu P.;Ottaviano G.

2013

Abstract

Virtually every modern search application, either desktop, web, or mobile, features some kind of query auto-completion. In its basic form, the problem consists in retrieving from a string set a small number of completions, i.e. strings be- ginning with a given prefix, that have the highest scores according to some static ranking. In this paper, we focus on the case where the string set is so large that compres- sion is needed to fit the data structure in memory. This is a compelling case for web search engines and social networks, where it is necessary to index hundreds of millions of distinct queries to guarantee a reasonable coverage; and for mobile devices, where the amount of memory is limited. We present three different trie-based data structures to address this problem, each one with different space/time/ complexity trade-offs. Experiments on large-scale datasets show that it is possible to compress the string sets, including the scores, down to spaces competitive with the gzip'ed data, while supporting efficient retrieval of completions at about a microsecond per completion.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2013
			
	Strutture organizzative
	
				Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
			
	Codice ISBN
	
				978-1-4503-2035-1
			
	Parole chiave
	
				Top-k completion
Scored string sets
Tries
Compression
H.3.3 Information Search and Retrieval
E.1 Data Structures. Trees
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
prod_277761-doc_78332.pdf accesso aperto Descrizione: Space-Efficient Data Structures for Top-k Completion Tipologia: Versione Editoriale (PDF) Dimensione 1.15 MB Formato Adobe PDF Visualizza/Apri	1.15 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/244964

Citazioni

ND

55

ND

social impact