Inverted indexes are usually represented by dividing posting lists into constant-sized blocks and representing them with an encoder for sequences of integers. Different encoders yield a different point in the space-time trade-off curve, with the fastest being several times larger than the most space-efficient. An important design decision for an index is thus the choice of the fastest encoding method such that the index fits in the available memory. However, a better usage of the space budget could be obtained by using faster encoders for frequently accessed blocks, and more space-efficient ones those that are rarely accessed. To perform this choice optimally, we introduce a linear time algorithm that, given a query distribution and a set of encoders, selects the best encoder for each index block to obtain the lowest expected query processing time respecting a given space constraint. To demonstrate the effectiveness of this approach we perform an extensive experimental analysis, which shows that our algorithm produces indexes which are significantly faster than single-encoder indexes under several query processing strategies, while respecting the same space constraints.

Optimal space-time tradeoffs for inverted indexes

Tonellotto N;Venturini R
2015

Abstract

Inverted indexes are usually represented by dividing posting lists into constant-sized blocks and representing them with an encoder for sequences of integers. Different encoders yield a different point in the space-time trade-off curve, with the fastest being several times larger than the most space-efficient. An important design decision for an index is thus the choice of the fastest encoding method such that the index fits in the available memory. However, a better usage of the space budget could be obtained by using faster encoders for frequently accessed blocks, and more space-efficient ones those that are rarely accessed. To perform this choice optimally, we introduce a linear time algorithm that, given a query distribution and a set of encoders, selects the best encoder for each index block to obtain the lowest expected query processing time respecting a given space constraint. To demonstrate the effectiveness of this approach we perform an extensive experimental analysis, which shows that our algorithm produces indexes which are significantly faster than single-encoder indexes under several query processing strategies, while respecting the same space constraints.
2015
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Inglese
8th International Conference on Web search and data mining (WSDM 2015)
47
56
978-1-4503-3317-7
http://dl.acm.org/citation.cfm?id=2685297&CFID=748200543&CFTOKEN=15193674
Sì, ma tipo non specificato
31/01/2015-06/02/2015
Shanghai, China
Compression
Inverted indexes
Knapsack problems
Il codice modulo commessa corretto è Tecnologie avanzate, Sistemi e Servizi per Grid
2
restricted
Ottaviano G.; Tonellotto N.; Venturini R.
273
info:eu-repo/semantics/conferenceObject
04 Contributo in convegno::04.01 Contributo in Atti di convegno
File in questo prodotto:
File Dimensione Formato  
prod_336713-doc_108718.pdf

solo utenti autorizzati

Descrizione: Optimal space-time tradeoffs for inverted indexes
Tipologia: Versione Editoriale (PDF)
Dimensione 827.76 kB
Formato Adobe PDF
827.76 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
prod_336713-doc_108719.pdf

solo utenti autorizzati

Descrizione: Optimal space-time tradeoffs for inverted indexes
Tipologia: Versione Editoriale (PDF)
Dimensione 1.35 MB
Formato Adobe PDF
1.35 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/302877
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 30
  • ???jsp.display-item.citation.isi??? 24
social impact