The Elias-Fano representation of monotone sequences has been recently applied to the compression of inverted indexes, showing excellent query performance thanks to its efficient random access and search operations. While its space occupancy is competitive with some state-of-the-art methods such as -??-Golomb codes and PForDelta, it fails to exploit the local clustering that inverted lists usually exhibit, namely the presence of long subsequences of close identifiers. In this paper we describe a new representation based on partitioning the list into chunks and encoding both the chunks and their endpoints with Elias-Fano, hence forming a two-level data structure. This partitioning enables the encoding to better adapt to the local statistics of the chunk, thus exploiting clustering and improving compression. We present two partition strategies, respectively with fixed and variable-length chunks. For the latter case we introduce a linear-time optimization algorithm which identifies the minimum-space partition up to an arbitrarily small approximation factor. We show that our partitioned Elias-Fano indexes offer significantly better compression than plain Elias-Fano, while preserving their query time eficiency. Furthermore, compared with other state-of-the-art compressed encodings, our indexes exhibit the best compression ratio/query time trade-off. Copyright 2014 ACM.

Partitioned Elias-Fano indexes

Venturini R
2014

Abstract

The Elias-Fano representation of monotone sequences has been recently applied to the compression of inverted indexes, showing excellent query performance thanks to its efficient random access and search operations. While its space occupancy is competitive with some state-of-the-art methods such as -??-Golomb codes and PForDelta, it fails to exploit the local clustering that inverted lists usually exhibit, namely the presence of long subsequences of close identifiers. In this paper we describe a new representation based on partitioning the list into chunks and encoding both the chunks and their endpoints with Elias-Fano, hence forming a two-level data structure. This partitioning enables the encoding to better adapt to the local statistics of the chunk, thus exploiting clustering and improving compression. We present two partition strategies, respectively with fixed and variable-length chunks. For the latter case we introduce a linear-time optimization algorithm which identifies the minimum-space partition up to an arbitrarily small approximation factor. We show that our partitioned Elias-Fano indexes offer significantly better compression than plain Elias-Fano, while preserving their query time eficiency. Furthermore, compared with other state-of-the-art compressed encodings, our indexes exhibit the best compression ratio/query time trade-off. Copyright 2014 ACM.
2014
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Inglese
SIGIR '14 Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (ACM New York, NY, USA ©2014)
SIGIR 2014 - 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
273
282
10
978-1-4503-2257-7
http://dl.acm.org/citation.cfm?doid=2600428.2609615
ACM, Association for computing machinery
New York
STATI UNITI D'AMERICA
Sì, ma tipo non specificato
6-11 July 2014
Gold Coast, QLD, Australia
Compression
Dynamic programming
Inverted indexes
Codice Puma: /cnr.isti/2014-A2-124
1
restricted
Ottaviano G.; Venturini R.
273
info:eu-repo/semantics/conferenceObject
04 Contributo in convegno::04.01 Contributo in Atti di convegno
File in questo prodotto:
File Dimensione Formato  
prod_305255-doc_87118.pdf

solo utenti autorizzati

Descrizione: Partitioned Elias-Fano Indexes
Tipologia: Versione Editoriale (PDF)
Dimensione 727.13 kB
Formato Adobe PDF
727.13 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/272302
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 122
  • ???jsp.display-item.citation.isi??? ND
social impact