Representing document lengths with identifiers

Tonellotto, Nicola; Silvestri, Fabrizio; Perego, Raffaele
2011

Abstract

Most common text retrieval scoring functions need the length of each indexed document in order to rank it against the current query. For efficiency, information retrieval systems keep these lengths in main memory. This paper proposes a novel strategy that encodes the length of each document directly in its document identifier, thus reducing main memory demand. The technique relies on a simple document identifier assignment method and on a function that computes the approximate length of each indexed document analytically. The paper discusses the implications of adopting the proposed technique and reports encouraging results from experiments conducted on the 2009 TREC Web Track dataset.
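The general idea can be illustrated with a minimal sketch. This is not the paper's method, whose assignment scheme and analytic function are not given in this record. It assumes, purely for illustration, that identifiers are assigned in order of increasing document length and that a small table of per-bucket average lengths stands in for the analytic function. All names (assign_ids_by_length, build_breakpoints, approx_length) are hypothetical.

```python
from bisect import bisect_right


def assign_ids_by_length(lengths):
    """Assign new ids so that a larger id never has a shorter document.

    Returns (order, sorted_lengths), where order[new_id] is the original
    index of the document and sorted_lengths[new_id] is its exact length.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return order, [lengths[i] for i in order]


def build_breakpoints(sorted_lengths, num_buckets):
    """Summarize lengths with one average per bucket of consecutive ids.

    Assumption of this sketch: a per-bucket average replaces the
    paper's analytic function. Only these two short lists need to stay
    in memory instead of one length per document.
    """
    n = len(sorted_lengths)
    size = max(1, -(-n // num_buckets))  # ceiling division
    starts = list(range(0, n, size))
    reps = []
    for start in starts:
        chunk = sorted_lengths[start:start + size]
        reps.append(sum(chunk) / len(chunk))
    return starts, reps


def approx_length(doc_id, starts, reps):
    """Approximate a document's length from its identifier alone."""
    return reps[bisect_right(starts, doc_id) - 1]
```

With eight documents and four buckets, the per-document array of eight lengths is replaced by four averages. The approximation error depends on how much lengths vary within a bucket, so more buckets trade memory for accuracy.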
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Information Retrieval
Files in this record:
File: prod_206212-doc_46307.pdf
Access: authorized users only
Description: contribution
Type: Publisher's version (PDF)
Size: 92.29 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/183001