Incremental caching for collection selection architectures

Perego R; Silvestri F; Puppin D
2007

Abstract

To address the rapid growth of the Internet, modern Web search engines have to adopt distributed organizations, where the collection of indexed documents is partitioned among several servers, and query answering is performed as a parallel and distributed task. Collection selection can be a way to reduce the overall computing load, by finding a trade-off between the quality of results retrieved and the cost of solving queries. In this paper, we analyze the relationship between the caching subsystem and the collection selection strategy, by exploring the design-space of this combined approach. In particular, we propose a novel caching policy able to incrementally refine the effectiveness of the results returned for each subsequent cache hit. The combination of collection selection and incremental caching strategies allows our system to retrieve two thirds of the top-ranked results returned by a baseline centralized index, with only one fifth of the computing workload.
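The incremental idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `IncrementalCache`, `CacheEntry`, `rank_servers`, `per_query` and `top_k` are hypothetical, cache eviction is omitted, and it assumes that each cache hit polls a few servers not yet queried for that query and merges their results into the cached entry.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    results: list                               # (score, doc_id) pairs, best first
    polled: set = field(default_factory=set)    # servers already queried


class IncrementalCache:
    """Toy incremental cache (assumed design; eviction omitted)."""

    def __init__(self, servers, rank_servers, per_query=1, top_k=10):
        self.servers = servers            # server_id -> callable(query) -> [(score, doc_id)]
        self.rank_servers = rank_servers  # callable(query) -> server ids, most promising first
        self.per_query = per_query        # servers polled on each request
        self.top_k = top_k                # number of results kept per query
        self.entries = {}

    def search(self, query):
        # A miss creates an empty entry; a hit reuses the cached one.
        entry = self.entries.setdefault(query, CacheEntry(results=[]))
        # Collection selection: pick the best-ranked servers not polled yet.
        pending = [s for s in self.rank_servers(query) if s not in entry.polled]
        for server_id in pending[: self.per_query]:
            entry.results.extend(self.servers[server_id](query))
            entry.polled.add(server_id)
        # Merge: drop duplicates and keep the highest-scoring top_k results.
        entry.results = sorted(set(entry.results), reverse=True)[: self.top_k]
        return entry.results
```

In this sketch, repeating the same query costs at most `per_query` server calls per request, while the cached result list moves closer to what a full broadcast to all servers would return.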
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Web search engines
Caching
Efficiency
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/102597