Boosting the Performance of Web Search Engines: Caching and Prefetching Query Results by Exploiting Historical Usage Data

Fagni T; Silvestri F; Perego R
2004

Abstract

This paper discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web Search Engine (WSE). We propose SDC, a new caching strategy designed to exploit efficiently the temporal and spatial locality present in the stream of processed queries. SDC stores the results of the most frequently submitted queries in a static, read-only portion of the cache, while the queries that cannot be satisfied by the static portion compete for the remaining cache entries according to a given replacement policy. Moreover, we improve the hit ratio of SDC with a speculative prefetching strategy, which anticipates future requests while introducing only a limited overhead on the back-end WSE. We experimentally demonstrate the superiority of SDC over purely static and purely dynamic policies by measuring the hit ratio achieved on three large query logs while varying the cache parameters and the replacement policy used to manage the dynamic part of the cache. Finally, we deployed a concurrent version of our caching system and measured its throughput. Our tests show that the SDC cache can be efficiently exploited by several threads that concurrently serve the queries of different users.
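The split between a static and a dynamic portion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `StaticDynamicCache`, the `backend` callable, the use of LRU as the replacement policy, and filling the static portion from a list of past queries are all assumptions made for illustration, and prefetching and concurrency are left out.

```python
from collections import Counter, OrderedDict


class StaticDynamicCache:
    """Illustrative static + dynamic result cache (hypothetical names throughout)."""

    def __init__(self, history, static_size, dynamic_size, backend):
        # Static portion: results of the most frequent queries in a past log.
        # It is filled once and never evicted (read-only afterwards).
        top = [q for q, _ in Counter(history).most_common(static_size)]
        self.static = {q: backend(q) for q in top}
        # Dynamic portion: here managed with LRU, one possible replacement policy.
        self.dynamic = OrderedDict()
        self.dynamic_size = dynamic_size
        self.backend = backend
        self.hits = 0
        self.requests = 0

    def get(self, query):
        self.requests += 1
        if query in self.static:
            self.hits += 1
            return self.static[query]
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)  # mark as most recently used
            return self.dynamic[query]
        # Miss in both portions: ask the back end, then cache in the dynamic part.
        result = self.backend(query)
        if self.dynamic_size > 0:
            if len(self.dynamic) >= self.dynamic_size:
                self.dynamic.popitem(last=False)  # evict least recently used
            self.dynamic[query] = result
        return result

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0
```

In this sketch a query is looked up in the static portion first, so frequent queries never compete for dynamic entries; only queries that miss there can cause an eviction.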
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Caching
Search engines
Query log analysis
Performance
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/152172