This paper discusses the design and implementation of an efficient caching system aimed to exploit the locality present in the queries submitted to a Web search engine. Previous works showed that there is a significative temporal locality in the queries, and demonstrated that caching query results is a viable strategy to increase search engine throughput. We enhance previous proposals in several directions. First we propose the adoption of a hybrid strategy for caching, where the results of the most frequently submitted queries are maintained in a static cache of fixed size, and only the queries that cannot be satisfied by the static cache compete for the use of a dynamic cache. We experimentally demonstrate the superiority of our hybrid strategy over a purely static or dynamic caching policy by evaluating the hit-rate achieved on three large query logs by varying the size of the cache, the percentage of static cache entries, and the replacement policy used for managing dynamic cache entries. Moreover, we show that search engine query logs also exhibit spatial locality, since users often require subsequent pages of results for the same query. Our caching system also take advantage of this type of locality by exploiting a sort of adaptive prefetching strategy. Finally, differently from other works, we accurately evaluate cost and scalability of our cache implementation.

A hybrid strategy for caching Web search engine results

Fagni T;Palmerini P;Perego R;Silvestri F
2003

Abstract

This paper discusses the design and implementation of an efficient caching system aimed to exploit the locality present in the queries submitted to a Web search engine. Previous works showed that there is a significative temporal locality in the queries, and demonstrated that caching query results is a viable strategy to increase search engine throughput. We enhance previous proposals in several directions. First we propose the adoption of a hybrid strategy for caching, where the results of the most frequently submitted queries are maintained in a static cache of fixed size, and only the queries that cannot be satisfied by the static cache compete for the use of a dynamic cache. We experimentally demonstrate the superiority of our hybrid strategy over a purely static or dynamic caching policy by evaluating the hit-rate achieved on three large query logs by varying the size of the cache, the percentage of static cache entries, and the replacement policy used for managing dynamic cache entries. Moreover, we show that search engine query logs also exhibit spatial locality, since users often require subsequent pages of results for the same query. Our caching system also take advantage of this type of locality by exploiting a sort of adaptive prefetching strategy. Finally, differently from other works, we accurately evaluate cost and scalability of our cache implementation.
2003
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Caching
Search engines
Query log analysis
Performance
File in questo prodotto:
File Dimensione Formato  
prod_90997-doc_130961.pdf

solo utenti autorizzati

Descrizione: A hybrid strategy for caching Web search engine results
Tipologia: Documento in Pre-print
Dimensione 32.8 kB
Formato Adobe PDF
32.8 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/56726
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact