Large-scale Parallel Web Search Engines (WSEs) needs to adopt a strategy for partitioning the inverted index among a set of parallel server nodes. In this paper we are interested in devising an effective term-partitioning strategy, according to which the global vocabulary of terms and the associated inverted lists are split into disjoint subsets, and assigned to distinct servers. Due to the workload imbalance caused by the skewed distribution of terms in user queries, finding an effective partitioning strategy is considered a very complex task. In this paper we first formally introduce Term Partitioning as a new optimization problem. Then we show how the knowledge mined from past WSE query logs can be profitably used to discover good solutions of this problem. Finally, we report many results to show that we are able to effectively reduce both the average number of servers activated per each query, along with the workload imbalance. Experiments are conducted on large query logs of real WSEs.

Mining query logs to optimize index partitioning in parallel Web search engines

Perego R;Lucchese C;Orlando S;Silvestri F
2007

Abstract

Large-scale Parallel Web Search Engines (WSEs) needs to adopt a strategy for partitioning the inverted index among a set of parallel server nodes. In this paper we are interested in devising an effective term-partitioning strategy, according to which the global vocabulary of terms and the associated inverted lists are split into disjoint subsets, and assigned to distinct servers. Due to the workload imbalance caused by the skewed distribution of terms in user queries, finding an effective partitioning strategy is considered a very complex task. In this paper we first formally introduce Term Partitioning as a new optimization problem. Then we show how the knowledge mined from past WSE query logs can be profitably used to discover good solutions of this problem. Finally, we report many results to show that we are able to effectively reduce both the average number of servers activated per each query, along with the workload imbalance. Experiments are conducted on large query logs of real WSEs.
2007
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
H.3 Information Storage and Retrieval
Data mining
Web search
Information retrieval
File in questo prodotto:
File Dimensione Formato  
prod_91655-doc_131456.pdf

solo utenti autorizzati

Descrizione: Mining query logs to optimize index partitioning in parallel Web search engines
Tipologia: Versione Editoriale (PDF)
Dimensione 672.62 kB
Formato Adobe PDF
672.62 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/102615
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact