CNR Institutional Research Information System

On the value of query logs for modern information retrieval

Perego R; Orlando S; Lucchese C; Silvestri F; Laforenza D; Puppin D

2006

Abstract

Query logs collected by a Web Search Engine (WSE) are a valuable source of information that can be used in several ways to improve the efficiency and effectiveness of the complex process of searching. This paper surveys the results recently achieved by our group in the design of innovative solutions targeting parallel Information Retrieval (IR) systems. Our solutions exploit knowledge of the common usage patterns of the system, extracted from query logs. This knowledge has been used (1) to devise an effective policy for caching WSE query results; (2) to drive the partitioning of the inverted index among the nodes of a term-partitioned parallel IR system; and (3) to perform document partitioning and effective collection selection in a document-partitioned parallel IR system. The techniques and algorithms used range from simple statistical analysis to frequent pattern mining and document/query co-clustering. They share the common denominator of exploiting past usage information and of granting remarkable improvements in efficiency or effectiveness. The paper briefly describes the proposals and the framework in which they are applied, and reports the results of experiments conducted on large query logs of real WSEs.
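As a rough illustration of point (1), the sketch below shows one way past usage information could drive a query-result cache: a read-only part is filled with the most frequent queries of a historical log, and a small LRU part absorbs the remaining traffic. This is a hypothetical example and not necessarily the policy evaluated in the paper; the class name `LogSeededCache`, its parameters, and the `fetch` callback standing in for the search back end are all assumptions made for illustration.

```python
from collections import Counter, OrderedDict


class LogSeededCache:
    """Hypothetical result cache: static part seeded from a past query log,
    plus a small LRU part for queries the log did not rank highly."""

    def __init__(self, past_queries, static_size, dynamic_size, fetch):
        # fetch(query) -> results; stands in for the real search back end.
        self.fetch = fetch
        # Static part: results of the most frequent queries in the past log.
        top = Counter(past_queries).most_common(static_size)
        self.static = {query: fetch(query) for query, _count in top}
        # Dynamic part: ordered from least to most recently used.
        self.dynamic = OrderedDict()
        self.dynamic_size = dynamic_size
        self.hits = 0
        self.misses = 0

    def get(self, query):
        if query in self.static:
            self.hits += 1
            return self.static[query]
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)  # mark as most recently used
            return self.dynamic[query]
        self.misses += 1
        results = self.fetch(query)
        self.dynamic[query] = results
        if len(self.dynamic) > self.dynamic_size:
            self.dynamic.popitem(last=False)  # evict least recently used
        return results


if __name__ == "__main__":
    log = ["a", "a", "a", "b", "b", "c"]  # toy stand-in for a real query log
    cache = LogSeededCache(log, static_size=1, dynamic_size=1,
                           fetch=lambda q: q.upper())
    for q in ["a", "b", "b", "c", "b"]:
        cache.get(q)
    print(cache.hits, cache.misses)
```

The split between a static and a dynamic part is a design choice of this sketch: the static part protects consistently popular queries from eviction, while the LRU part adapts to short-term bursts.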


Year: 2006
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 8876990437
Keywords: Query log analysis
Appears in types: 02.01 Contribution in volume (Chapter or Essay)

Files in this record:

File	Description	Type	Size	Format	Access
prod_138968-doc_130449.pdf	On the value of query logs for modern information retrieval	Publisher's version (PDF)	643.88 kB	Adobe PDF	Authorized users only

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/97831
