Query logs collected by a Web Search Engine (WSE) are a valuable source of information that can be used in several ways to improve the efficiency and efficacy of the search process. This paper surveys the results recently achieved by our group in the design of innovative solutions targeting parallel Information Retrieval (IR) systems. Our solutions exploit knowledge of common usage patterns extracted from query logs. This knowledge has been used (1) to devise an effective policy for caching WSE query results; (2) to drive the partitioning of the inverted index among the nodes of a term-partitioned parallel IR system; and (3) to perform document partitioning and effective collection selection in a document-partitioned parallel IR system. The techniques and algorithms used range from simple statistical analysis to frequent pattern mining and document/query co-clustering. They share the common denominator of exploiting past usage information and of yielding remarkable improvements in efficiency or efficacy. The paper briefly describes the proposals and the framework of their application, and reports the results of experiments conducted on large query logs of real WSEs.
On the value of query logs for modern information retrieval
Perego R; Laforenza D; Puppin D
2006