CNR Institutional Research Information System

Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution

Geraci F.; Pellegrini M.; Sebastiani F.; Maggini M.

2006

Abstract

This paper describes Armil, a meta-search engine that groups the Web snippets returned by auxiliary search engines into disjoint labelled clusters. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted external metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms.
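
To make the clustering step mentioned in the abstract more concrete, the following is a minimal sketch of the basic furthest-point-first heuristic for k-center clustering. It is not Armil's implementation: the abstract refers to a fast version of the algorithm applied to Web snippets, whereas this sketch uses the plain greedy procedure, toy 2-D points, and Euclidean distance. The function name `furthest_point_first`, the choice of the first point as the initial center, and the sample data are all assumptions made for illustration.

```python
import math


def furthest_point_first(points, k, dist=math.dist):
    """Greedy k-center heuristic.

    Returns (centers, assignment): the indices of the chosen centers and,
    for each point, the position in `centers` of its closest center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [0]  # assumption: start from the first point (any point works)
    # nearest[i] is the distance from point i to its closest chosen center
    nearest = [dist(p, points[0]) for p in points]
    assignment = [0] * len(points)

    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        new_center = max(range(len(points)), key=nearest.__getitem__)
        centers.append(new_center)
        # Reassign every point that is closer to the new center.
        for i, p in enumerate(points):
            d = dist(p, points[new_center])
            if d < nearest[i]:
                nearest[i] = d
                assignment[i] = len(centers) - 1
    return centers, assignment


# Toy data (hypothetical): three visually separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers, assignment = furthest_point_first(points, 3)
print(centers)      # indices of the chosen centers
print(assignment)   # cluster position for each point
```

Any metric can be passed as `dist`; for text snippets one would need a distance defined on term vectors, which this sketch does not attempt to reproduce.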

Year: 2006
Organizational units: Istituto di informatica e telematica - IIT; Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Web of Science ID: WOS:000217628000002
Volume: 4209
First page: 25
Last page: 36
DOI: https://dx.doi.org/10.1007/11880561_3
Scopus ID: 2-s2.0-33750359861
Keywords: clustering; web snippets
Number of authors: 2
Type: info:eu-repo/semantics/article
MIUR login type: 262
All authors: Geraci F.; Pellegrini M.; Sebastiani F.; Maggini M.
Type: 01 Journal contribution::01.01 Journal article
Full text: restricted
Appears in types: 01.01 Journal article

Files in this item:

File	Size	Format
prod_29939-doc_79596.pdf (publisher's version, PDF; authorized users only)	199.3 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/46187
