WebCat

Giannotti F.; Nanni M.
2003

Abstract

WebCat is a versatile system that reorganizes search results into a partition of homogeneous document clusters using data mining techniques. Its purpose is to help users browse the set of retrieved documents easily by focusing on the clusters whose characterizing keywords are directly pertinent to the search. WebCat submits a user-specified query to the Google search engine and retrieves a large number of snippets, i.e., answers. The snippets are then modelled as sets of (cleaned, stemmed) terms and partitioned into clusters by means of the Transactional K-means algorithm. Each cluster is presented to the user through its centroid, i.e., a set of terms that represents the content of the cluster well, which serves as a fast access path to the answers the cluster contains. The overall system is computationally light and fast, and can run on the client side as an Internet Explorer toolbar (similar to the Google Toolbar).
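The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the WebCat implementation: it assumes that snippets are already cleaned and stemmed, that dissimilarity between term sets is measured with the Jaccard distance, and that a centroid consists of the terms occurring in more than half of a cluster's snippets. The function names (`jaccard_distance`, `centroid`, `cluster_snippets`), the seeding strategy, and the sample data are all hypothetical.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Return 1 - |intersection| / |union|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def centroid(cluster):
    """Assumed rule: keep the terms found in more than half of the snippets."""
    counts = Counter(term for snippet in cluster for term in snippet)
    return frozenset(t for t, c in counts.items() if 2 * c > len(cluster))


def cluster_snippets(snippets, seeds, max_iter=20):
    """K-means-style loop over term sets: assign, then recompute centroids."""
    centroids = [frozenset(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assign every snippet to its nearest centroid (ties go to the lowest index).
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: jaccard_distance(s, centroids[k]))
            for s in snippets
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids will not change
        assignment = new_assignment
        # Recompute the centroid of every non-empty cluster.
        for k in range(len(centroids)):
            members = [s for s, a in zip(snippets, assignment) if a == k]
            if members:
                centroids[k] = centroid(members)
    return assignment, centroids


# Hypothetical, already-stemmed snippets for an ambiguous query.
snippets = [
    {"jaguar", "car", "engin"},
    {"jaguar", "car", "dealer"},
    {"jaguar", "cat", "wildlif"},
    {"jaguar", "cat", "habitat"},
]
labels, representatives = cluster_snippets(snippets, seeds=[snippets[0], snippets[2]])
for k, terms in enumerate(representatives):
    print(k, sorted(terms), [i for i, a in enumerate(labels) if a == k])
```

In this sketch the printed centroid of each cluster plays the role the abstract assigns to it: a short set of terms a user can scan to decide which group of answers to open.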
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Web mining
Clustering
Search engines

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/180200