WebCat
Giannotti F.; Nanni M.
2003
Abstract
WebCat is a versatile system that reorganizes search results into a partition of homogeneous document clusters using data mining techniques. Its purpose is to help users browse the set of retrieved documents easily by focusing on clusters whose characterizing keywords are directly pertinent to the search. WebCat submits a user-specified query to the Google search engine and retrieves a large number of snippets, i.e., answers. The snippets are then modelled as sets of (cleaned, stemmed) terms and partitioned into clusters by means of the Transactional K-means algorithm. Clusters are presented to the user through their centroids (i.e., sets of terms that represent the content of each cluster well), which can be used as a fast access method to the answers contained in each cluster. The overall system is computationally light, very fast, and can be run on the client side as an Internet Explorer toolbar (similar to the Google Toolbar).
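The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the WebCat implementation. The abstract only says that snippets are modelled as sets of terms and partitioned by Transactional K-means, so the Jaccard distance, the majority-vote centroid, the tiny stopword list, the choice of seeds, and all function names below are assumptions made for illustration. Stemming and the retrieval of snippets from Google are left out.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one plus a stemmer.
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "to", "is", "on", "with"}

def to_terms(snippet):
    """Model a snippet as a set of cleaned terms (no stemming in this sketch)."""
    words = re.findall(r"[a-z]+", snippet.lower())
    return frozenset(w for w in words if w not in STOPWORDS)

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def centroid(members, min_share=0.5):
    """Assumed centroid rule: terms found in more than min_share of the members."""
    counts = Counter(t for m in members for t in m)
    return frozenset(t for t, c in counts.items() if c > min_share * len(members))

def cluster(term_sets, seeds, max_iter=20):
    """K-means-style loop over sets: assign to nearest centroid, then recompute."""
    centroids = list(seeds)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: jaccard_distance(s, centroids[k]))
            for s in term_sets
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids are too
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [s for s, a in zip(term_sets, assignment) if a == k]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[k] = centroid(members)
    return assignment, centroids

# Made-up snippets for an ambiguous query; not data from the paper.
snippets = [
    "Jaguar cars for sale",
    "New Jaguar cars and dealers",
    "The jaguar is a big cat",
    "Jaguar cat habitat in the jungle",
]
term_sets = [to_terms(s) for s in snippets]
assignment, centroids = cluster(term_sets, seeds=[term_sets[0], term_sets[2]])
for k, c in enumerate(centroids):
    print(k, sorted(c), [s for s, a in zip(snippets, assignment) if a == k])
```

Each centroid is itself a small set of terms, so it can double as the label shown to the user for its cluster, which matches the role the abstract gives to centroids as a fast access method to the answers.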


