Soft clustering for information retrieval applications
Bordogna, Gloria
2011
Abstract
This paper surveys soft clustering algorithms applied in information retrieval (IR). It first motivates the use of soft clustering approaches in IR. It then outlines the two main flat soft clustering approaches, namely probabilistic clustering and fuzzy clustering. Specifically, it introduces the expectation maximization (EM) and fuzzy c-means (FCM) algorithms, together with some of their extensions, which were defined to overcome the main drawbacks of these algorithms when they are used to organize large document collections. Finally, it presents soft hierarchical clustering algorithms designed to generate document taxonomies.
© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1: 138-146. DOI: 10.1002/widm.3
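What makes these approaches "soft" is that each document receives a degree of membership in every cluster instead of a single label. The following is a minimal fuzzy c-means sketch illustrating this idea. It is not the paper's implementation, and several choices are assumptions made only for illustration: Euclidean distance (text applications often use cosine-based measures instead), the fuzzifier m = 2, random initial memberships, and the hypothetical toy two-term document vectors in `docs`.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means (FCM) sketch, not a production implementation.

    points: list of equal-length float vectors (assumed toy term-weight vectors).
    c: number of clusters; m > 1 is the fuzzifier (larger m gives softer memberships).
    Returns (centroids, u), where u[i][k] is the degree to which point i
    belongs to cluster k; each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each point's degrees sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])

    centroids = []
    for _ in range(max_iter):
        # Centroid update: mean of all points weighted by membership ** m.
        centroids = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centroids.append([sum(w[i] * points[i][d] for i in range(n)) / total
                              for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        max_change = 0.0
        p = 2.0 / (m - 1.0)
        for i in range(n):
            dists = [math.dist(points[i], v) for v in centroids]
            zero = [k for k in range(c) if dists[k] == 0.0]
            if zero:
                # Point coincides with a centroid: full membership there.
                new_row = [1.0 / len(zero) if k in zero else 0.0 for k in range(c)]
            else:
                new_row = [1.0 / sum((dists[k] / dists[j]) ** p for j in range(c))
                           for k in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        # Stop once memberships no longer change noticeably.
        if max_change < tol:
            break
    return centroids, u

# Hypothetical toy "documents" described by two term weights.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
centroids, u = fuzzy_c_means(docs, c=2)
for doc, row in zip(docs, u):
    print(doc, [round(x, 2) for x in row])
```

The first four documents each belong mostly to one cluster, while the ambiguous document `[0.5, 0.5]` is shared roughly equally between the two clusters. A hard partitioning algorithm such as k-means would instead have to force that document into a single cluster.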