Clustering is one of the most important unsupervised learning problems and it deals with finding a structure in a collection of unlabeled data; however, different clustering algorithms applied to the same data-set produce different solutions. In many applications the problem of multiple solutions becomes crucial and providing a limited group of good clusterings is often more desirable than a single solution. In this work we propose the Least Square Consensus clustering that allows a user to extrapolate a small number of different clustering solutions from an initial (large) set of solutions obtained by applying any clustering algorithm to a given data-set. Two different implementations are presented. In both cases, each consensus is accomplished with a measure of quality defined in terms of Least Square error and a graphical visualization is provided in order to make immediately interpretable the result. Numerical experiments are carried out on both synthetic and real data-sets.

Multiple Clustering Solutions Analysis Through Lest-Square Consensus Algorithms

Angelini C;De Feis I;
2010

Abstract

Clustering is one of the most important unsupervised learning problems and it deals with finding a structure in a collection of unlabeled data; however, different clustering algorithms applied to the same data-set produce different solutions. In many applications the problem of multiple solutions becomes crucial and providing a limited group of good clusterings is often more desirable than a single solution. In this work we propose the Least Square Consensus clustering that allows a user to extrapolate a small number of different clustering solutions from an initial (large) set of solutions obtained by applying any clustering algorithm to a given data-set. Two different implementations are presented. In both cases, each consensus is accomplished with a measure of quality defined in terms of Least Square error and a graphical visualization is provided in order to make immediately interpretable the result. Numerical experiments are carried out on both synthetic and real data-sets.
2010
Istituto Applicazioni del Calcolo ''Mauro Picone''
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/32419
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact