Clustering Performance of an Evolutionary K-Means Algorithm

Cicirelli F.;
2025

Abstract

This paper continues research aimed at improving the accuracy of an evolutionary K-Means algorithm named Population-Based K-Means (PBKM). The PBKM design tries to overcome some limitations of basic K-Means in two steps. In the first step, a population is built from a certain number of candidate centroids, some of which naturally lie close to ground-truth centroids. In the second step, the candidate centroids of the population are systematically recombined to achieve a careful clustering solution. Both steps rely on Repeated K-Means together with careful seeding. The paper's contribution is a new seeding method for the crucial first step of population set-up. Each candidate solution is determined by randomly splitting the dataset into a certain number of segments, clustering each segment by careful seeding, and merging the clusters of the various segments through a pairwise-nearest-neighbor (PNN) strategy. The paper demonstrates the clustering performance of the new PBKM through several simulation experiments carried out on synthetic and real-world datasets.
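
The candidate-generation step described in the abstract (random split, per-segment seeding, PNN merge) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: k-means++ is assumed here as a stand-in for "careful seeding", the class and method names (`SegmentSeedingSketch`, `candidate`, `pnnMerge`) are hypothetical, each segment is assumed to hold at least `k` points, and the Repeated K-Means refinement and the recombination step are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SegmentSeedingSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // k-means++ seeding (assumed stand-in for "careful seeding"): the first seed is
    // uniform; each later seed is drawn with probability proportional to its squared
    // distance from the nearest seed chosen so far.
    static List<double[]> kMeansPlusPlus(List<double[]> pts, int k, Random rnd) {
        List<double[]> seeds = new ArrayList<>();
        seeds.add(pts.get(rnd.nextInt(pts.size())));
        double[] d2 = new double[pts.size()];
        Arrays.fill(d2, Double.MAX_VALUE);
        while (seeds.size() < k) {
            double[] last = seeds.get(seeds.size() - 1);
            double total = 0;
            for (int i = 0; i < pts.size(); i++) {
                d2[i] = Math.min(d2[i], dist2(pts.get(i), last));
                total += d2[i];
            }
            double r = rnd.nextDouble() * total;
            int pick = pts.size() - 1;
            for (int i = 0; i < pts.size(); i++) {
                r -= d2[i];
                if (r <= 0) { pick = i; break; }
            }
            seeds.add(pts.get(pick));
        }
        return seeds;
    }

    // PNN merge: repeatedly fuse the pair of clusters whose merge increases the
    // sum of squared errors the least, until only k centroids remain.
    static List<double[]> pnnMerge(List<double[]> cents, List<Integer> sizes, int k) {
        List<double[]> c = new ArrayList<>(cents);
        List<Integer> n = new ArrayList<>(sizes);
        while (c.size() > k) {
            int bi = 0, bj = 1;
            double best = Double.MAX_VALUE;
            for (int i = 0; i < c.size(); i++)
                for (int j = i + 1; j < c.size(); j++) {
                    double na = n.get(i), nb = n.get(j);
                    double cost = na * nb / (na + nb) * dist2(c.get(i), c.get(j));
                    if (cost < best) { best = cost; bi = i; bj = j; }
                }
            double na = n.get(bi), nb = n.get(bj);
            double[] m = new double[c.get(bi).length];
            for (int d = 0; d < m.length; d++)
                m[d] = (na * c.get(bi)[d] + nb * c.get(bj)[d]) / (na + nb);
            c.set(bi, m);
            n.set(bi, n.get(bi) + n.get(bj));
            c.remove(bj);   // bj is an int, so this removes by index
            n.remove(bj);
        }
        return c;
    }

    // One candidate solution: shuffle, split into `segments` parts, seed each part,
    // replace each seed by the mean of the segment points nearest to it, then merge.
    static List<double[]> candidate(List<double[]> data, int k, int segments, Random rnd) {
        List<double[]> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, rnd);
        List<double[]> cents = new ArrayList<>();
        List<Integer> sizes = new ArrayList<>();
        int dim = data.get(0).length;
        for (int s = 0; s < segments; s++) {
            int from = s * shuffled.size() / segments;
            int to = (s + 1) * shuffled.size() / segments;
            List<double[]> seg = shuffled.subList(from, to);
            List<double[]> seeds = kMeansPlusPlus(seg, k, rnd);
            double[][] sum = new double[k][dim];
            int[] cnt = new int[k];
            for (double[] p : seg) {
                int near = 0;
                for (int j = 1; j < k; j++)
                    if (dist2(p, seeds.get(j)) < dist2(p, seeds.get(near))) near = j;
                cnt[near]++;
                for (int d = 0; d < dim; d++) sum[near][d] += p[d];
            }
            for (int j = 0; j < k; j++) {
                if (cnt[j] == 0) continue;   // skip empty clusters (duplicate seeds)
                for (int d = 0; d < dim; d++) sum[j][d] /= cnt[j];
                cents.add(sum[j]);
                sizes.add(cnt[j]);
            }
        }
        return pnnMerge(cents, sizes, k);
    }
}
```

Calling `SegmentSeedingSketch.candidate(data, k, segments, new Random())` several times would yield several candidate solutions for a population. The merge cost used in `pnnMerge` is the standard PNN criterion, the size-weighted squared distance between two centroids, so small nearby clusters are fused first.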
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
ISBN: 9789819750344
ISBN: 9789819750351
Benchmark datasets
Evolutionary clustering
Java
K-Means
Merging clusters by pairwise-nearest neighbor
Real-world datasets
Seeding methods
Unsupervised clustering
Files in this item:
File: 978-981-97-5035-1_27.pdf (authorized users only)
Type: Publisher's version (PDF)
License: Not public (private/restricted access)
Size: 619.06 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/559745
Citations
  • PMC: not available
  • Scopus: 3
  • ISI: 0