CNR Institutional Research Information System

Clustering Performance of an Evolutionary K-Means Algorithm

Nigro L.;Cicirelli F.;Pupo F.

2025

Abstract

This paper continues research aimed at improving the accuracy of an evolutionary K-Means algorithm named Population-Based K-Means (PBKM). The PBKM design tries to overcome some limitations of basic K-Means in two steps. In the first step, a population is built from a certain number of candidate centroids, some of which are naturally located close to the ground truth centroids. In the second step, the candidate centroids of the population are systematically recombined to achieve a careful clustering solution. Both steps depend on the use of Repeated K-Means together with careful seeding. The paper's contribution consists in developing a new seeding method for the crucial first step of population set-up. Each candidate solution is determined by randomly splitting the dataset into a certain number of segments, clustering each segment by careful seeding, and merging the clusters of the various segments through a pairwise-nearest-neighbor (PNN) strategy. The paper demonstrates the clustering performance of the new PBKM through several simulation experiments carried out on synthetic and real-world datasets.
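
The candidate-building step described in the abstract (random split into segments, careful seeding per segment, PNN merging) can be illustrated with a minimal Java sketch. This is not the authors' implementation. It assumes that "careful seeding" can be stood in for by K-Means++ (D^2 sampling), it replaces a full K-Means run on each segment with a single nearest-seed assignment pass, and it uses the standard size-weighted PNN merge cost. All class, method, and parameter names (SegmentSeedingSketch, carefulSeeding, clusterSegment, pnnMerge, candidate, segments) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SegmentSeedingSketch {
    /** A centroid together with the number of points it represents. */
    static class Cluster {
        final double[] c; final int n;
        Cluster(double[] c, int n) { this.c = c; this.n = n; }
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    // Assumption: careful seeding is realized here as K-Means++ (D^2 sampling).
    static List<double[]> carefulSeeding(List<double[]> pts, int k, Random rnd) {
        List<double[]> seeds = new ArrayList<>();
        seeds.add(pts.get(rnd.nextInt(pts.size())).clone());
        double[] d = new double[pts.size()];
        Arrays.fill(d, Double.MAX_VALUE);
        while (seeds.size() < k) {
            double[] last = seeds.get(seeds.size() - 1);
            double total = 0;
            for (int i = 0; i < d.length; i++) {
                d[i] = Math.min(d[i], sqDist(pts.get(i), last)); // distance to nearest seed
                total += d[i];
            }
            double r = rnd.nextDouble() * total;
            int pick = 0;
            while (pick < d.length - 1 && (r -= d[pick]) > 0) pick++;
            seeds.add(pts.get(pick).clone());
        }
        return seeds;
    }

    // Simplification: one nearest-seed assignment pass instead of a full K-Means run.
    static List<Cluster> clusterSegment(List<double[]> seg, int k, Random rnd) {
        List<double[]> seeds = carefulSeeding(seg, k, rnd);
        int dim = seg.get(0).length;
        double[][] sum = new double[k][dim];
        int[] n = new int[k];
        for (double[] p : seg) {
            int best = 0;
            for (int j = 1; j < k; j++)
                if (sqDist(p, seeds.get(j)) < sqDist(p, seeds.get(best))) best = j;
            n[best]++;
            for (int t = 0; t < dim; t++) sum[best][t] += p[t];
        }
        List<Cluster> out = new ArrayList<>();
        for (int j = 0; j < k; j++) {
            if (n[j] == 0) continue; // a duplicated seed may attract no points
            for (int t = 0; t < dim; t++) sum[j][t] /= n[j];
            out.add(new Cluster(sum[j], n[j]));
        }
        return out;
    }

    // PNN: repeatedly merge the pair with the smallest size-weighted merge cost.
    static List<Cluster> pnnMerge(List<Cluster> cs, int k) {
        List<Cluster> work = new ArrayList<>(cs);
        while (work.size() > k) {
            int bi = 0, bj = 1;
            double best = Double.MAX_VALUE;
            for (int i = 0; i < work.size(); i++)
                for (int j = i + 1; j < work.size(); j++) {
                    Cluster a = work.get(i), b = work.get(j);
                    double cost = (double) a.n * b.n / (a.n + b.n) * sqDist(a.c, b.c);
                    if (cost < best) { best = cost; bi = i; bj = j; }
                }
            Cluster a = work.get(bi), b = work.remove(bj); // bj > bi, so bi stays valid
            double[] m = new double[a.c.length];
            for (int t = 0; t < m.length; t++)
                m[t] = (a.n * a.c[t] + b.n * b.c[t]) / (a.n + b.n);
            work.set(bi, new Cluster(m, a.n + b.n));
        }
        return work;
    }

    // Builds one candidate solution; assumes every segment holds at least one point.
    static List<Cluster> candidate(List<double[]> data, int k, int segments, Random rnd) {
        List<double[]> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, rnd);
        List<Cluster> all = new ArrayList<>();
        for (int s = 0; s < segments; s++) {
            int from = s * shuffled.size() / segments;
            int to = (s + 1) * shuffled.size() / segments;
            all.addAll(clusterSegment(shuffled.subList(from, to), k, rnd));
        }
        return pnnMerge(all, k);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<double[]> data = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            double off = (i % 2 == 0) ? 0.0 : 10.0; // two synthetic blobs
            data.add(new double[] {off + rnd.nextGaussian(), off + rnd.nextGaussian()});
        }
        for (Cluster c : candidate(data, 2, 4, rnd))
            System.out.println(c.n + " points around " + Arrays.toString(c.c));
    }
}
```

In this sketch, calling candidate several times with different random splits would yield several candidate solutions, which is roughly the role the abstract assigns to the population set-up step. The recombination step of PBKM is not shown.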

	Year
	
				2025
			
	Organizational units
	
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
			
	ISBN
	
				9789819750344
9789819750351
			
	Keywords
	
				Benchmark datasets
Evolutionary clustering
Java
K-Means
Merging clusters by pairwise-nearest neighbor
Real-world datasets
Seeding methods
Unsupervised clustering
			
	Appears in types:
	
				02.01 Contribution in volume (Chapter or Essay)

Files in this item:

File	Size	Format
978-981-97-5035-1_27.pdf (Publisher's version (PDF); license: not public, private/restricted access; authorized users only)	619.06 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/559745
