CNR Institutional Research Information System

This paper proposes a variation of the K-Means clustering algorithm, named Population-Based K-Means (PBK-MEANS), which founds its behaviour on careful seeding. The new K-Means algorithm rests on a greedy version of the K-Means++ seeding procedure (g_kmeans++), which proves effective in the search for an accurate clustering solution. PB-K-MEANS first builds a population of candidate solutions by independent runs of K-Means with g_kmeans++. Then the reservoir is used for recombining the stored solutions by Repeated K-Means toward the attainment of a final solution which minimizes the distortion index. PB-KMEANS is currently implemented in Java through parallel streams and lambda expressions. The paper first recalls basic concepts of clustering and of K-Means together with the role of the seeding procedure, then it goes on by describing basic design and implementation issues of PB-K-MEANS. After that, simulation experiments carried out both on synthetic and real-world datasets are reported, confirming good execution performance and careful clustering.

Performance of a K-Means Algorithm driven by careful seeding

Libero Nigro;Franco Cicirelli

2023

Abstract

This paper proposes a variation of the K-Means clustering algorithm, named Population-Based K-Means (PBK-MEANS), which founds its behaviour on careful seeding. The new K-Means algorithm rests on a greedy version of the K-Means++ seeding procedure (g_kmeans++), which proves effective in the search for an accurate clustering solution. PB-K-MEANS first builds a population of candidate solutions by independent runs of K-Means with g_kmeans++. Then the reservoir is used for recombining the stored solutions by Repeated K-Means toward the attainment of a final solution which minimizes the distortion index. PB-KMEANS is currently implemented in Java through parallel streams and lambda expressions. The paper first recalls basic concepts of clustering and of K-Means together with the role of the seeding procedure, then it goes on by describing basic design and implementation issues of PB-K-MEANS. After that, simulation experiments carried out both on synthetic and real-world datasets are reported, confirming good execution performance and careful clustering.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2023
			
	Strutture organizzative
	
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
			
	Parole chiave
	
				K-Means Clustering
Seeding Procedure
Greedy K-Means++
Clustering Accuracy Indexes
Java Parallel Streams
Benchmark and Real-World Datasets
Execution Performance.
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
prod_486151-doc_201650.pdf solo utenti autorizzati Descrizione: Performance of a K-Means Algorithm Driven by Careful Seeding Tipologia: Versione Editoriale (PDF) Dimensione 743.44 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	743.44 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/461867

Citazioni

ND

1

ND

social impact