CNR Institutional Research Information System

K-means is a widespread clustering algorithm characterized by its simplicity and efficiency. K-means behavior, though, strongly depends on the initialization of the cluster centers (centroids) and tends to be stuck in a local suboptimal solution. Many techniques have been devised to overcome these problems, e.g., by a global strategy to reduce locality of centroid adjustments or by using density peaks for centroids initialization. This paper proposes an improved version of the K-means—DPKM-based on the concepts of density peaks. Density peaks have been proved to be a key for solving clustering problems where not-spherical regions with complex point distributions are involved. Centroids are actually selected from density peaks by using a technique borrowed from the DK-means++ initialization method, which ensures centroids are not only points with higher density, but also far-away from each other. DPKM is implemented in Java using parallel streams and lambda expressions which are capable of delivering good execution times on large datasets on multicore machines with shared memory. The efficiency and reliability of DPKM are demonstrated by applying it to challenging synthetic datasets often used as benchmarks for clustering methods.

Fast and Accurate K-means Clustering Based on Density Peaks

Nigro L.;Cicirelli F.

2023

Abstract

K-means is a widespread clustering algorithm characterized by its simplicity and efficiency. K-means behavior, though, strongly depends on the initialization of the cluster centers (centroids) and tends to be stuck in a local suboptimal solution. Many techniques have been devised to overcome these problems, e.g., by a global strategy to reduce locality of centroid adjustments or by using density peaks for centroids initialization. This paper proposes an improved version of the K-means—DPKM-based on the concepts of density peaks. Density peaks have been proved to be a key for solving clustering problems where not-spherical regions with complex point distributions are involved. Centroids are actually selected from density peaks by using a technique borrowed from the DK-means++ initialization method, which ensures centroids are not only points with higher density, but also far-away from each other. DPKM is implemented in Java using parallel streams and lambda expressions which are capable of delivering good execution times on large datasets on multicore machines with shared memory. The efficiency and reliability of DPKM are demonstrated by applying it to challenging synthetic datasets often used as benchmarks for clustering methods.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2023
			
	Strutture organizzative
	
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
			
	Codice ISBN
	
				9789819932498
9789819932504
			
	Parole chiave
	
				Centroids initialization
Clustering
Density peaks
DK-means++
Java
K-means
Lambda expressions
Multicore machines
Parallel streams
			
	Appare nelle tipologie:
	
				02.01 Contributo in volume (Capitolo o Saggio)

File in questo prodotto:

File	Dimensione	Formato
978-981-99-3250-4_59.pdf solo utenti autorizzati Tipologia: Versione Editoriale (PDF) Licenza: NON PUBBLICO - Accesso privato/ristretto Dimensione 474.55 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	474.55 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/559747

Citazioni

ND

0

ND

social impact