K-means is a widespread clustering algorithm characterized by its simplicity and efficiency. K-means behavior, though, strongly depends on the initialization of the cluster centers (centroids) and tends to be stuck in a local suboptimal solution. Many techniques have been devised to overcome these problems, e.g., by a global strategy to reduce locality of centroid adjustments or by using density peaks for centroids initialization. This paper proposes an improved version of the K-means—DPKM-based on the concepts of density peaks. Density peaks have been proved to be a key for solving clustering problems where not-spherical regions with complex point distributions are involved. Centroids are actually selected from density peaks by using a technique borrowed from the DK-means++ initialization method, which ensures centroids are not only points with higher density, but also far-away from each other. DPKM is implemented in Java using parallel streams and lambda expressions which are capable of delivering good execution times on large datasets on multicore machines with shared memory. The efficiency and reliability of DPKM are demonstrated by applying it to challenging synthetic datasets often used as benchmarks for clustering methods.
Fast and Accurate K-means Clustering Based on Density Peaks
Cicirelli F.
2023
Abstract
K-means is a widespread clustering algorithm characterized by its simplicity and efficiency. K-means behavior, though, strongly depends on the initialization of the cluster centers (centroids) and tends to be stuck in a local suboptimal solution. Many techniques have been devised to overcome these problems, e.g., by a global strategy to reduce locality of centroid adjustments or by using density peaks for centroids initialization. This paper proposes an improved version of the K-means—DPKM-based on the concepts of density peaks. Density peaks have been proved to be a key for solving clustering problems where not-spherical regions with complex point distributions are involved. Centroids are actually selected from density peaks by using a technique borrowed from the DK-means++ initialization method, which ensures centroids are not only points with higher density, but also far-away from each other. DPKM is implemented in Java using parallel streams and lambda expressions which are capable of delivering good execution times on large datasets on multicore machines with shared memory. The efficiency and reliability of DPKM are demonstrated by applying it to challenging synthetic datasets often used as benchmarks for clustering methods.| File | Dimensione | Formato | |
|---|---|---|---|
|
978-981-99-3250-4_59.pdf
solo utenti autorizzati
Tipologia:
Versione Editoriale (PDF)
Licenza:
NON PUBBLICO - Accesso privato/ristretto
Dimensione
474.55 kB
Formato
Adobe PDF
|
474.55 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


