ParDP: A Parallel Density Peaks-Based Clustering Algorithm
Cicirelli F.
2025
Abstract
This paper proposes ParDP, an algorithm and concrete tool for unsupervised clustering that belongs to the class of density peaks-based clustering methods. Such methods rely on the observation that cluster representative points (centroids) are points of higher local density surrounded by points of lower density. Candidate centroids must, in addition, be far from each other. A key feature of ParDP is its use of a k-Nearest Neighbors (kNN) technique to estimate the density of points. The complete clustering then depends on the densities of, and the distances among, points. ParDP uses principal component analysis to cope with high-dimensional data. The current implementation relies on Java parallel streams and the built-in lock-free fork/join mechanism, which makes it possible to exploit the computing power of commodity multi/many-core machines. The paper demonstrates ParDP’s clustering capabilities by applying it to several benchmark and real-world datasets. ParDP can be directed either to observe the number of clusters in a dataset or to finalize the clustering with an assigned number of clusters. Different internal and external measures can be used to assess the accuracy of a resulting clustering solution.

| File | Size | Format |
|---|---|---|
| mathematics-13-01285-v2.pdf (open access; Type: Publisher's Version (PDF); License: Public domain) | 10.44 MB | Adobe PDF |
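
The density peaks idea summarized in the abstract can be illustrated with a minimal Java sketch. It is not ParDP's actual code: the class and method names are hypothetical, the kNN density is assumed to be the inverse of the mean distance to the k nearest neighbors, and centroids are assumed to be ranked by the product of density and distance to the nearest denser point, as in classic density peaks clustering. The principal component analysis step and the final assignment of points to clusters are omitted. The per-point loops use `IntStream.range(...).parallel()`, which runs on the JVM's common fork/join pool.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Illustrative sketch only; names and formulas are assumptions, not ParDP's code. */
class DensityPeaksSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Assumed kNN density: k divided by the summed distance to the k nearest neighbors.
     *  Requires 1 <= k <= pts.length - 1. */
    static double[] knnDensity(double[][] pts, int k) {
        int n = pts.length;
        return IntStream.range(0, n).parallel().mapToDouble(i -> {
            double[] d = new double[n - 1];
            int c = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) d[c++] = dist(pts[i], pts[j]);
            }
            Arrays.sort(d);
            double sum = 0.0;
            for (int m = 0; m < k; m++) sum += d[m];
            return k / (sum + 1e-12); // small constant guards against duplicate points
        }).toArray();
    }

    /** Distance from each point to its nearest denser point; the densest point
     *  gets its distance to the farthest point. Ties in density are broken by index. */
    static double[] delta(double[][] pts, double[] rho) {
        int n = pts.length;
        return IntStream.range(0, n).parallel().mapToDouble(i -> {
            double nearestDenser = Double.POSITIVE_INFINITY;
            double farthest = 0.0;
            for (int j = 0; j < n; j++) {
                if (j == i) continue;
                double d = dist(pts[i], pts[j]);
                farthest = Math.max(farthest, d);
                boolean denser = rho[j] > rho[i] || (rho[j] == rho[i] && j < i);
                if (denser && d < nearestDenser) nearestDenser = d;
            }
            return Double.isInfinite(nearestDenser) ? farthest : nearestDenser;
        }).toArray();
    }

    /** Indices of the c points with the largest rho * delta (assumed ranking rule). */
    static int[] topCentroids(double[] rho, double[] delta, int c) {
        return IntStream.range(0, rho.length).boxed()
                .sorted((a, b) -> Double.compare(rho[b] * delta[b], rho[a] * delta[a]))
                .limit(c)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        // Two small, well-separated groups; the first point of each group is its center.
        double[][] pts = {
            {0, 0}, {0.1, 0}, {0, 0.1}, {-0.1, 0}, {0, -0.1},
            {5, 5}, {5.1, 5}, {5, 5.1}, {4.9, 5}, {5, 4.9}
        };
        double[] rho = knnDensity(pts, 3);
        double[] dlt = delta(pts, rho);
        System.out.println(Arrays.toString(topCentroids(rho, dlt, 2)));
    }
}
```

In this sketch a point scores highly only when it is both dense and far from any denser point, which mirrors the two conditions on candidate centroids stated in the abstract. Passing the desired number of clusters to `topCentroids` corresponds to the mode in which the number of clusters is assigned in advance; inspecting the sorted `rho * delta` values instead would be one way to observe how many clusters a dataset suggests.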
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


