CNR Institutional Research Information System


ParDP: A Parallel Density Peaks-Based Clustering Algorithm

Nigro L.; Cicirelli F.

2025

Abstract

This paper proposes ParDP, an algorithm and concrete tool for unsupervised clustering that belongs to the class of density peaks-based clustering methods. Such methods rely on the observation that cluster representative points (centroids) are points of higher local density surrounded by points of lower density. Candidate centroids must also lie far from one another. A key feature of ParDP is its use of a k-Nearest Neighbors (kNN) technique to estimate the density of points. The complete clustering is then derived from the densities of, and the distances among, points. ParDP uses principal component analysis to cope with high-dimensional data points. The current implementation relies on Java parallel streams and the built-in lock-free fork/join mechanism, which lets it exploit the computing power of commodity multi/many-core machines. This paper demonstrates ParDP’s clustering capabilities by applying it to several benchmark and real-world datasets. ParDP can be directed either to explore the number of clusters in a dataset or to complete the clustering with an assigned number of clusters. Different internal and external measures can be used to assess the accuracy of the resulting clustering solution.
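To make the density peaks idea in the abstract concrete, the following is a minimal Java sketch and not ParDP's actual code. Everything in it is an assumption made for illustration: the class name `DensityPeaksSketch`, the density estimate (inverse of the mean distance to the k nearest neighbors), the ranking of candidate centroids by the product of density and distance to the nearest denser point, and the rule that each remaining point inherits the label of its nearest denser neighbor. These are choices common in the density peaks literature, but the abstract does not confirm which variants ParDP adopts. The principal component analysis step is omitted, and distances are computed by brute force in O(n²).

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration of density peaks clustering with a kNN density estimate.
class DensityPeaksSketch {

    // Euclidean distance between two points of equal dimension.
    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return Math.sqrt(s);
    }

    // Assumed kNN density: inverse of the mean distance to the k nearest neighbors.
    // The outer loop over points runs on a parallel stream (fork/join under the hood).
    static double[] knnDensity(double[][] pts, int k) {
        int n = pts.length;
        return IntStream.range(0, n).parallel().mapToDouble(i -> {
            double mean = IntStream.range(0, n).filter(j -> j != i)
                    .mapToDouble(j -> dist(pts[i], pts[j]))
                    .sorted().limit(k).average().orElse(Double.MAX_VALUE);
            return 1.0 / (mean + 1e-12);
        }).toArray();
    }

    // Clusters pts into at most c groups and returns one label per point.
    static int[] cluster(double[][] pts, int k, int c) {
        int n = pts.length;
        double[] rho = knnDensity(pts, k);
        // Indices by decreasing density; the sort is stable, so ties keep index order.
        int[] order = IntStream.range(0, n).boxed()
                .sorted((a, b) -> Double.compare(rho[b], rho[a]))
                .mapToInt(Integer::intValue).toArray();
        double[] delta = new double[n]; // distance to the nearest denser point
        int[] parent = new int[n];      // index of that nearest denser point
        // Each task writes only its own slot, so no locking is needed.
        IntStream.range(0, n).parallel().forEach(p -> {
            int i = order[p];
            if (p == 0) { // global density peak: use its largest distance to any point
                parent[i] = -1;
                delta[i] = IntStream.range(0, n)
                        .mapToDouble(j -> dist(pts[i], pts[j])).max().orElse(0.0);
                return;
            }
            int best = order[0];
            for (int q = 1; q < p; q++) {
                if (dist(pts[i], pts[order[q]]) < dist(pts[i], pts[best])) best = order[q];
            }
            parent[i] = best;
            delta[i] = dist(pts[i], pts[best]);
        });
        // Assumed centroid rule: the c points with the largest rho * delta.
        int[] centroids = IntStream.range(0, n).boxed()
                .sorted((a, b) -> Double.compare(rho[b] * delta[b], rho[a] * delta[a]))
                .limit(c).mapToInt(Integer::intValue).toArray();
        int[] label = new int[n];
        Arrays.fill(label, -1);
        for (int id = 0; id < centroids.length; id++) label[centroids[id]] = id;
        // Visit points from densest to sparsest so a parent is labeled before its children.
        for (int i : order) {
            if (label[i] < 0) label[i] = parent[i] >= 0 ? label[parent[i]] : 0;
        }
        return label;
    }
}
```

Calling `DensityPeaksSketch.cluster(points, k, c)` with a fixed `c` corresponds to the "assigned number of clusters" mode mentioned in the abstract. For the exploratory mode, one could instead inspect the sorted `rho * delta` values for a visible gap, which is again an assumption about how such exploration might be done rather than a description of the tool.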


Year
2025

Organizational units
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR

Keywords
benchmark and real-world datasets
clustering accuracy measures
density peaks-based clustering
Java
k-nearest neighbors
parallel programming
principal component analysis
unsupervised clustering

Appears in types:
01.01 Journal article

Files in this item:

File	Size	Format
mathematics-13-01285-v2.pdf (open access; type: Publisher's Version (PDF); license: Public domain)	10.44 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/559746
