CNR Institutional Research Information System

Performance of Parallel K-Means Based on Theatre

Franco Cicirelli; Libero Nigro; Francesco Pupo

2022

Abstract

K-Means is a well-known clustering algorithm that partitions a set of data points into groups (clusters) so as to minimize the dissimilarity, measured by some metric, among the data within each group. Owing to its simplicity, K-Means is often used in unsupervised machine learning applications. However, its execution time can easily become a bottleneck on very large datasets with a great number of clusters, such as those encountered in many big data ecosystems. The literature therefore reports many efforts to parallelize K-Means, on both shared-nothing and shared-memory architectures. This paper proposes a novel approach to parallel K-Means on multi/many-core machines, based on the Theatre actor system developed in Java. The realization relies on message passing for synchronization among actors (workers), but it also allows the actors of the same computing node (theatre) to share data in a controlled and safe way. The approach proves effective in delivering high-performance execution. The paper first provides background on the basic K-Means algorithm and the Theatre architecture; an actor-based parallel version of K-Means is then described and evaluated experimentally.
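
As a rough illustration of the data-parallel scheme the abstract outlines, the following Java sketch splits the dataset among workers, lets each worker assign its points to the nearest centroid and report partial sums, and has a coordinator merge those partial results into new centroids. It is not the authors' implementation and does not use the Theatre API, which this record does not show: a plain `java.util.concurrent` thread pool stands in for the actors, and the class name `KMeansSketch`, its method names, the squared Euclidean metric, and the stopping rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a thread pool stands in for Theatre actors.
final class KMeansSketch {

    /** Partial result one worker reports back: per-cluster sums and counts. */
    static final class Partial {
        final double[][] sums;
        final int[] counts;
        Partial(int k, int dim) { sums = new double[k][dim]; counts = new int[k]; }
    }

    /** Squared Euclidean distance, assumed here as the dissimilarity metric. */
    static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    /** Index of the centroid closest to point p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++)
            if (dist2(p, centroids[c]) < dist2(p, centroids[best])) best = c;
        return best;
    }

    /** Assignment step over the slice [from, to): the work of one worker. */
    static Partial assignSlice(double[][] data, int from, int to, double[][] centroids) {
        Partial part = new Partial(centroids.length, centroids[0].length);
        for (int i = from; i < to; i++) {
            int c = nearest(data[i], centroids);
            part.counts[c]++;
            for (int d = 0; d < data[i].length; d++) part.sums[c][d] += data[i][d];
        }
        return part;
    }

    /** Iterates assignment and update until centroids stop changing or maxIter is reached. */
    static double[][] run(double[][] data, double[][] initial, int workers, int maxIter) {
        int k = initial.length, dim = initial[0].length, n = data.length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = initial[c].clone();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int iter = 0; iter < maxIter; iter++) {
                final double[][] current = centroids;
                List<Future<Partial>> futures = new ArrayList<>();
                for (int w = 0; w < workers; w++) {
                    final int from = (int) ((long) n * w / workers);
                    final int to = (int) ((long) n * (w + 1) / workers);
                    Callable<Partial> task = () -> assignSlice(data, from, to, current);
                    futures.add(pool.submit(task));
                }
                // Coordinator: merge the partial results of all workers.
                double[][] sums = new double[k][dim];
                int[] counts = new int[k];
                for (Future<Partial> f : futures) {
                    Partial p = f.get();
                    for (int c = 0; c < k; c++) {
                        counts[c] += p.counts[c];
                        for (int d = 0; d < dim; d++) sums[c][d] += p.sums[c][d];
                    }
                }
                // Update step: new centroid = mean of assigned points (empty clusters keep theirs).
                double[][] next = new double[k][dim];
                boolean changed = false;
                for (int c = 0; c < k; c++) {
                    for (int d = 0; d < dim; d++) {
                        next[c][d] = counts[c] == 0 ? current[c][d] : sums[c][d] / counts[c];
                        if (next[c][d] != current[c][d]) changed = true;
                    }
                }
                centroids = next;
                if (!changed) break;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
        return centroids;
    }
}
```

Calling `KMeansSketch.run(data, initialCentroids, workers, maxIter)` returns the final centroids. Workers only read the shared `data` and `current` arrays and return their own `Partial` objects, which loosely mirrors the abstract's combination of message-based synchronization with controlled data sharing inside one node.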

Year: 2022

Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR

Keywords: K-Means clustering; Actors; Theatre; Java; High-performance computing

Appears in types: 02.01 Contribution in volume (Chapter or Essay)

Files in this item:

File	Description	Type	Size	Format	Access
prod_471134-doc_191254.pdf	Performance of Parallel K-Means Based on Theatre	Publisher's version (PDF)	415.92 kB	Adobe PDF	Authorized users only

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/414865
