Performance of Parallel K-Means Based on Theatre
Franco Cicirelli
2022
Abstract
K-Means is a well-known clustering algorithm that partitions a set of data points into groups (clusters) so as to minimize the dissimilarity, measured by some metric, among the points of the same group. Because of its simplicity, K-Means is widely used for unsupervised clustering in machine learning. However, its execution time can easily become a bottleneck on very large datasets paired with a great number of clusters, such as those encountered in many big data ecosystems. Many efforts to parallelize K-Means, on both shared-nothing and shared-memory architectures, are therefore reported in the literature. This paper proposes a novel approach to parallel K-Means on multi/many-core machines, based on the Theatre actor system developed in Java. The realization relies on message passing for synchronization among actors (workers), but also lets the actors of the same computing node (theatre) share data in a controlled and safe way. The approach proves effective in delivering high-performance execution. The paper first provides background on the basic K-Means algorithm and the Theatre architecture, then describes and experimentally evaluates an actor-based parallel version of K-Means.
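To make the basic algorithm concrete, the following is a minimal Java sketch of K-Means in which the assignment step runs in parallel over the data points. It is not the Theatre-based implementation described in the paper: it uses standard Java parallel streams on shared memory as a stand-in, assumes squared Euclidean distance as the metric, and every name in it (`ParallelKMeansSketch`, `assign`, `update`, `fit`) is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustrative sketch only; class and method names are assumptions.
public class ParallelKMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Assignment step: label each point with the index of its nearest centroid.
    // Points are independent of each other, so the loop runs as a parallel stream.
    static int[] assign(double[][] points, double[][] centroids) {
        return IntStream.range(0, points.length).parallel().map(i -> {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double dist = squaredDistance(points[i], centroids[c]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            return best;
        }).toArray();
    }

    // Update step: each centroid becomes the mean of its assigned points.
    // A cluster with no points keeps its previous centroid.
    static double[][] update(double[][] points, int[] labels, double[][] old) {
        int k = old.length;
        int dim = old[0].length;
        double[][] next = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < points.length; i++) {
            counts[labels[i]]++;
            for (int d = 0; d < dim; d++) {
                next[labels[i]][d] += points[i][d];
            }
        }
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dim; d++) {
                next[c][d] = counts[c] == 0 ? old[c][d] : next[c][d] / counts[c];
            }
        }
        return next;
    }

    // Repeats assignment and update until labels stop changing
    // or maxIterations is reached.
    static double[][] fit(double[][] points, double[][] initial, int maxIterations) {
        double[][] centroids = initial;
        int[] previous = null;
        for (int it = 0; it < maxIterations; it++) {
            int[] labels = assign(points, centroids);
            if (Arrays.equals(labels, previous)) {
                break;
            }
            centroids = update(points, labels, centroids);
            previous = labels;
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0, 2}, {10, 10}, {10, 12} };
        double[][] initial = { {0, 0}, {10, 10} };
        System.out.println(Arrays.deepToString(fit(points, initial, 100)));
    }
}
```

In this sketch only the assignment step is parallelized, since it dominates the cost when both the number of points and the number of clusters are large. An actor-based design such as the one the abstract describes would instead split the points among workers and synchronize them by message passing.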


