Performance enhancement of a dynamic K-means algorithm through a parallel adaptive strategy on multicore CPUs
Romano Diego
2020
Abstract
The K-means algorithm is one of the most popular algorithms in data science. It aims to discover similarities among the elements of large datasets by partitioning them into K distinct groups called clusters. The main weakness of this technique is that, in real problems, it is often impossible to fix the value of K in advance as an input. Furthermore, the large volume of data needed for meaningful simulations makes running the algorithm on traditional architectures impractical. In this paper, we address both issues. On the one hand, we propose a method to define the value of K dynamically by optimizing a suitable quality index, with special attention to computational cost. On the other hand, to improve the performance and effectiveness of the algorithm, we propose a strategy for parallel implementation on modern multicore CPUs. © 2020 Elsevier Inc. All rights reserved.
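The idea of choosing K dynamically can be illustrated with a minimal sketch. The abstract does not specify the quality index or the stopping rule, so everything below is an assumption made for illustration: the index is the within-cluster sum of squares, K grows until the relative improvement falls below a hypothetical threshold `min_gain`, and centroids are seeded with a deterministic farthest-point rule. The function names (`kmeans`, `inertia`, `choose_k`) are likewise hypothetical, and the sketch is sequential; it does not attempt the parallel strategy of the paper.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption, not the paper's rule)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squares, used here as a stand-in quality index."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def choose_k(points, k_max=10, min_gain=0.5):
    """Increase K until the relative drop in the index falls below min_gain."""
    best_k = 1
    prev = inertia(points, *kmeans(points, 1))
    for k in range(2, k_max + 1):
        cur = inertia(points, *kmeans(points, k))
        if prev == 0 or (prev - cur) / prev < min_gain:
            break
        best_k, prev = k, cur
    return best_k


if __name__ == "__main__":
    # Three small, well-separated groups of 2-D points (toy data).
    points = [(0, 0), (0, 1), (1, 0),
              (10, 0), (10, 1), (11, 0),
              (0, 10), (0, 11), (1, 10)]
    print(choose_k(points))  # 3 for this toy data
```

In this sketch each candidate K triggers a full re-clustering, which is exactly the kind of cost that a practical method would need to keep under control. The assignment step treats every point independently, so it is the natural part of the loop to distribute across cores.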