Analysis of Input Data Configurations in CNN-based Human Action Recognition for Assembly Task
Cosimo Patruno; Grazia Cicirelli; Laura Romeo; Tiziana D'Orazio
In press
Abstract
Human Action Recognition (HAR) plays a vital role in manufacturing assembly tasks, supporting worker safety, operational support, production optimization, employee training, and human-robot collaboration. This paper introduces a skeleton-based action recognition approach built on a convolutional neural network (CNN) architecture. Joint-to-joint distances are used to represent human movements during assembly tasks, enabling the model to capture intricate motion patterns. The primary focus of this work is to structure the input data in different ways and to analyze how these variations influence network performance, in particular model accuracy. The problem is challenging because, in assembly tasks, the high similarity between actions and operator-specific variations in execution make actions hard to distinguish. Two input data configurations are analyzed: one-channel and multi-channel. The assembly actions are classified with a CNN-based architecture, and the chosen data configuration determines whether a 2D or a 3D CNN is applied. The proposed approach is evaluated on the publicly available HA4M dataset. The results show that the input data structure strongly influences model performance.

| File | Size | Format | Type | License |
|---|---|---|---|---|
| PaperSubmitted.pdf | 3.73 MB | Adobe PDF | Pre-print document | Not public: private/restricted access (authorized users only) |
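The joint-to-joint distance representation and the two input layouts mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify how the channels are arranged, so everything below is an assumption made for illustration rather than the authors' method: the function names, the `(frames, joints, 3)` skeleton layout, the 32-joint count, and the choice of per-axis absolute differences as the three channels of the multi-channel variant are all hypothetical.

```python
import numpy as np


def joint_pair_indices(num_joints):
    """Index arrays (i, j) for every unordered joint pair with i < j."""
    return np.triu_indices(num_joints, k=1)


def one_channel_input(skeleton):
    """Map a (T, J, 3) skeleton clip to a (1, T, P) distance image.

    Rows are frames, columns are joint pairs, and each value is the
    Euclidean distance between the two joints (hypothetical layout).
    """
    skeleton = np.asarray(skeleton, dtype=float)
    i, j = joint_pair_indices(skeleton.shape[1])
    diff = skeleton[:, i, :] - skeleton[:, j, :]   # (T, P, 3)
    dist = np.linalg.norm(diff, axis=-1)           # (T, P)
    return dist[None, :, :]                        # add a channel axis


def multi_channel_input(skeleton):
    """Map a (T, J, 3) skeleton clip to a (3, T, P) tensor.

    Assumption: one channel per coordinate axis, holding the absolute
    per-axis difference |dx|, |dy|, |dz| for each joint pair.
    """
    skeleton = np.asarray(skeleton, dtype=float)
    i, j = joint_pair_indices(skeleton.shape[1])
    diff = np.abs(skeleton[:, i, :] - skeleton[:, j, :])  # (T, P, 3)
    return np.moveaxis(diff, -1, 0)                       # (3, T, P)


if __name__ == "__main__":
    # Synthetic clip: 30 frames, 32 joints (joint count is an assumption).
    rng = np.random.default_rng(0)
    clip = rng.normal(size=(30, 32, 3))
    print(one_channel_input(clip).shape, multi_channel_input(clip).shape)
```

In this sketch both layouts share the same frame-by-pair grid and differ only in the number of channels, which is the kind of variation that would change the input shape a CNN has to accept.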
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


