Transformer-based Human Action Recognition for Fine-Grained Industrial Assembly Tasks
Patruno, Cosimo; Cicirelli, Grazia
2025
Abstract
Human Action Recognition (HAR) in industrial assembly scenarios presents significant challenges, primarily because fine-grained actions differ only slightly in their motion patterns. In this work, we address action recognition in assembly tasks by using skeleton data to represent detailed human movements. To capture the spatial and temporal dependencies among joints, we apply a Transformer-based architecture. We conduct an extensive evaluation, varying the model dimension to analyze its effect on recognition performance. Furthermore, given the high semantic and structural similarity between certain action classes, we propose a class merging strategy that combines highly similar actions into unified categories. This not only simplifies the classification task but also improves overall recognition performance by reducing ambiguity. Experimental results demonstrate the effectiveness of Transformers for fine-grained action recognition in industrial settings and highlight the importance of both architectural tuning and label refinement when dealing with closely related human actions.
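The class merging idea can be illustrated with a small sketch. It assumes that merging is expressed as a fixed mapping from fine-grained labels to unified categories, and it re-scores the same predictions under both label sets. The label names, the `MERGE_MAP` groups, and the helper functions are hypothetical. The abstract does not state the paper's classes, its similarity criterion, or whether the model is retrained on merged labels.

```python
# Hypothetical fine-grained labels and merge groups, for illustration only;
# the paper's actual classes and merge criterion are not given in the abstract.
MERGE_MAP = {
    "pick_screw": "pick_small_part",
    "pick_washer": "pick_small_part",
    "tighten_screw": "tighten",
    "tighten_nut": "tighten",
}


def merge_label(label: str, merge_map: dict[str, str]) -> str:
    """Map a fine-grained label to its unified category (unchanged if not merged)."""
    return merge_map.get(label, label)


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of samples whose predicted label equals the ground-truth label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def merged_accuracy(y_true: list[str], y_pred: list[str],
                    merge_map: dict[str, str]) -> float:
    """Accuracy after remapping both ground truth and predictions to merged classes."""
    return accuracy([merge_label(t, merge_map) for t in y_true],
                    [merge_label(p, merge_map) for p in y_pred])


if __name__ == "__main__":
    # Toy data: the two errors are confusions between near-identical actions.
    y_true = ["pick_screw", "pick_washer", "tighten_nut", "tighten_screw", "place_cover"]
    y_pred = ["pick_washer", "pick_washer", "tighten_screw", "tighten_screw", "place_cover"]
    print(f"fine-grained accuracy: {accuracy(y_true, y_pred):.2f}")  # 0.60
    print(f"merged accuracy:       {merged_accuracy(y_true, y_pred, MERGE_MAP):.2f}")  # 1.00
```

In this toy case, every error is a confusion between two actions that end up in the same group, so merging removes them all. On real data, merging only absorbs confusions *within* a group, and errors across groups remain. Remapping at evaluation time is one option. Training directly on the merged labels, as the abstract's "unified categories" may imply, is another.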


