Transformer-based Human Action Recognition for Fine-Grained Industrial Assembly Tasks

Patruno, Cosimo; Cicirelli, Grazia
2025

Abstract

Human Action Recognition (HAR) in industrial assembly scenarios presents significant challenges, primarily because fine-grained actions differ only slightly in their motion patterns. In this work, we address action recognition in assembly tasks using skeleton data to represent detailed human movements. To capture the spatial and temporal dependencies among joints, we apply a Transformer-based architecture. We conduct an extensive evaluation, varying the model dimension to analyze its effect on recognition performance. Furthermore, given the high semantic and structural similarity between certain action classes, we propose a class merging strategy that combines highly similar actions into unified categories. This not only simplifies the classification task but also improves overall recognition performance by reducing ambiguity. Experimental results demonstrate the effectiveness of Transformers for fine-grained action recognition in industrial settings, highlighting the importance of both architectural tuning and label refinement when dealing with closely related human actions.
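The class merging step can be illustrated with a minimal Python sketch. It assumes merging is expressed as a fixed mapping from fine-grained labels to unified categories. The label names in `MERGE_MAP`, the helpers `merge_label`, `merge_labels`, and `accuracy`, and the toy predictions are all hypothetical: they are not the paper's classes, code, or results. The abstract does not state how similar classes are selected, so the sketch simply takes the mapping as given.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from fine-grained action labels to unified categories.
# These names are illustrative only; they are NOT the classes used in the paper.
MERGE_MAP: Dict[str, str] = {
    "pick_screw": "pick_small_part",
    "pick_washer": "pick_small_part",
    "tighten_screw": "tighten",
    "tighten_nut": "tighten",
}


def merge_label(label: str, merge_map: Dict[str, str]) -> str:
    """Return the unified category for `label`, or `label` itself if it is not merged."""
    return merge_map.get(label, label)


def merge_labels(labels: Sequence[str], merge_map: Dict[str, str]) -> List[str]:
    """Apply `merge_label` to every label in a sequence."""
    return [merge_label(label, merge_map) for label in labels]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy labels and predictions, made up for illustration (not results from the paper).
    y_true = ["pick_screw", "pick_washer", "tighten_screw",
              "tighten_nut", "insert_pin", "insert_pin"]
    y_pred = ["pick_washer", "pick_washer", "tighten_nut",
              "tighten_nut", "insert_pin", "pick_screw"]

    acc_fine = accuracy(y_true, y_pred)
    acc_merged = accuracy(merge_labels(y_true, MERGE_MAP),
                          merge_labels(y_pred, MERGE_MAP))
    print(f"fine-grained accuracy: {acc_fine:.3f}")
    print(f"merged accuracy: {acc_merged:.3f}")
```

On this toy data the script prints 0.500 for the fine-grained labels and 0.833 after merging. Confusions inside a merged group (e.g. `tighten_screw` predicted as `tighten_nut`) stop counting as errors, while a cross-group confusion (`insert_pin` predicted as `pick_screw`) remains one. The sketch only remaps existing predictions; the abstract does not specify whether merged labels are used at training time, evaluation time, or both.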
Istituto di Sistemi e Tecnologie Industriali Intelligenti per il Manifatturiero Avanzato - STIIMA (ex ITIA), Bari secondary site
Keywords: Human Action Recognition, Assembly Task, Transformer Architecture, Reduction of Fine-grained Actions
File in this record: PUBLISHEDpaper.pdf (Adobe PDF, 1.34 MB)
Type: Post-print document
License: Not public, private/restricted access (authorized users only)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/558485