CNR Institutional Research Information System


Transformer-based Human Action Recognition for Fine-Grained Industrial Assembly Tasks

Gallón, Mayra Vanessa Alvear; Patruno, Cosimo; Mata, Gadea; Domínguez, César; Cicirelli, Grazia

2025

Abstract

Human Action Recognition (HAR) in industrial assembly scenarios is challenging, primarily because fine-grained actions differ only subtly in their motion patterns. In this work, we address action recognition in assembly tasks by using skeleton data to represent detailed human movements. To capture the spatial and temporal dependencies among joints, we apply a Transformer-based architecture. We conducted an extensive evaluation in which we varied the model dimension to analyze its effect on recognition performance. Furthermore, given the high semantic and structural similarity between certain action classes, we propose a class merging strategy that combines highly similar actions into unified categories. This not only simplifies the classification task but also improves overall recognition performance by reducing ambiguity. Experimental results demonstrate the effectiveness of Transformers for fine-grained action recognition in industrial settings and highlight the importance of both architectural tuning and label refinement when dealing with closely related human actions.
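The abstract does not say how the classes to be merged are chosen. As a hypothetical illustration only, the sketch below uses a validation confusion matrix as a proxy for similarity: two classes are grouped when either one is predicted as the other more often than a chosen threshold, groups are formed transitively with a union-find structure, and the original labels are remapped to the merged ids. The function names (`merge_similar_classes`, `remap_labels`), the confusion-rate criterion, and the threshold are assumptions for illustration, not the authors' procedure.

```python
def merge_similar_classes(confusion, threshold):
    """Hypothetical class-merging rule (not the paper's method).

    confusion[i][j] = number of samples of true class i predicted as class j.
    Classes i and j are merged when the fraction of i predicted as j, or of j
    predicted as i, exceeds `threshold`. Merging is transitive.
    Returns a list mapping each original class index to a merged class index.
    """
    n = len(confusion)
    parent = list(range(n))  # union-find forest over class indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smaller index as root

    row_totals = [sum(row) for row in confusion]
    for i in range(n):
        for j in range(i + 1, n):
            rate_ij = confusion[i][j] / row_totals[i] if row_totals[i] else 0.0
            rate_ji = confusion[j][i] / row_totals[j] if row_totals[j] else 0.0
            if max(rate_ij, rate_ji) > threshold:
                union(i, j)

    # Assign consecutive merged ids 0..k-1 in order of first appearance.
    new_ids = {}
    mapping = []
    for c in range(n):
        root = find(c)
        if root not in new_ids:
            new_ids[root] = len(new_ids)
        mapping.append(new_ids[root])
    return mapping


def remap_labels(labels, mapping):
    """Replace each original label with its merged class id."""
    return [mapping[y] for y in labels]


# Usage with a toy 4-class confusion matrix (made-up numbers):
confusion = [
    [8, 2, 0, 0],
    [3, 7, 0, 0],
    [0, 0, 9, 1],
    [0, 0, 0, 10],
]
mapping = merge_similar_classes(confusion, threshold=0.15)
print(mapping)                                  # [0, 0, 1, 2]
print(remap_labels([0, 1, 2, 3, 1], mapping))   # [0, 0, 1, 2, 0]
```

In this toy case, classes 0 and 1 are confused with each other in 20% and 30% of their samples, so they collapse into one category, while classes 2 and 3 (10% one-way confusion) stay separate. In practice the merge criterion would more likely be guided by the semantic similarity of the actions themselves, as the abstract suggests, with confusion statistics serving at most as supporting evidence.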


Year: 2025
Organizational units: Istituto di Sistemi e Tecnologie Industriali Intelligenti per il Manifatturiero Avanzato - STIIMA (ex ITIA) Sede Secondaria Bari
Keywords: Human Action Recognition, Assembly Task, Transformer Architecture, Reduction of Fine-grained Actions
Item type: 04.01 Contribution in conference proceedings

Files in this item:

PUBLISHEDpaper.pdf (authorized users only). Type: post-print. License: not public (private/restricted access). Size: 1.34 MB. Format: Adobe PDF.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/558485
