
Multi-modal temporal action segmentation for manufacturing scenarios

Romeo L. (first author) – Writing – Original Draft Preparation;
Marani R. (second author) – Supervision

2025

Abstract

Industrial robots have become prevalent in manufacturing due to their advantages in accuracy, speed, and reduced operator fatigue. Nevertheless, human operators play a crucial role in primary production lines. This study focuses on the temporal segmentation of human actions, aiming to identify the physical and cognitive behavior of operators working alongside collaborative robots. While the existing literature explores temporal action segmentation datasets, evaluations on manufacturing tasks are lacking. This work assesses six state-of-the-art action segmentation models using the Human Action Multi-Modal Monitoring in Manufacturing (HA4M) dataset, in which subjects assemble an industrial object in realistic manufacturing scenarios. By employing Cross-Subject and Cross-Location approaches, the study not only demonstrates the effectiveness of these models in industrial settings but also introduces a new benchmark for evaluating generalization across different subjects and locations. The evaluation further includes new videos recorded in simulated industrial locations, assessed with both fully and semi-supervised learning approaches. The findings reveal that the Multi-Stage Temporal Convolutional Network++ (MS-TCN++) and the Action Segmentation Transformer (ASFormer) architectures achieve high performance in both supervised and semi-supervised settings, including on the new data, particularly when trained with skeletal features, advancing the capabilities of temporal action segmentation in real-world manufacturing environments. This research lays the foundation for addressing video activity understanding challenges in manufacturing and presents opportunities for future investigations.
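For readers unfamiliar with the evaluation protocol mentioned in the abstract, the sketch below illustrates two of its core ingredients: a Cross-Subject data split (held-out subjects never appear in training) and frame-wise accuracy, a standard metric in temporal action segmentation benchmarks such as those used for MS-TCN++ and ASFormer. This is a minimal illustrative sketch; the function names, subject IDs, and example split are assumptions, not the paper's exact protocol.

```python
# Minimal sketch of a Cross-Subject split and frame-wise accuracy for
# temporal action segmentation. Illustrative only: the names and the
# toy data below are assumptions, not the paper's exact setup.
import numpy as np

def cross_subject_split(video_ids, subject_of, held_out_subjects):
    """Partition videos so no held-out subject appears in training."""
    train = [v for v in video_ids if subject_of[v] not in held_out_subjects]
    test = [v for v in video_ids if subject_of[v] in held_out_subjects]
    return train, test

def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted action label matches ground truth."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float((pred == gt).mean())

# Example: four videos from two subjects; subject "S2" is held out.
subject_of = {"v1": "S1", "v2": "S1", "v3": "S2", "v4": "S2"}
train, test = cross_subject_split(list(subject_of), subject_of, {"S2"})
print(train, test)                                # ['v1', 'v2'] ['v3', 'v4']
print(frame_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # 0.75
```

A Cross-Location split follows the same pattern, with recording locations taking the place of subject IDs.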
Istituto di Sistemi e Tecnologie Industriali Intelligenti per il Manifatturiero Avanzato - STIIMA (ex ITIA) Sede Secondaria Bari
ASR - Direzione Generale
Keywords: Multimodal data; Manufacturing; Multimodal features; Action segmentation; Video understanding
Files in this item:

1-s2.0-S0952197625003203-main.pdf
  Description: Multi-modal temporal action segmentation for manufacturing scenarios
  Type: Published version (PDF)
  Access: open access
  License: Creative Commons
  Size: 3.44 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/552909
Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: 1