Industrial Datasets for Multi-Modal Monitoring of an Assembly Task for Human Action Recognition and Segmentation
Romeo L.; Bono A.; Cicirelli G.; D'Orazio T.
2024
Abstract
With the rapid evolution of advanced industrial systems exploiting deep learning techniques, the availability of multimodal and heterogeneous datasets of operators working in industrial scenarios is essential. Such datasets enable in-depth studies of the accurate segmentation and recognition of the actions of operators working alongside collaborative robots. Multimodal information ensures that the features relevant for properly analyzing human movements are captured. This paper presents our recent research on the development of two datasets of human operators performing assembly tasks in industrial contexts. The dataset for Human Action Multi-Modal Monitoring in Manufacturing (HA4M) is a collection of multimodal data recorded with a single Microsoft Azure Kinect camera observing 41 subjects while performing 12 actions to assemble an Epicyclic Gear Train (EGT). The dataset for Human-Cobot Collaboration for Action Recognition in Manufacturing Assembly (HARMA) focuses on the interaction between 27 subjects and a collaborative robot while assembling the EGT in 7 actions; in this case, the acquisition setup consisted of two Microsoft Azure Kinect cameras. Both datasets were collected in controlled laboratory settings. To validate the HA4M and HARMA datasets, state-of-the-art temporal action segmentation models, i.e., MS-TCN++ and ASFormer, were trained on both skeletal and video features. The results demonstrate the effectiveness of the presented datasets for segmenting human actions in industrial contexts.
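As a rough illustration of the kind of model the abstract refers to, the PyTorch sketch below implements a single dilated temporal convolutional stage in the spirit of MS-TCN++ (the full model stacks a prediction stage and several refinement stages). It is a minimal, hypothetical sketch, not the authors' code: the feature dimension, channel width, layer count, and class count are assumptions chosen only to match the setting described above (e.g., 12 HA4M action classes and per-frame video or skeleton features).

```python
# Minimal sketch of a single-stage temporal convolutional network in the
# spirit of MS-TCN++ (illustrative only; not the paper's implementation).
# Input: per-frame features of shape (batch, feat_dim, n_frames), e.g.
# video features or flattened Azure Kinect skeleton joints.
# Output: per-frame class logits for temporal action segmentation.
import torch
import torch.nn as nn

class DilatedResidualLayer(nn.Module):
    def __init__(self, dilation: int, channels: int):
        super().__init__()
        # Dilated 1-D convolution enlarges the temporal receptive field
        # without pooling, preserving frame-level resolution.
        self.conv_dilated = nn.Conv1d(channels, channels, kernel_size=3,
                                      padding=dilation, dilation=dilation)
        self.conv_1x1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv_dilated(x))
        out = self.conv_1x1(out)
        return x + out  # residual connection

class SingleStageTCN(nn.Module):
    def __init__(self, feat_dim: int, n_classes: int,
                 channels: int = 64, n_layers: int = 10):
        super().__init__()
        self.conv_in = nn.Conv1d(feat_dim, channels, kernel_size=1)
        # Dilation doubles at each layer (1, 2, 4, ...), as in MS-TCN/MS-TCN++.
        self.layers = nn.ModuleList(
            [DilatedResidualLayer(2 ** i, channels) for i in range(n_layers)])
        self.conv_out = nn.Conv1d(channels, n_classes, kernel_size=1)

    def forward(self, x):          # x: (batch, feat_dim, n_frames)
        out = self.conv_in(x)
        for layer in self.layers:
            out = layer(out)
        return self.conv_out(out)  # (batch, n_classes, n_frames)

# Example: 2048-dim per-frame features (an assumed size), 12 action classes.
model = SingleStageTCN(feat_dim=2048, n_classes=12)
logits = model(torch.randn(1, 2048, 500))   # a 500-frame clip
pred = logits.argmax(dim=1)                 # frame-wise action labels
```

ASFormer replaces these convolutional stages with transformer encoder-decoder blocks, but consumes the same per-frame feature sequences and produces the same frame-wise label output, which is why both models can be trained on the skeletal and video features of HA4M and HARMA.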