Continuous Action Recognition in Manufacturing Contexts by Deep Graph Convolutional Networks
Maselli M. V.; Marani R.; Cicirelli G.
2024
Abstract
Human action recognition is an active research topic in computer vision and machine learning. Its application in the industrial domain is even more challenging, since workers may handle multiple objects and follow different assembly sequences, and only a few datasets are tailored to this domain. However, the availability of low-cost cameras capable of extracting high-level information about human posture and movement opens up new possibilities. This work compares four state-of-the-art graph neural networks that operate on skeletal data to recognize the actions in the HA4M dataset, in which subjects perform an assembly task. Videos are divided into clips of consecutive frames, which form the input skeletal graphs of the networks. An action segmentation algorithm is then proposed to determine the exact starting and ending instants of each action. Results show that the best performance is achieved by a two-stream Adaptive Graph Convolutional Network trained with input clips 77 frames long.
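The clip-based pipeline summarized above (split a skeleton video into clips of consecutive frames, classify each clip, then recover each action's start and end) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes skeleton sequences stored as NumPy arrays of shape (frames, joints, coordinates), and the helper names `make_clips` and `segment_actions`, the `stride` parameter, the central-frame labeling rule and the `min_len` smoothing heuristic are all hypothetical. Only the 77-frame clip length comes from the reported results; the network that produces the per-clip labels is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence

import numpy as np


def make_clips(skeleton: np.ndarray, clip_len: int = 77, stride: int = 1) -> np.ndarray:
    """Split a skeleton sequence of shape (T, V, C) into clips of consecutive frames.

    T = frames, V = joints, C = coordinates per joint. Returns an array of
    shape (N, clip_len, V, C); clip i covers frames [i*stride, i*stride + clip_len).
    Trailing frames that cannot fill a whole clip are dropped.
    """
    num_frames = skeleton.shape[0]
    if num_frames < clip_len:
        raise ValueError("sequence is shorter than the clip length")
    starts = range(0, num_frames - clip_len + 1, stride)
    return np.stack([skeleton[s:s + clip_len] for s in starts])


@dataclass
class Segment:
    label: int
    start: int  # first frame of the action (inclusive)
    end: int    # last frame of the action (inclusive)


def segment_actions(clip_labels: Sequence[int], clip_len: int, stride: int = 1,
                    min_len: int = 1) -> List[Segment]:
    """Merge per-clip predicted labels into contiguous action segments.

    Hypothetical rule: the label of clip i is assigned to the frames
    [c, c + stride - 1], where c = i*stride + clip_len//2 is the clip's
    central frame. Runs shorter than min_len frames are absorbed into the
    preceding segment to suppress spurious label flips.
    """
    half = clip_len // 2
    raw: List[Segment] = []
    for i, label in enumerate(clip_labels):
        center = i * stride + half
        if raw and raw[-1].label == label:
            raw[-1].end = center + stride - 1  # extend the current run
        else:
            raw.append(Segment(label, center, center + stride - 1))

    smoothed: List[Segment] = []
    for seg in raw:
        too_short = seg.end - seg.start + 1 < min_len
        if smoothed and (too_short or smoothed[-1].label == seg.label):
            smoothed[-1].end = seg.end  # absorb short run or re-join equal labels
        else:
            smoothed.append(Segment(seg.label, seg.start, seg.end))
    return smoothed
```

For example, `segment_actions([0, 0, 0, 1, 0, 0, 2, 2, 2, 2], clip_len=5, min_len=2)` yields two segments, label 0 on frames 2 to 7 and label 2 on frames 8 to 11, because the isolated one-frame label 1 is absorbed. Assigning each clip's prediction to its central frame is a common convention for sliding-window classifiers; note that it leaves the first `clip_len // 2` frames unlabeled.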