EHWGesture - A Dataset for Multimodal Understanding of Clinical Gestures
Gianluca Amprimo; Claudia Ferraris
2025
Abstract
Hand gesture understanding is essential for several applications in human-computer interaction, including automatic clinical assessment of hand dexterity. While deep learning has advanced static gesture recognition, dynamic gesture understanding remains challenging due to complex spatiotemporal variations. Moreover, existing datasets often lack multimodal and multi-view diversity, precise ground-truth tracking, and an action quality component embedded within gestures. This paper introduces EHWGesture, a multimodal video dataset for gesture understanding featuring five clinically relevant gestures. It includes over 1,100 recordings (∼6 hours), captured from 25 healthy subjects using two high-resolution RGB-Depth cameras and an event camera. A motion capture system provides precise ground-truth hand landmark tracking, and all devices are spatially calibrated and synchronized to ensure cross-modal alignment. Moreover, to embed an action quality task within gesture understanding, collected recordings are organized in classes of execution speed that mirror clinical evaluations of hand dexterity. Baseline experiments highlight the dataset’s potential for gesture classification, gesture trigger detection, and action quality assessment. Thus, EHWGesture can serve as a comprehensive benchmark for advancing multimodal clinical gesture understanding.


