CNR Institutional Research Information System

EHWGesture - A Dataset for Multimodal Understanding of Clinical Gestures

Gianluca Amprimo; Alberto Ancillotto; Alessandro Savino; Fabio Quazzolo; Claudia Ferraris; Gabriella Olmo; Elisabetta Farella; Stefano Di Carlo

2025

Abstract

Hand gesture understanding is essential for several applications in human-computer interaction, including automatic clinical assessment of hand dexterity. While deep learning has advanced static gesture recognition, dynamic gesture understanding remains challenging due to complex spatiotemporal variations. Moreover, existing datasets often lack multimodal and multi-view diversity, precise ground-truth tracking, and an action quality component embedded within gestures. This paper introduces EHWGesture, a multimodal video dataset for gesture understanding featuring five clinically relevant gestures. It includes over 1,100 recordings (~6 hours), captured from 25 healthy subjects using two high-resolution RGB-Depth cameras and an event camera. A motion capture system provides precise ground-truth hand landmark tracking, and all devices are spatially calibrated and synchronized to ensure cross-modal alignment. Moreover, to embed an action quality task within gesture understanding, collected recordings are organized in classes of execution speed that mirror clinical evaluations of hand dexterity. Baseline experiments highlight the dataset’s potential for gesture classification, gesture trigger detection, and action quality assessment. Thus, EHWGesture can serve as a comprehensive benchmark for advancing multimodal clinical gesture understanding.

Year: 2025

Organizational units: Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni - IEIIT

ISBN: 979-8-3315-8988-2; 979-8-3315-8989-9

Keywords: hand gesture recognition; multimodal learning; dynamic gesture analysis; neuromorphic vision

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/583303
