A hybrid model for gesture recognition and speech synchronization
Umberto Maniscalco; Antonio Messina; Pietro Storniolo
2023
Abstract
Gestures should be considered an integral part of language. Non-verbal communication integrates, enriches, and sometimes wholly replaces speech. Therefore, in the anthropomorphization of human-robot or human-machine interaction, the integration of the auditory and visual channels cannot be ignored. This article presents a model for gesture recognition and its synchronization with speech. We imagine a scenario in which a human interacts in natural language with a robotic agent that knows the organization of the surrounding space and the disposition of the objects in it, pointing at the things he or she intends to refer to. The model recognizes the stroke-hold phase typical of the deictic gesture and identifies the word or words in the user's sentence that correspond to the gesture. The purpose of the model is to replace demonstrative adjectives and pronouns, or other indexical expressions, with spatial information helpful in recognizing the intent of the sentence. To test the system, we built a development and simulation framework based on a web interface. The first results are very encouraging: the model works well in real time, with reasonable success rates on the assigned task.
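The core idea of the abstract — temporally aligning the stroke-hold of a pointing gesture with a word in the utterance, then substituting the deictic word with spatial information about the pointed-at object — can be sketched as follows. This is an illustrative toy, not the authors' implementation: the word timings, the deictic word list, and the object description are all hypothetical.

```python
# Illustrative sketch (not the paper's implementation) of gesture-speech
# synchronization: the deictic word closest in time to the gesture's
# stroke-hold is replaced by a spatial description of the pointed object.

DEICTIC_WORDS = {"this", "that", "these", "those", "here", "there"}

def align_gesture(words, word_times, stroke_hold_time):
    """Return the index of the word whose onset time is closest to the
    stroke-hold timestamp of the pointing gesture."""
    return min(range(len(words)),
               key=lambda i: abs(word_times[i] - stroke_hold_time))

def ground_sentence(words, word_times, stroke_hold_time, object_description):
    """If the word aligned with the gesture is deictic, replace it with a
    spatial description of the object the gesture points at."""
    i = align_gesture(words, word_times, stroke_hold_time)
    if words[i].lower() in DEICTIC_WORDS:
        words = words[:i] + [object_description] + words[i + 1:]
    return " ".join(words)

# Hypothetical usage: the user says "take that" while pointing at a box.
grounded = ground_sentence(
    ["take", "that"],
    [0.10, 0.45],              # word onset times (seconds), assumed known
    0.50,                      # stroke-hold timestamp of the gesture
    "the box at (1.2, 0.4)",   # spatial info supplied by the robot's scene model
)
print(grounded)  # take the box at (1.2, 0.4)
```

In the paper's scenario the spatial description would come from the robot's knowledge of the space around it; here it is passed in directly to keep the sketch self-contained.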