
A hybrid model for gesture recognition and speech synchronization

Umberto Maniscalco; Antonio Messina; Pietro Storniolo
2023

Abstract

Gestures should be considered an integral part of language. Non-verbal communication complements, enriches, and sometimes wholly replaces speech. Therefore, in the anthropomorphization of human-robot or human-machine interaction, one cannot ignore the integration of the auditory and visual channels. This article presents a model for gesture recognition and its synchronization with speech. We envision a scenario in which a human interacts in natural language with a robotic agent that knows the layout of the surrounding space and the placement of objects, pointing at the things they intend to refer to. The model recognizes the stroke-hold phase typical of deictic gestures and identifies the word or words in the user's sentence that correspond to each gesture. Its purpose is to replace demonstrative adjectives, demonstrative pronouns, and other indexical expressions with spatial information that helps recognize the intent of the sentence. To test the system, we built a development and simulation framework based on a web interface. The first results were very encouraging: the model has been shown to work well in real time, with reasonable success rates on the assigned task.
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Visualization
Natural languages
Speech recognition
Gesture recognition
Real-time systems
Synchronization
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/459324
Citations
  • Scopus: ND