Phonetic Recognition by Recurrent Neural Networks Working on Audio and Visual Information
Cosi, P.
1996
Abstract
A phonetic classification scheme based on a feed-forward recurrent back-propagation neural network working on audio and visual information is described. The speech signal is processed by an auditory model producing spectral-like parameters, while the visual signal is processed by a specialised hardware device, called ELITE, which computes lip and jaw kinematic parameters. Results are given for several speaker-dependent and speaker-independent phonetic recognition experiments on the Italian plosive consonants.
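To make the classification setup concrete, the sketch below shows a minimal Elman-style recurrent network running over a sequence of concatenated audio-visual feature frames. This is an illustrative assumption, not the paper's actual architecture: the feature dimensions, hidden size, class set, and the use of NumPy are all hypothetical, chosen only to show how spectral-like audio parameters and lip/jaw kinematic parameters could be fused frame by frame and fed through a recurrent state.

```python
import numpy as np

# Hypothetical sizes (NOT from the paper): audio and visual feature
# dimensions, hidden layer size, and the six Italian plosive classes.
N_AUDIO = 20    # spectral-like parameters per frame (assumed)
N_VISUAL = 4    # lip/jaw kinematic parameters per frame (assumed)
N_HIDDEN = 16
N_CLASSES = 6   # e.g. /p b t d k g/

rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, (N_HIDDEN, N_AUDIO + N_VISUAL))
W_rec = rng.normal(0.0, 0.1, (N_HIDDEN, N_HIDDEN))
W_out = rng.normal(0.0, 0.1, (N_CLASSES, N_HIDDEN))

def softmax(z):
    # Numerically stable softmax over one frame's class scores.
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(frames):
    """Run the recurrent net over a (T, N_AUDIO + N_VISUAL) sequence
    and return per-frame class posteriors of shape (T, N_CLASSES)."""
    h = np.zeros(N_HIDDEN)
    posteriors = []
    for x in frames:
        # Elman-style update: new state mixes the current audio-visual
        # frame with the previous hidden state.
        h = np.tanh(W_in @ x + W_rec @ h)
        posteriors.append(softmax(W_out @ h))
    return np.array(posteriors)

# Dummy 50-frame audio-visual sequence standing in for real features.
frames = rng.normal(size=(50, N_AUDIO + N_VISUAL))
posteriors = classify(frames)
print(posteriors.shape)  # (50, 6): one class distribution per frame
```

In practice the weights would be trained with back-propagation through time on labelled plosive segments; the untrained weights here only demonstrate the data flow from fused features to per-frame class posteriors.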