Phonetically-Based Multi-Layered Neural Networks for Vowel Classification
Cosi, P.
1990
Abstract
The vowel sub-component of a speaker-independent phoneme classification system is described. The architecture of the vowel classifier is based on an ear model followed by a set of Multi-Layered Neural Networks (MLNNs). The MLNNs are trained to recognize articulatory features, such as place of articulation and manner of articulation related to tongue position. Experiments performed on 10 English vowels show a recognition rate higher than 95% on new speakers. When the features are used for recognition, comparable results are obtained for vowels and diphthongs that were not used for training and were pronounced by new speakers. This suggests that MLNNs, suitably fed with data computed by an ear model, have good generalization capabilities over new speakers and new sounds.
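
The pipeline the abstract describes (auditory-model feature frames feeding a small multi-layer network that predicts articulatory features such as place and manner of articulation) can be sketched roughly as follows. This is a minimal illustrative sketch only: the layer sizes, feature dimensions, logistic units, squared-error backpropagation, and synthetic data are assumptions for demonstration and are not taken from the paper.

```python
# Illustrative sketch: a small multi-layer network mapping ear-model
# feature frames to articulatory-feature activations (e.g. place and
# manner of articulation).  All sizes and training details are assumed.
import numpy as np

rng = np.random.default_rng(0)

N_INPUT = 20      # assumed size of one ear-model feature frame
N_HIDDEN = 16     # assumed hidden-layer size
N_FEATURES = 5    # assumed number of articulatory-feature outputs

# Weights of a single-hidden-layer MLP with logistic units.
W1 = rng.normal(scale=0.1, size=(N_INPUT, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_FEATURES))
b2 = np.zeros(N_FEATURES)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """Forward pass: ear-model frame -> articulatory-feature activations."""
    h = sigmoid(x @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    return h, y

def train_step(x, target, lr=0.5):
    """One step of plain backpropagation on the squared error."""
    global W1, b1, W2, b2
    h, y = forward(x)
    # Output- and hidden-layer deltas for squared error with logistic units.
    dy = (y - target) * y * (1.0 - y)
    dh = (dy @ W2.T) * h * (1.0 - h)
    W2 -= lr * np.outer(h, dy)
    b2 -= lr * dy
    W1 -= lr * np.outer(x, dh)
    b1 -= lr * dh

# Toy usage: synthetic frames stand in for ear-model output,
# binary targets stand in for articulatory-feature labels.
frames = rng.normal(size=(100, N_INPUT))
targets = (rng.random((100, N_FEATURES)) > 0.5).astype(float)
for epoch in range(50):
    for x, t in zip(frames, targets):
        train_step(x, t)

_, activations = forward(frames[0])
print("articulatory-feature activations:", np.round(activations, 2))
```

In the paper's setup, a vowel (or diphthong) label would then be assigned from the predicted articulatory-feature pattern rather than from a direct phoneme output layer, which is what allows sounds not seen during training to be recognized.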


