Perceptual Models for Automatic Speech Recognition Systems
Cosi P
1990
Abstract
Research on automatic speech recognition aims to give machines human-like capabilities for communicating in natural spoken language, and it is of great interest from both the application and the research points of view. This chapter discusses the fundamentals of speech production and speech knowledge, numerous techniques used in speech recognition systems, some successful speech recognition systems, and some recent advances in speech recognition research, such as the application of artificial neural network models and a special case of Hidden Markov models. The problem of speech recognition is approached in two ways: using models based on speech production, and using models based on speech perception. The chapter illustrates how a combination of an ear model and multi-layer networks makes effective generalization among speakers possible in coding vowels. It also suggests that speech knowledge organized as morphological properties is robust enough to handle inter- and intra-speaker variations. By learning how to allocate degrees of evidence to articulatory features, it is possible to estimate normalized values for the place and manner of articulation, and these values appear to be highly consistent with qualitative expectations based on speech knowledge. Effective learning and good generalization can be obtained using a limited number of speakers, in analogy with what humans do. Speech coders that produce degrees of evidence for phonetic features can be used for fast lexical access, to recognize phonemes in new languages with limited training, and to constrain the search for the interpretation of a sentence.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


