Emotions and Voice Quality: Experiments with Sinusoidal Modeling

Cosi P; Tesser F
2003

Abstract

Voice quality is recognized to play an important role in conveying emotions in verbal communication. In this paper we explore the effectiveness of a sinusoidal modeling framework for voice transformations aimed at the analysis and synthesis of emotive speech. A set of acoustic cues is selected to compare the voice quality characteristics of speech signals from a voice corpus in which different emotions are reproduced. The sinusoidal signal processing tool is used to convert a neutral utterance into emotive utterances. Two procedures are applied and compared: the first performs only the alignment of phoneme durations and of the pitch contour; the second refines the transformation with a spectral conversion function. This refinement improves the reproduction of the different voice qualities of the target emotive utterances. The acoustic cues extracted from the transformed utterances are compared with those of the original emotive utterances, and the properties and quality of the transformation method are discussed.
Istituto di Scienze e Tecnologie della Cognizione - ISTC
Voice Quality
Emotions
Sinusoidal Modelling

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/430971