
Voice GMM modelling of voice quality for FESTIVAL/MBROLA emotive TTS synthesis

Piero Cosi
2006

Abstract

Voice quality is recognized to play an important role in the rendering of emotions in verbal communication. In this paper we explore the effectiveness of a voice-transformation processing framework aimed at the analysis and synthesis of emotive speech. We use a GMM-based model to compute the differences between an MBROLA voice and an angry voice, and we modify the MBROLA voice spectra with a set of spectral conversion functions trained on the data. We propose to organize the training speech data in such a way that the target emotive speech data and the diphone database used for text-to-speech synthesis both come from the same speaker. A copy-synthesis procedure is used to produce synthesized utterances whose pitch patterns, phoneme durations, and principal speaker characteristics are the same as in the target emotive utterances. This better isolates the voice quality differences due to emotive arousal. Three different models of voice quality differences, all based on a GMM representation of the acoustic space, are applied and compared. The performance of these models is discussed, and the experimental results and assessment are presented.
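The spectral conversion the abstract describes can be illustrated with a joint-density GMM mapping: fit a GMM on stacked source/target spectral feature vectors, then convert a source frame through the conditional expectation E[y | x]. This is a minimal sketch of that general technique, not the paper's actual implementation; all function names, dimensions, and the synthetic data are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(src, tgt, n_components=2, seed=0):
    """Fit a full-covariance GMM on stacked [source, target] frames."""
    z = np.hstack([src, tgt])                          # (frames, 2*dim)
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(z)
    return gmm

def convert(gmm, x, dim):
    """Map one source frame x (dim,) to a target-like frame via E[y | x]."""
    mu_x = gmm.means_[:, :dim]                         # source means
    mu_y = gmm.means_[:, dim:]                         # target means
    Sxx = gmm.covariances_[:, :dim, :dim]              # source covariance blocks
    Syx = gmm.covariances_[:, dim:, :dim]              # cross-covariance blocks
    # Posterior responsibilities p(k | x) under the marginal GMM on x
    lik = np.array([multivariate_normal.pdf(x, mu_x[k], Sxx[k])
                    for k in range(gmm.n_components)])
    post = gmm.weights_ * lik
    post /= post.sum()
    # Mix the per-component conditional means by responsibility
    y = np.zeros(dim)
    for k in range(gmm.n_components):
        y += post[k] * (mu_y[k] + Syx[k] @ np.linalg.solve(Sxx[k], x - mu_x[k]))
    return y

# Synthetic demo: the "emotive" target is a linear transform of the source,
# so the learned mapping should approximately double each frame.
rng = np.random.default_rng(0)
src = rng.normal(size=(500, 2))
tgt = 2.0 * src + 0.05 * rng.normal(size=(500, 2))
gmm = fit_joint_gmm(src, tgt)
converted = convert(gmm, np.array([0.5, -0.5]), dim=2)
```

In practice such a mapping would be trained on aligned spectral envelopes (e.g. cepstral coefficients) from the copy-synthesized neutral speech and the target emotive speech, so that residual differences reflect voice quality rather than prosody or speaker identity.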
Istituto di Scienze e Tecnologie della Cognizione - ISTC
English
9th International Conference on Spoken Language Processing (Interspeech 2006 -- ICSLP)
pp. 1794-1797 (4 pages)
978-1-60423-449-7
http://www.isca-speech.org/archive/interspeech_2006/i06_1597.html
ISCA - International Speech Communication Association, c/o Emmanuelle Foxonet
Lieu dit Lous Tourils, Baixas, F-66390
France
17-21 September 2006
Pittsburgh, PA, USA
Emotive Speech Synthesis
Voice Conversion
GMM
Italian Festival
MBROLA
Conference proceedings article (ISI)
1
none
Mauro Nicolao; Carlo Drioli; Piero Cosi
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/11611
Citations
  • PubMed Central: n/a
  • Scopus: 5
  • Web of Science (ISI): 1