CNR Institutional Research Information System

Two Vocoder Techniques for Neutral to Emotional Timbre Conversion

Tesser F; Zovato E; Nicolao M; Cosi P

2010

Abstract

In this paper, we describe the application of two vocoder techniques to a spectral envelope transformation experiment. We processed speech recorded in a neutral, standard reading style in order to reproduce the spectral shapes of two emotional speaking styles: happy and sad. This was achieved by means of conversion functions that operate in the frequency domain and are trained on aligned source-target pairs of spectral features. The first vocoder is based on the source-filter model of speech production and uses the Mel Log Spectral Approximation (MLSA) filter, while the second is the phase vocoder. Objective distance measures were computed to evaluate how well the conversion framework predicts the target spectral envelopes. Subjective listening tests provided further input for the evaluation.

Year: 2010
Organizational units: Istituto di Scienze e Tecnologie della Cognizione - ISTC
Language(s): English
External supervisors and coordinators: Yoshinori Sagisaka and Keiichi Tokuda (Eds.)
Volume title: Proceedings of 7th Speech Synthesis Workshop (SSW)
Conference title: SSW7 (7th ISCA Workshop on Speech Synthesis)
First page: 130
Last page: 135
Number of pages: 384
URL: http://www.researchgate.net/publication/228722990_Two_Vocoder_Techniques_for_Neutral_to_Emotional_Timbre_Conversion/file/79e4150a363d77778a.pdf
Refereed: Yes, but type not specified
Conference dates: 22-24 September 2010
Conference venue: ATR, Kyoto, Japan
Keywords: Emotional Speech; Spectral Transformation; Phase Vocoder; MLSA filter; GMM
Other information: Proceedings of SSW7 (7th ISCA Workshop on Speech Synthesis), http://isw3.naist.jp/~tomoki/ssw7/www/doc/ssw7_proceedings_rev.pdf
Number of authors: 4
Full text: none
All authors: Tesser, F; Zovato, E; Nicolao, M; Cosi, P
MIUR login type: 273
Type: info:eu-repo/semantics/conferenceObject
Type: 04 Conference contribution::04.01 Contribution in conference proceedings
Appears in types: 04.01 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/130358
