A general data-driven procedure for creating new prosodic modules for the Italian FESTIVAL Text-To-Speech (TTS) [1] synthesizer is described. These modules are based on the "Classification and Regression Trees" (CART) theory. The prosodic factors taken into consideration are: duration, pitch and loudness. Loudness control has been implemented as an extension to the MBROLA diphone concatenative synthesizer. The prosodic models were trained using two speech corpora with different speaking style, and the effectiveness of the CART-based prosody was assessed with a set of evaluation tests.

Prosodic Data-Driven Modelling of Narrative Style in FESTIVAL TTS

Tesser F;Cosi P;Tisato G
2004

Abstract

A general data-driven procedure for creating new prosodic modules for the Italian FESTIVAL Text-To-Speech (TTS) [1] synthesizer is described. These modules are based on the "Classification and Regression Trees" (CART) theory. The prosodic factors taken into consideration are: duration, pitch and loudness. Loudness control has been implemented as an extension to the MBROLA diphone concatenative synthesizer. The prosodic models were trained using two speech corpora with different speaking style, and the effectiveness of the CART-based prosody was assessed with a set of evaluation tests.
2004
Istituto di Scienze e Tecnologie della Cognizione - ISTC
Istituto di Scienze e Tecnologie della Cognizione - ISTC
Data-Driven
Prosody Model
Narrative Style
TTS
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/60745
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact