Geintra

Departamento de Electrónica

Universidad de Alcalá

Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech

Title	Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech
Publication Type	Journal Article
Year of Publication	2010
Authors	Barra-Chicote, R; Yamagishi, J; King, S; Montero, JM; Macias-Guarasa, J
Publication Language	English
Journal	Speech Communication
Volume	52
Issue	5
Pages	394-404
Date Published	05/2010
Publisher	Elsevier
Rank in category	38/94
JCR Category	COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Keywords	Emotional speech synthesis, HMM-based synthesis, unit selection synthesis
JCR Impact Factor	1.229
ISSN	0167-6393
DOI	10.1016/j.specom.2009.12.007
Abstract	We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests to evaluate speech quality, emotion identification rates and emotional strength were used for the six emotions which we recorded – happiness, sadness, anger, surprise, fear, disgust. For the HMM-based method, we evaluated spectral and source components separately and identified which components contribute to which emotion. Our analysis shows that, although the HMM method produces significantly better neutral speech, the two methods produce emotional speech of similar quality, except for emotions having context-dependent prosodic patterns. Whilst synthetic speech produced using the unit selection method has better emotional strength scores than the HMM-based method, the HMM-based method has the ability to manipulate the emotional strength. For emotions that are characterized by both spectral and prosodic components, synthetic speech using unit selection methods was more accurately identified by listeners. For emotions mainly characterized by prosodic components, HMM-based synthetic speech was more accurately identified. This finding differs from previous results regarding listener judgements of speaker similarity for neutral speech. We conclude that unit selection methods require improvements to prosodic modeling and that HMM-based methods require improvements to spectral modeling for emotional speech. Certain emotions cannot be reproduced well by either method.

Attachment	Size
2010-FinalPaperBarraSpecom.pdf	452.68 KB

