The growing interest in emotional speech synthesis calls for the
development of effective emotion conversion techniques. This paper
assesses the relevance of three speech components (spectral
envelope, residual excitation and prosody) for synthesizing
identifiable emotional speech, so that voice conversion techniques
can be tailored to the specific characteristics of each emotion.
The analysis is based on listening tests over a set of synthetic
mixed-emotional utterances whose speech components are drawn from
emotional and neutral recordings. Results show that transforming the
residual excitation is important for identifying emotions that are
not fully conveyed through prosody (such as cold anger or sadness
in our Spanish corpus).
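The mixed-emotional stimuli can be pictured as assigning each of the three speech components to either an emotional or a neutral source recording. The following is a minimal Python sketch under a stated assumption: every source-to-component combination is enumerated, whereas the actual study may have used only a subset. The names COMPONENTS, SOURCES, mixed_stimuli and is_mixed are hypothetical and used only for illustration.

```python
from itertools import product

# The three speech components named in the abstract.
COMPONENTS = ("spectral_envelope", "residual_excitation", "prosody")
# Each component is taken from one of two recordings of the same utterance.
SOURCES = ("emotional", "neutral")


def mixed_stimuli(components=COMPONENTS, sources=SOURCES):
    """Enumerate every assignment of a source recording to each component.

    Assumption (hypothetical): all combinations are considered; the real
    experiment may have selected only some of them.
    """
    return [
        dict(zip(components, choice))
        for choice in product(sources, repeat=len(components))
    ]


def is_mixed(stimulus):
    """True when the components come from more than one source recording."""
    return len(set(stimulus.values())) > 1


if __name__ == "__main__":
    for s in mixed_stimuli():
        print(s, "mixed" if is_mixed(s) else "pure")
```

With two sources and three components this yields 2^3 = 8 combinations, two of which are pure (all emotional or all neutral) and six genuinely mixed. Comparing, for example, a stimulus with emotional residual excitation but neutral prosody against its counterpart isolates the contribution of a single component to emotion identification.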