Emulating Subjective Criteria in Corpus Validation

Author(s): Ignasi Iriondo (Universitat Ramon Llull, Spain), Santiago Planet (Universitat Ramon Llull, Spain), Francesc Alías (Universitat Ramon Llull, Spain), Joan-Claudi Socoró (Universitat Ramon Llull, Spain) and Elisa Martínez (Universitat Ramon Llull, Spain)
Copyright: 2009
Pages: 6
Source title: Encyclopedia of Artificial Intelligence
Source Author(s)/Editor(s): Juan Ramón Rabuñal Dopico (University of A Coruña, Spain), Julian Dorado (University of A Coruña, Spain) and Alejandro Pazos (University of A Coruña, Spain)
DOI: 10.4018/978-1-59904-849-9.ch083

Keywords: Artificial Intelligence / Computer Science & IT / Engineering Science Reference

Abstract

The use of speech in human-machine interaction is increasing as computer interfaces become more complex but also more usable. These interfaces draw on information obtained from the user through the analysis of different modalities and respond through different media. Multimodal systems trace their origin to the “Put-That-There” system (Bolt, 1980), an application operated by speech and gesture recognition. Using speech as one of these modalities, both to take commands from users and to deliver spoken information, makes human-machine communication more natural. A growing number of applications use speech-to-text conversion and animated characters with speech synthesis. One way to improve the naturalness of these interfaces is to incorporate recognition of the user’s emotional state (Campbell, 2000). This generally requires speech databases with authentic emotional content that allow robust analysis. Cowie, Douglas-Cowie & Cox (2005) present several databases and observe an increase in multimodal ones, and Ververidis & Kotropoulos (2006) describe 64 databases and their applications. The main problem in creating this kind of database is the naturalness of the utterances, which depends directly on the recording method: the recordings must be controlled without compromising the authenticity of the utterances. Campbell (2000) and Schröder (2004) propose four sources of emotional speech, ordered from less control but more authenticity to more control but less authenticity: i) natural occurrences, ii) provocation of authentic emotions in laboratory conditions, iii) stimulated emotions by means of prepared texts, and iv) acted speech reading the same texts with different emotional states, usually performed by actors.
On the one hand, corpora designed to synthesize emotional speech are based on studies centred on the listener, following the distinction made by Schröder (2004), because they model the speech parameters in order to transmit a specific emotion. On the other hand, emotion recognition implies studies centred on the speaker, because they relate the speaker's emotional state to the parameters of the speech. The validation of a corpus used for synthesis involves both kinds of studies: the former because the corpus will be used for synthesis, and the latter because recognition is needed to evaluate its content. The most reliable validation method is to have human listeners select the valid utterances of the corpus. However, the large size of a corpus makes this process unaffordable.
