Speechfind: Advances in Rich Content Based Spoken Document Retrieval

Author(s): Wooil Kim (Center for Robust Speech Systems (CRSS) and Erik Jonsson School of Engineering and Computer Science at the University of Texas at Dallas, USA) and John H.L. Hansen (University of Texas at Dallas, USA)
Copyright: 2009
Pages: 15
Source title: Handbook of Research on Digital Libraries: Design, Development, and Impact
Source Author(s)/Editor(s): Yin-Leng Theng (Nanyang Technological University, Singapore), Schubert Foo (Nanyang Technological University, Singapore), Dion Goh (Nanyang Technological University, Singapore) and Jin-Cheon Na (Nanyang Technological University, Singapore)
DOI: 10.4018/978-1-59904-879-6.ch017

Keywords: Digital Libraries / Information Science Reference / Library & Information Science / Library Science

Abstract

This chapter addresses a number of advances in formulating spoken document retrieval for the National Gallery of the Spoken Word (NGSW) and the U.S.-based Collaborative Digitization Program (CDP). After an overview of the audio stream content of the NGSW and CDP audio corpus, the chapter presents an overall system diagram and discusses the critical tasks associated with effective audio information retrieval: advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and information retrieval using natural language processing for text query requests, including document and query expansion. Our experimental online system, “SpeechFind”, which allows audio retrieval from the NGSW and CDP corpus, is then presented. Finally, a number of research challenges and new directions are discussed that address the overall task of robust phrase searching in unrestricted audio corpora.
