Speech Separation and Comprehension in Complex Acoustic Environments
Evaluation of Speech Separation Systems and Corpus Development

Chair: Dan Ellis
Participants: Martin Cooke, Alex Acero, Douglas Brungart, Lucas Parra (also representing Te-Won Lee)

(NB: Talk titles link to presentation slides.)
GRID: an audio-visual corpus for research in speech perception and automatic speech recognition

Microscopic models of speech perception predict listeners' responses to individual (usually noisy) speech tokens. Such models promise greater insight into human speech perception than their macroscopic cousins, which can only predict overall intelligibility. However, no collection of speech material suitable for joint work in modelling and perception testing exists at present. Corpora collected for speech perception tend to be too small to allow training of speech recognisers, while those used for work in ASR are usually inappropriate for presentation to listeners. As a consequence, models of speech perception are typically based on tiny amounts of training material and non-state-of-the-art learning algorithms. The GRID corpus is a first step towards the provision of speech material suitable for both modelling and listening tests. GRID builds on the CRM corpus but corrects for the latter's lack of phonetic balance and small size. At the same time, both audio and visual (facial) material will be collected, making up for the absence of large, affordable AV corpora. Collection of the corpus is scheduled for Q4 2004, with analysis and release in Q1 and Q2 2005. In this talk, I'll describe the rationale for and detailed design of GRID, and outline progress on its collection.
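To make the microscopic/macroscopic distinction above concrete, here is a minimal Python sketch (not from the talk; the token data and function names are illustrative placeholders) that scores a hypothetical model both ways: macroscopically, by comparing overall proportion of words correct, and microscopically, by checking token-by-token agreement with listener responses, including shared confusions.

    # Minimal sketch, assuming hypothetical per-token data of the form
    # (true word, listener's response, model's prediction). A real study
    # would use listener responses collected under matched noise conditions.
    tokens = [
        ("bin",   "bin",   "bin"),
        ("lay",   "lay",   "place"),
        ("place", "place", "place"),
        ("set",   "wet",   "wet"),    # listener and model make the same confusion
        ("blue",  "blue",  "green"),
    ]

    def macroscopic_score(tokens):
        """Overall intelligibility: does the model's average proportion
        correct match the listeners' average proportion correct?"""
        listener_correct = sum(truth == heard for truth, heard, _ in tokens) / len(tokens)
        model_correct = sum(truth == pred for truth, _, pred in tokens) / len(tokens)
        return listener_correct, model_correct

    def microscopic_score(tokens):
        """Token-level agreement: how often does the model predict the same
        response as the listener (errors included) on each individual token?"""
        agree = sum(heard == pred for _, heard, pred in tokens)
        return agree / len(tokens)

    if __name__ == "__main__":
        lc, mc = macroscopic_score(tokens)
        print(f"listener intelligibility: {lc:.2f}, model intelligibility: {mc:.2f}")
        print(f"per-token agreement (microscopic): {microscopic_score(tokens):.2f}")

Note that a model can score well macroscopically while failing microscopically: matching the average error rate says nothing about whether it makes the same errors, on the same tokens, as human listeners.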