[AFRL logo]

Speech Separation and Comprehension in Complex Acoustic Environments
Thu Nov 4 - Sun Nov 7, 2004
Montreal, Quebec
sponsored by the Air Force Office of Scientific Research and the National Science Foundation

[EBIRE logo]

Variations in Design and Performance of Sensing Arrays

Chairs: Steve Colburn, Te-Won Lee

Presenters: Machine session: DeLiang Wang, Jay Desloge, Te-Won Lee, Jim Flanagan

(NB: Talk titles link to presentation slides.)


DeLiang Wang

Monaural and Binaural Speech Separation

In this presentation, I will illustrate how to perform speech separation using perceptually based monaural and binaural analysis. For monaural separation, I'll describe algorithms based on auditory scene segmentation, pitch tracking, onset/offset analysis, and amplitude modulation analysis. For binaural separation, I'll present a supervised learning approach to estimating ideal binary time-frequency masks in the joint feature space of ITD (interaural time difference) and IID (interaural intensity difference). I will also discuss the relative strengths and weaknesses of monaural versus binaural processing, as well as microphone array techniques.
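
The abstract does not spell out the mask definition, so the following is a minimal sketch of the ideal binary time-frequency mask that such a supervised learner is trained to approximate. It assumes oracle access to the premixed target and interference spectrograms, which is how the "ideal" mask is conventionally defined; in practice the mask must be estimated from the mixture via features such as ITD and IID.

```python
import numpy as np

def ideal_binary_mask(target_mag, interference_mag, criterion_db=0.0):
    """Ideal binary time-frequency mask: 1 in each T-F unit where the
    target exceeds the interference by more than `criterion_db`, else 0.

    target_mag, interference_mag: magnitude spectrograms (freq x time)
    of the premixed target and interference signals (oracle inputs,
    available only when the mixture is constructed synthetically).
    """
    eps = 1e-12  # avoid log(0) in silent units
    local_snr_db = 20.0 * np.log10((target_mag + eps) / (interference_mag + eps))
    return (local_snr_db > criterion_db).astype(float)

# Applying the mask to the mixture spectrogram keeps target-dominant
# units and zeroes the rest before resynthesis:
#   separated_spec = ideal_binary_mask(T, N) * mixture_spec
```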


Joseph Desloge

Directional Multimicrophone Arrays: A Spatial-Filtering Approach to Source Separation

In this presentation, I will discuss the use of M-element microphone arrays to create adaptive spatial filters for extracting specific sources from complex acoustic environments. I will explore the performance realistically attainable with these systems, in terms of both source localization and source separation. I will also compare spatial filtering to other multi-sensor techniques (most notably independent component analysis, or ICA) in order to clarify the strengths and weaknesses of spatial filtering when applied to this task.
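
As a minimal, non-adaptive sketch of the spatial-filtering idea, the fixed delay-and-sum beamformer below aligns the M channels toward a chosen look direction so the desired source adds coherently while off-axis sources add incoherently. The adaptive filters discussed in the talk generalize this by also placing nulls on interferers; the function name and far-field plane-wave assumption here are illustrative, not taken from the talk.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, look_direction, fs, c=343.0):
    """Fixed delay-and-sum beamformer, the simplest spatial filter.

    signals:        (M, N) array of M microphone channels.
    mic_positions:  (M, 3) microphone coordinates in meters.
    look_direction: unit vector from the array toward the desired source.
    fs:             sample rate in Hz; c: speed of sound in m/s.
    """
    M, N = signals.shape
    # Far-field plane-wave model: mics farther along the look direction
    # are closer to the source, receive the wavefront earlier, and must
    # be delayed more to line up with the latest-arriving channel.
    delays = mic_positions @ look_direction / c
    delays -= delays.min()                     # make all delays >= 0
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    out = np.zeros(N)
    for m in range(M):
        # A delay of tau is a linear phase shift exp(-2j*pi*f*tau)
        # in the frequency domain.
        spec = np.fft.rfft(signals[m]) * np.exp(-2j * np.pi * freqs * delays[m])
        out += np.fft.irfft(spec, n=N)
    return out / M
```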


Te-Won Lee

ICA-based Techniques for Single Channel and Multichannel Speech Separation

I will briefly summarize approaches to speech separation based on ICA techniques, including methods for multichannel blind deconvolution as well as recently proposed methods for single-channel blind source separation. I will illustrate the relevance of the machine learning framework for learning representations of speech signals and other sounds. The use of a probabilistic graphical model allows a principled and systematic approach to the speech separation problem.
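
For readers unfamiliar with ICA, here is a minimal sketch of the classic natural-gradient (Infomax) update for an instantaneous mixture. It assumes a memoryless mixing matrix; the convolutive mixtures produced by real rooms require the blind-deconvolution extensions the talk covers, and the learning rate and iteration count below are arbitrary illustrative choices.

```python
import numpy as np

def infomax_ica(X, lr=0.01, n_iter=500):
    """Natural-gradient Infomax ICA for an instantaneous mixture.

    X: (n_channels, n_samples) zero-mean observations, e.g., two
       microphones recording two simultaneous talkers anechoically.
    Returns the unmixing matrix W; estimated sources are W @ X.
    """
    n, T = X.shape
    W = np.eye(n)
    for _ in range(n_iter):
        Y = W @ X
        # tanh score function suits super-Gaussian sources like speech.
        g = np.tanh(Y)
        # Amari natural-gradient update: dW = (I - E[g(y) y^T]) W
        W += lr * (np.eye(n) - (g @ Y.T) / T) @ W
    return W
```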


Jim Flanagan

Spatial Selectivity for Speech Separation

As capabilities advance for natural communication with complex systems, hands-free capture of sound grows in interest. Multimodal interfaces, mobile communication, and large-group conferencing are venues where hands/eyes-busy tasks are conducted and where hand-held or body-worn microphones are inconvenient. Hands-free sound capture ideally requires accurate source localization (preferably in three dimensions) and good-quality transduction of the located source (again with three-dimensional selectivity). This presentation outlines techniques, challenges, and research status for employing spatial selectivity for sound separation.
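
A standard building block for the source-localization step is time-difference-of-arrival estimation between microphone pairs; intersecting TDOAs across several pairs yields a three-dimensional location estimate. The sketch below implements GCC-PHAT, a widely used TDOA estimator, as an illustration; it is not necessarily the specific technique used in this work.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the time difference of arrival between two microphone
    signals using generalized cross-correlation with PHAT weighting.

    Returns the delay in seconds; positive means x lags y.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12              # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    # Rearrange so negative lags precede positive lags.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```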