[AFRL logo]

Speech Separation and Comprehension in Complex Acoustic Environments
Thu Nov 4 - Sun Nov 7, 2004
Montreal, Quebec
sponsored by the Air Force Office of Scientific Research and the National Science Foundation

[EBIRE logo]

Combating the Reverberation Problem

Chair: Barbara Shinn-Cunningham

Presenters: DeLiang Wang, Jay Desloge, Martin Cooke

(NB: Talk titles link to presentation slides.)

Outline

  • Intro / How humans cope in natural settings (Shinn-Cunningham)
  • How speech is corrupted by reverberation (Cooke)
  • Making CASA models robust in reverberant settings (Wang)
  • Making multimicrophone algorithms robust in reverberant settings (Desloge)

Barbara Shinn-Cunningham

How humans cope in natural settings

In natural settings, reverberation changes many aspects of the sound reaching our ears (e.g., smearing out temporal structure, altering the short-term spectral content, and distorting spatial cues in the signals). Despite the sometimes drastic distortions caused by reverberant energy, we can usually extract the "true" information in the signals we hear, including the content of any speech messages, the location of the sound sources, and information about the environment itself. After a brief discussion of how reverberation influences the signals we hear, I will discuss snippets of data that suggest that, in order to overcome distortion from reverberant energy, we calibrate to our environment, changing how we process and interpret sound based on the distortion we expect to hear. The ensuing discussion should address how our brains cope with natural settings as well as how the neural approach is similar to and differs from the approaches used in most machine algorithms.


Martin Cooke

How speech is corrupted by reverberation

Even in environments containing a single acoustic source, the presence of reverberant energy has a detrimental effect on the speech signal. Reverberation reduces the salience of numerous spectro-temporal features that help to define important phonetic distinctions, and undermines the effectiveness of binaural information. However, things get much worse when several sound sources are present. Not only does the non-direct energy act as an additional masker for each source, but reverberation also blurs many of the cues (e.g., f0 dynamics) that are held to aid in source segregation. In this talk, I will use examples of speech in moderate and severe reverberation to demonstrate, both pictorially and quantitatively, the corruption of speech features and of cues to perceptual organisation in single- and multi-source environments.

This is joint work with Barbara Shinn-Cunningham and Madhusudana Shashanka.
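The temporal smearing described above can be illustrated by convolving a signal with a room impulse response. The sketch below uses an entirely synthetic setup (the burst positions, decay time, and exponentially decaying noise model of the impulse response are invented for illustration, not taken from the talk): reverberant energy from one burst fills the silent gap before the next, blurring exactly the kind of temporal structure that carries phonetic distinctions.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000  # sample rate (Hz); all parameters here are illustrative

# Synthetic "speech-like" signal: two short bursts separated by silence,
# standing in for consecutive phonetic segments.
clean = np.zeros(fs // 2)
clean[1000:1400] = np.hanning(400)
clean[4000:4400] = np.hanning(400)

# Toy room impulse response: direct path plus an exponentially decaying
# Gaussian tail (a crude stand-in for late reverberation, T60 ~ 0.3 s).
t60 = 0.3
n = int(t60 * fs)
rir = rng.standard_normal(n) * np.exp(-3 * np.log(10) * np.arange(n) / n)
rir[0] = 1.0  # direct path

reverberant = np.convolve(clean, rir)[: len(clean)]

# Energy in the silent gap between the bursts: exactly zero for the
# clean signal, clearly nonzero after reverberation (temporal smearing).
gap = slice(2000, 3500)
print(np.sum(clean[gap] ** 2))
print(np.sum(reverberant[gap] ** 2))
```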


DeLiang Wang

Effects of reverberation on pitch, onset/offset, and binaural cues

To combat the reverberation problem computationally, one must understand the effects of reverberation on commonly used auditory features. I will present preliminary observations on the distortions reverberation introduces into estimates of pitch, onsets and offsets, and interaural time/intensity differences for natural speech. I will also discuss their implications for the segregation of reverberant speech.
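One of the binaural cues mentioned above, the interaural time difference (ITD), is commonly estimated by cross-correlating the two ear signals and picking the lag of the maximum. The sketch below (with invented signal lengths and delays, and white noise standing in for speech) recovers a known delay in the anechoic case; reverberant energy corrupts precisely this estimate by adding spurious correlation peaks from reflected paths.

```python
import numpy as np

rng = np.random.default_rng(1)
true_itd = 12  # delay in samples between the ears (illustrative value)

# Anechoic case: the right-ear signal is a pure delay of the left.
left = rng.standard_normal(4096)
right = np.concatenate([np.zeros(true_itd), left[:-true_itd]])

def estimate_itd(l, r, max_lag=40):
    # Pick the lag that maximizes the interaural cross-correlation;
    # trimming the edges avoids wrap-around artifacts from np.roll.
    lags = range(-max_lag, max_lag + 1)
    corr = [np.dot(l[max_lag:-max_lag], np.roll(r, -lag)[max_lag:-max_lag])
            for lag in lags]
    return list(lags)[int(np.argmax(corr))]

print(estimate_itd(left, right))  # recovers the 12-sample delay
```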


Jay Desloge

Multi-microphone Source Separation in Reverberant Environments

Multi-microphone source separation systems form the final system outputs by using linear filters to process and combine microphone signals. Although the specific filters can be generated in a variety of ways, most techniques (including adaptive spatial filtering and ICA) separate a particular source from the environment using filters designed to preserve the desired source with unit gain while attenuating the remaining sources.
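The design constraint described above (unit gain on the desired source, attenuation of the rest) can be sketched in a minimal narrowband, two-microphone case. Everything below is an invented illustration, not the architecture discussed in the talk: given steering vectors for a desired source and one interferer at a single frequency, the weights are chosen to satisfy unit gain on the former and a null on the latter. In real reverberant environments each source arrives along many paths, which is precisely why such simple spatial constraints break down.

```python
import numpy as np

# Narrowband, free-field sketch; frequency, mic spacing, and source
# angles are made up for illustration.
f = 1000.0   # frequency (Hz)
c = 343.0    # speed of sound (m/s)
d = 0.1      # microphone spacing (m)

def steering(theta_deg):
    # Relative phase at the second mic for a far-field plane wave.
    tau = d * np.sin(np.deg2rad(theta_deg)) / c
    return np.array([1.0, np.exp(-2j * np.pi * f * tau)])

a_desired = steering(0.0)    # desired source at broadside
a_interf = steering(45.0)    # interferer off to the side

# Solve A^H w = [1, 0]^T: unit gain on the desired source,
# a spatial null on the interferer.
A = np.column_stack([a_desired, a_interf])
w = np.linalg.solve(A.conj().T, np.array([1.0, 0.0]))

print(abs(np.conj(w) @ a_desired))  # desired source preserved (gain 1)
print(abs(np.conj(w) @ a_interf))   # interferer nulled (gain ~ 0)
```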

Recent research on directional hearing-aid performance indicates that spatial filtering techniques using these structures provide only small benefits in realistic reverberant environments. Substantial source separation (even given exact knowledge of source-to-microphone propagation for all sources) requires filters that are often several times longer than the reverberation time of the environment. Such long filters are not only computationally demanding but also impractical in time-varying environments.

In this session, I will discuss performance-limiting factors of these multi-microphone architectures in reverberant environments and will suggest means of improving their operation.

Papers:

  • Desloge, J.G., Zimmer, M.J., and Zurek, P.M., "Determining the performance benefit of adaptive, multi-microphone, interference-canceling systems in everyday listening environments," Acoustical Society of America Meeting, New York, NY, May 2004.