Auditory models for speech processing in noisy and reverberant conditions
Guy Brown
(Sheffield University)

We describe two auditory-motivated approaches to robust speech recognition in real (i.e., noisy and reverberant) acoustic environments. Both approaches estimate a binary time-frequency mask for a missing data speech recogniser. The first approach uses spatial cues (interaural time and intensity differences) to segregate the voice of a target speaker from an interfering source at a different location. The second uses modulation filtering to identify time-frequency regions of speech that are least likely to be corrupted by reverberation. The performance of these systems has been evaluated against existing techniques, with encouraging results. The limitations of our approach, and its potential for further development, will be discussed.
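To illustrate the central idea of a binary time-frequency mask, the sketch below builds an oracle mask (one computed with access to the clean and noise signals separately, unlike the estimated masks described above) and applies it to a noisy mixture. The signals, frame sizes, and the 0 dB local-SNR threshold are illustrative assumptions, not details of the systems in the abstract.

```python
import numpy as np

def stft_mag(x, frame=256, hop=128):
    # Magnitude spectrogram via a simple framed FFT with a Hann window.
    win = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)

fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * t)        # stand-in for the target speech
noise = 0.3 * rng.standard_normal(fs)      # stand-in for the interferer
S, N = stft_mag(clean), stft_mag(noise)

# Oracle binary mask: keep cells where the local SNR exceeds 0 dB.
mask = (S > N).astype(float)

Y = stft_mag(clean + noise)
reliable = Y * mask  # only these cells would feed a missing-data recogniser
```

In the systems summarised above, the mask must instead be estimated blind, from binaural cues in the first approach and from modulation filtering in the second; the missing-data recogniser then treats the zeroed cells as unreliable evidence rather than as silence.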

Relevant material:

K. J. Palomaki, G. J. Brown and J. Barker (2002) Missing data speech recognition in reverberant conditions. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP-2002), Orlando, 13th-17th May, pp. 65-68.
K. J. Palomaki, G. J. Brown and D. L. Wang (2003) A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation. Accepted for publication in Speech Communication.