Factorial models and refiltering for speech separation and denoising
Sam Roweis
(University of Toronto)

In this talk, I will explore the combination of several ideas, some old and some new, from machine learning and speech processing. First, I'll review the astonishing max approximation to log spectrograms of mixtures and show why it motivates a "refiltering" approach to separation and denoising. I'll then describe how inference in factorial probabilistic models performs a computation that is very useful for deriving the masking signals needed in refiltering. A particularly simple model, factorial-max vector quantization (MAXVQ), together with a branch-and-bound technique for efficient exact inference, can be applied to both denoising and monaural separation. This approach represents a return to the ideas of Ephraim, Varga and Moore, but applied to auditory scene analysis rather than to speech recognition.
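
As a rough illustration of the max approximation and of a refiltering mask, here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the toy sources (a tone and white noise rather than speech), the helper `log_spectrogram`, the frame and hop sizes, and the mask itself, which is computed from the known sources, whereas the talk is about inferring such masks from the mixture alone with a factorial model.

```python
import numpy as np

def log_spectrogram(x, frame_len=256, hop=128, eps=1e-10):
    """Log power spectrogram from a Hann-windowed short-time FFT (frames x bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps)

rng = np.random.default_rng(0)
n = 8000
t = np.arange(n) / 8000.0

# Two toy "sources" (illustrative stand-ins, not speech).
source_a = np.sin(2 * np.pi * 440.0 * t)
source_b = 0.3 * rng.standard_normal(n)
mixture = source_a + source_b

log_a = log_spectrogram(source_a)
log_b = log_spectrogram(source_b)
log_mix = log_spectrogram(mixture)

# Max approximation: the log spectrogram of the mixture is roughly the
# elementwise maximum of the individual log spectrograms.
max_approx = np.maximum(log_a, log_b)
median_err = np.median(np.abs(log_mix - max_approx))

# Oracle binary mask: True where source A dominates a time-frequency cell.
mask_a = log_a > log_b

# Refiltering idea: keep the mixture's cells where A dominates and push the
# remaining cells down to the floor of the mixture spectrogram.
refiltered_log_a = np.where(mask_a, log_mix, log_mix.min())

print("median |log_mix - max_approx|:", median_err)
print("fraction of cells assigned to A:", mask_a.mean())
```

Because the magnitude of a sum is at most twice the larger of the two magnitudes, the mixture's log power can exceed the max approximation by at most log 4 in any cell; large errors in the other direction occur only where the two sources nearly cancel.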

Relevant material:

Sam T. Roweis (2003). Factorial Models and Refiltering for Speech Separation and Denoising. Proceedings of Eurospeech03 (Geneva, Switzerland), pp. 1009-1012. http://www.cs.toronto.edu/~roweis/papers/eurospeech03.pdf

Sam T. Roweis (2000). One Microphone Source Separation. Proceedings of Neural Information Processing Systems 13 (NIPS'00), pp. 793-799. http://www.cs.toronto.edu/~roweis/papers/onemic.pdf