Speech Signal Understanding Using Graphical Models
Te-Won Lee
(University of California, San Diego)

We present machine learning algorithms that use graphical models to represent speech signals. Learning efficient codes for speech signals in a linear generative model lets us analyze important speech features and their characteristics, and use them to model different sounds, individual speakers, or classes of speakers. The same principle yields a method for the difficult problem of separating multiple sources given only a single-channel microphone recording. Multi-channel observations relax some of the constraints of blind source separation, but in real environments the problem also involves reverberation, sensor noise, and other challenges. We demonstrate solutions that separate speech signals from mixture recordings.
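
As a rough illustration of the multi-channel case, the sketch below separates two synthetic sources from two instantaneous mixtures x = A s. It is not the method presented in the talk. It assumes Laplacian-distributed stand-ins for speech, a hypothetical 2x2 mixing matrix, and no reverberation or sensor noise. The approach is a textbook one: whiten the mixtures, then search for the rotation that maximizes the summed squared kurtosis. All function names are illustrative.

```python
import math
import random


def whiten(x1, x2):
    """Center two channels and rescale them to identity covariance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    c11 = sum(v * v for v in a) / n
    c22 = sum(v * v for v in b) / n
    c12 = sum(p * q for p, q in zip(a, b)) / n
    # Closed-form rotation that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    c, s = math.cos(theta), math.sin(theta)
    u = [c * p + s * q for p, q in zip(a, b)]
    v = [-s * p + c * q for p, q in zip(a, b)]
    su = math.sqrt(sum(t * t for t in u) / n)
    sv = math.sqrt(sum(t * t for t in v) / n)
    return [t / su for t in u], [t / sv for t in v]


def kurtosis(z):
    """Excess kurtosis of a zero-mean sequence (0 for a Gaussian)."""
    n = len(z)
    m2 = sum(t * t for t in z) / n
    m4 = sum(t ** 4 for t in z) / n
    return m4 / (m2 * m2) - 3.0


def separate(x1, x2, steps=90):
    """Grid-search the rotation of the whitened data that is most non-Gaussian."""
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        phi = (math.pi / 2.0) * k / steps
        c, s = math.cos(phi), math.sin(phi)
        y1 = [c * p + s * q for p, q in zip(z1, z2)]
        y2 = [-s * p + c * q for p, q in zip(z1, z2)]
        score = kurtosis(y1) ** 2 + kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den


random.seed(0)
n = 2000
# Synthetic super-Gaussian (Laplacian) sources; an assumption standing in for speech.
s1 = [random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(n)]
s2 = [random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(n)]
# Hypothetical instantaneous mixing matrix A = [[0.8, 0.3], [0.4, 0.7]].
x1 = [0.8 * p + 0.3 * q for p, q in zip(s1, s2)]
x2 = [0.4 * p + 0.7 * q for p, q in zip(s1, s2)]

y1, y2 = separate(x1, x2)
for name, src in (("s1", s1), ("s2", s2)):
    print(name, round(abs(corr(y1, src)), 3), round(abs(corr(y2, src)), 3))
```

The recovered signals match the sources only up to ordering and sign, which is why the comparison uses absolute correlations. Real recordings are convolutive and noisy, so this instantaneous model is a starting point at best.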

E. Visser, M. Otsuka, and T.-W. Lee, "A Spatio-Temporal Speech Enhancement Scheme for Robust Speech Recognition in Noisy Environments," Speech Communication, 2003.
G.-J. Jang and T.-W. Lee, "A Probabilistic Approach to Single Channel Source Separation," Advances in Neural Information Processing Systems 15, MIT Press, Cambridge, 2003.
O.-W. Kwon, K.-L. Chan, and T.-W. Lee, "Speech Feature Analysis Using Variational Bayesian PCA," IEEE Signal Processing Letters, Vol , no , 2003.