On computational objectives of auditory scene analysis
DeLiang Wang
(Ohio State University)

What should be the computational goal of auditory scene analysis? This is a key issue to address in the Marrian information-processing framework. It is also an important question for researchers in computational auditory scene analysis (CASA) because it bears directly on how a CASA system should be evaluated. In this talk I will discuss the different objectives used in CASA. To stimulate discussion, I will put forward a proposal based on ideal time-frequency (T-F) binary masks, in which a T-F unit of the mask is 1 if the target energy in that unit is greater than the interference energy and 0 otherwise. Example results from systems that attempt to estimate ideal binary masks for speech segregation will be presented.
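
The mask definition above can be illustrated with a minimal sketch. It assumes the target and interference energies are already available for each T-F unit as two equally sized grids (rows as frequency channels, columns as time frames); the function name `ideal_binary_mask` and the toy energy values are hypothetical and are not taken from the cited papers.

```python
def ideal_binary_mask(target_energy, interference_energy):
    """Return a 0/1 mask with one entry per T-F unit.

    A unit is 1 if the target energy in that unit is strictly greater
    than the interference energy, and 0 otherwise.
    Both arguments are assumed to be lists of rows of equal shape.
    """
    if len(target_energy) != len(interference_energy):
        raise ValueError("energy grids must have the same number of rows")
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        if len(t_row) != len(i_row):
            raise ValueError("energy grids must have the same number of columns")
        mask.append([1 if t > i else 0 for t, i in zip(t_row, i_row)])
    return mask


# Toy example (made-up values): 2 frequency channels x 3 time frames.
target = [[4.0, 0.5, 2.0],
          [0.1, 3.0, 1.0]]
interference = [[1.0, 2.0, 2.0],
                [0.2, 0.5, 4.0]]

mask = ideal_binary_mask(target, interference)
print(mask)
```

Because the mask is built from the premixed target and interference energies, it is only available when both signals are known separately; a CASA system has to estimate it from the mixture, and the ideal mask then serves as the reference against which that estimate is compared.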

Roman N., Wang D.L., Brown G.J. (2003): "Speech segregation based on sound localization," Journal of the Acoustical Society of America, in press. (http://www.cis.ohio-state.edu/~niki/jasa_2003.pdf)
Hu G., Wang D.L. (2003): "Monaural speech separation," Proceedings of NIPS-02. (http://www.cis.ohio-state.edu/~dwang/papers/Hu-Wang.nips02.pdf)