Sound, mixtures, and learning
Dan Ellis
(Columbia University)
In order for machines to approach the abilities of human listeners to extract information from sound, they must organize sounds in the same way, i.e., according to their sources. Extensive efforts to build functional computer models of what is understood about human source organization have proven disappointing. However, they have led to an appreciation of the importance of high-level, knowledge-based constraints for disambiguating overlapping sounds that mutually corrupt one another. This talk will focus on how to capture this knowledge in representations such as the hidden Markov model (HMM), and on how to use these models to infer the makeup of dense mixtures. Practical issues of tractability and evaluation will also be discussed.
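The abstract does not commit to a particular inference scheme. As one illustration only, and not as the method of the talk or of the papers listed below, here is a minimal sketch of inferring a two-source mixture with two HMMs. It makes three simplifying assumptions. First, each HMM state emits a fixed log-spectral template. Second, the log spectrum of the mixture is approximated by the element-wise max of the two source templates plus Gaussian noise. Third, decoding is an exhaustive Viterbi search over the product of the two state spaces. The model layout (`templates`, `log_init`, `log_trans`) and the toy data are hypothetical. The exhaustive search also makes the tractability problem concrete: each frame costs on the order of (Na*Nb)^2 operations for models with Na and Nb states.

```python
import math
from itertools import product


def frame_loglik(frame, t_a, t_b, var=1.0):
    """Log-likelihood (up to a constant) of one log-spectral mixture frame,
    assuming the mixture equals the element-wise max of the two source
    templates plus Gaussian noise of variance `var` in every bin."""
    return -0.5 * sum((x - max(a, b)) ** 2 for x, a, b in zip(frame, t_a, t_b)) / var


def factorial_viterbi(frames, model_a, model_b, var=1.0):
    """Jointly decode two HMMs from their mixture by Viterbi search over the
    product state space. Each model is a dict with the hypothetical keys
    'templates' (per-state log-spectra), 'log_init' and 'log_trans'."""
    states = list(product(range(len(model_a["templates"])),
                          range(len(model_b["templates"]))))

    def emit(frame, s):
        return frame_loglik(frame, model_a["templates"][s[0]],
                            model_b["templates"][s[1]], var)

    def trans(p, s):
        # The sources are assumed to evolve independently, so their
        # transition log-probabilities simply add.
        return model_a["log_trans"][p[0]][s[0]] + model_b["log_trans"][p[1]][s[1]]

    score = {s: model_a["log_init"][s[0]] + model_b["log_init"][s[1]]
             + emit(frames[0], s) for s in states}
    backptrs = []
    for frame in frames[1:]:
        # Best predecessor for every joint state: O((Na*Nb)^2) per frame.
        ptr = {s: max(states, key=lambda p: score[p] + trans(p, s)) for s in states}
        score = {s: score[ptr[s]] + trans(ptr[s], s) + emit(frame, s) for s in states}
        backptrs.append(ptr)

    # Trace back from the best final joint state.
    path = [max(score, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return [s[0] for s in path], [s[1] for s in path]


# Hypothetical toy example: each source is either silent (state 0) or
# playing a single tone (state 1) in a 3-bin log spectrum.
SIL = [-10.0, -10.0, -10.0]


def toy_model(on_template):
    stay, switch = math.log(0.9), math.log(0.1)
    return {"templates": [SIL, on_template],
            "log_init": [math.log(0.5)] * 2,
            "log_trans": [[stay, switch], [switch, stay]]}


model_a = toy_model([5.0, -10.0, -10.0])   # low-frequency tone
model_b = toy_model([-10.0, -10.0, 5.0])   # high-frequency tone
frames = [[5.0, -10.0, -10.0], [5.0, -10.0, 5.0], [-10.0, -10.0, 5.0]]
print(factorial_viterbi(frames, model_a, model_b))
```

On this toy mixture the decoder recovers each source's activity separately, ([1, 1, 0], [0, 1, 1]): the low tone is present in the first two frames and the high tone in the last two, even where they overlap. Exhaustive search like this is only feasible for very small models. Realistic source models would need pruning or a factored search.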
Relevant material:
M. Reyes, D.P.W. Ellis, and N. Jojic (draft). "Subband audio modeling for single-channel source separation." Paper in preparation. http://www.ee.columbia.edu/~dpwe/pubs/multibandsep.pdf
M. Cooke and D.P.W. Ellis (2001). "The auditory organization of speech and other sources in listeners and computational models." Speech Communication, vol. 35, no. 3-4, pp. 141-177. http://www.ee.columbia.edu/~dpwe/pubs/CookeE01-audorg.pdf
D.P.W. Ellis (1998). "Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis, and its application to speech/nonspeech mixtures." Speech Communication, special issue on Computational Auditory Scene Analysis, M. Cooke and H. Okuno, eds., vol. 27, no. 3-4, pp. 281-298. http://www.icsi.berkeley.edu/~dpwe/research/spcomcasa98/spcomcasa98.pdf