Speech Separation and Comprehension in Complex Acoustic Environments
Integration of Different Machine Approaches
Chairs: DeLiang Wang, Dan Ellis
Participants: Lawrence Saul, Lucas Parra, Les Atlas
(NB: Talk titles link to presentation slides.)
Session overview:
Lucas Parra
Acoustic Source Separation with Microphone Arrays
Blind Source Separation (BSS) has received much attention in the context of acoustic mixtures. Most algorithms that separate convolutive mixtures exploit the spatial selectivity of an array of microphones. It is therefore natural to place convolutive BSS in the context of traditional beamforming. This talk will review different optimization criteria, including statistical independence, also known as ICA. The talk will be biased toward frequency-domain implementations, as those tend to be the most efficient. Only algorithms that have shown significant results (i.e., 10-20 dB improvement) on real-world applications will be discussed. Relevant papers can be downloaded from http://newton.bme.columbia.edu/~lparra/publish/
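To make the link between spatial selectivity and frequency-domain processing concrete, here is a minimal sketch of a conventional delay-and-sum beamformer evaluated in a single frequency bin. It is not one of the BSS algorithms from the talk: the uniform linear array, the 4 microphones, the 0.1 m spacing, the 1 kHz bin, and the function names are all assumptions chosen for illustration. A convolutive BSS method would instead adapt one weight vector per frequency bin according to a criterion such as statistical independence, rather than fixing the weights from known geometry.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_vector(freq_hz, angle_deg, n_mics, spacing_m):
    """Phase shifts of a far-field plane wave across a uniform linear array."""
    theta = math.radians(angle_deg)
    # Extra propagation delay between neighbouring microphones.
    delay_step = spacing_m * math.sin(theta) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay_step)
            for m in range(n_mics)]


def delay_and_sum_weights(freq_hz, look_deg, n_mics, spacing_m):
    """Weights that phase-align the look direction and average the channels."""
    a = steering_vector(freq_hz, look_deg, n_mics, spacing_m)
    return [x / n_mics for x in a]


def array_gain(weights, freq_hz, angle_deg, spacing_m):
    """Magnitude response |w^H a(theta)| of the beamformer in this bin."""
    a = steering_vector(freq_hz, angle_deg, len(weights), spacing_m)
    return abs(sum(w.conjugate() * x for w, x in zip(weights, a)))


# Hypothetical setup: 4 microphones, 10 cm apart, steered to broadside (0 deg).
w = delay_and_sum_weights(1000.0, 0.0, 4, 0.1)
gain_look = array_gain(w, 1000.0, 0.0, 0.1)   # source in the look direction
gain_off = array_gain(w, 1000.0, 60.0, 0.1)   # interferer arriving from 60 deg
```

In this setup the look direction passes with unit gain, while the off-axis source is strongly attenuated at this particular frequency. The attenuation depends on frequency and geometry, which is one reason frequency-domain methods treat each bin separately.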
Lawrence Saul
Machine Learning and Auditory Scene Analysis
How can we integrate the latest advances in machine learning into systems for auditory scene analysis and speech separation? The main challenge is to develop representations of the acoustic signal that can be analyzed by statistical learning algorithms. In this talk, I will describe some recently proposed machine learning models for dimensionality reduction and sequence analysis and discuss their application to problems in multiple F0 tracking, speaker separation, and acoustic modeling.
Les Atlas, University of Washington
Modulation Spectral Filtering: A New Tool for Acoustic Signal Separation
There is substantial evidence that commonality of modulation rates or frequencies provides an important cue for perceptual grouping of multiple sound sources in both monaural and binaural perception. Unfortunately, this modulation concept has previously had little, if any, quantitative foundation. The elementary notions of frequency in a Fourier sense and the concepts of linear time-invariant filtering are very well defined. It is thus reasonable to expect analogous properties for modulation frequency representations and modulation filters [1]. A correct and substantive definition of modulation frequency filtering, with the suppression and distortion-free performance one normally expects of a filter, could be a key ingredient of sound separation systems. A time-frequency approach can provide the start of a careful definition of modulation frequency. However, the conventional assumption of an incoherently detected real and non-negative modulation envelope, as used by essentially all researchers, is incomplete [2]. Correspondingly, with a more accurate coherent modulation detection foundation, there is the potential to satisfy superposition and other properties in modulation filtering.
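As a point of reference, the conventional incoherent route described above can be sketched as follows: demodulate one subband, keep only the magnitude (a real, non-negative envelope), and take a Fourier transform of that envelope along time to obtain a modulation spectrum. This is a minimal illustration under stated assumptions, not the coherent method proposed in the talk. The test signal (a 1 kHz carrier modulated at 4 Hz), the moving-average low-pass filter, and the function names are all hypothetical choices.

```python
import cmath
import math


def incoherent_envelope(signal, center_hz, sample_rate, smooth_len):
    """Demodulate one subband and keep only its magnitude (phase is discarded)."""
    # Shift the subband centred on center_hz down to 0 Hz.
    shifted = [s * cmath.exp(-2j * math.pi * center_hz * i / sample_rate)
               for i, s in enumerate(signal)]
    # Moving-average low-pass filter computed with a running sum;
    # only full windows are kept, so the output is slightly shorter.
    acc = sum(shifted[:smooth_len])
    envelope = [2.0 * abs(acc) / smooth_len]
    for i in range(smooth_len, len(shifted)):
        acc += shifted[i] - shifted[i - smooth_len]
        envelope.append(2.0 * abs(acc) / smooth_len)
    return envelope


def modulation_spectrum(envelope, sample_rate, mod_freqs_hz):
    """Magnitude of the Fourier transform of the mean-removed envelope."""
    n = len(envelope)
    mean = sum(envelope) / n
    spectrum = []
    for fm in mod_freqs_hz:
        acc = sum((e - mean) * cmath.exp(-2j * math.pi * fm * i / sample_rate)
                  for i, e in enumerate(envelope))
        spectrum.append(abs(acc) / n)
    return spectrum


# Hypothetical test signal: 1 kHz carrier, amplitude-modulated at 4 Hz, 1 s long.
sample_rate = 8000
signal = [(1.0 + 0.5 * math.cos(2 * math.pi * 4 * i / sample_rate))
          * math.cos(2 * math.pi * 1000 * i / sample_rate)
          for i in range(sample_rate)]

env = incoherent_envelope(signal, 1000.0, sample_rate, 80)
mod_freqs = list(range(1, 17))  # modulation frequencies 1..16 Hz
spectrum = modulation_spectrum(env, sample_rate, mod_freqs)
peak_hz = mod_freqs[spectrum.index(max(spectrum))]
```

The `abs()` call is the incoherent step: it forces the envelope to be real and non-negative and throws away the carrier phase. Because taking a magnitude is not a linear operation, the envelope of a sum of two sources is generally not the sum of their envelopes, which is one way to see why superposition is hard to satisfy with this representation.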
Well-defined modulation spectra can then be viewed as a new and useful dimension in which to filter, complementing and potentially augmenting existing separation technologies. Demonstrations will include single-channel talker and music source separation. Remaining challenges will be discussed.
References
Dan Ellis, Columbia University
Integrating CASA information with other signal separation techniques
Computational Auditory Scene Analysis (CASA) has been broadly used to refer to computer systems that try to duplicate the human ability to organize complex sound scenes into individual sources by directly modeling what is understood of how the auditory system achieves this task. In practice, this is mainly associated with a collection of "CASA features" that attempt to capture the cues to sound organization identified by experimental psychologists: continuity, common onset, common periodicity, common modulation, and conformity to well-known patterns. On the other hand, a number of alternative approaches, including Independent Component Analysis, start from a purely theoretical analysis of the problem and make no claims of perceptual relevance. This talk will attempt to clarify the distinctions and links between these two approaches, and suggest ways in which CASA cues can be successfully integrated into more rigorous signal separation algorithms.
References