Speech Separation and Comprehension in Complex Acoustic Environments
What Kinds of Knowledge about Humans (Behavior and Physiology) Are Useful for Designing Systems?
Chair: Alain de Cheveigné
(NB: Talk titles link to presentation slides.)
We show that engineering optimizations in automatic speech recognition can yield processing modules consistent with the psychophysics and physiology of hearing, supporting the notion that speech evolved to be heard.
Douglas S. Brungart and Brian D. Simpson
Although many researchers have shown that multitalker speech intelligibility can be improved by spatially separating the apparent locations of the competing speech signals, relatively little effort has been made to identify the spatial configurations that optimize performance in cocktail-party listening tasks. In this presentation, we review some of the factors that have been shown to influence performance in multitalker speech displays, and describe the development of a spatial configuration that we have found to provide optimal performance in a listening environment with seven simultaneous talkers. We also present the results of an experiment that examined two other factors that can influence performance in multitalker listening tasks: the amount of information the listener has about the location of the target talker, and the use of real-time head tracking to update the locations of the competing talkers in response to the listener's head movements. These results show that head tracking can improve performance in cocktail-party listening tasks, but only when the competing talkers are spaced relatively far apart and the target talker location remains constant over long periods of time.
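The spatial displays described above render each talker at a distinct apparent azimuth over headphones. As a rough illustration only (the actual apparatus would use measured head-related transfer functions and real-time head tracking, and the constants below are textbook approximations, not the experimental configuration), the following sketch renders several mono talkers at chosen azimuths using two simplified binaural cues, an interaural time difference (ITD) and a constant-power interaural level difference (ILD), and sums them into one stereo scene:

```python
import math

def pan_talker(mono, azimuth_deg, sample_rate=16000, max_itd_s=0.00066):
    """Render a mono signal (list of floats) as stereo at a given azimuth:
    -90 = hard left, 0 = straight ahead, +90 = hard right.

    Illustrative only: uses a textbook maximum ITD of ~0.66 ms and
    sine/cosine constant-power panning in place of measured HRTFs.
    """
    az = math.radians(azimuth_deg)
    # ITD: the ear farther from the source receives the signal later,
    # by up to max_itd_s for a source at +/-90 degrees.
    itd_samples = int(round(abs(math.sin(az)) * max_itd_s * sample_rate))
    # ILD via constant-power panning: left^2 + right^2 == 1 at every azimuth.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    left_gain, right_gain = math.cos(theta), math.sin(theta)
    pad = [0.0] * itd_samples
    if azimuth_deg >= 0:  # source on the right: left ear is farther, so delay it
        left = pad + [left_gain * s for s in mono]
        right = [right_gain * s for s in mono] + pad
    else:                 # source on the left: right ear is farther
        left = [left_gain * s for s in mono] + pad
        right = pad + [right_gain * s for s in mono]
    return left, right

def mix_scene(talkers):
    """Sum several (mono_signal, azimuth_deg) pairs into one stereo scene."""
    rendered = [pan_talker(sig, az) for sig, az in talkers]
    n = max(len(left) for left, _ in rendered)
    mix_l = [sum(l[i] if i < len(l) else 0.0 for l, _ in rendered) for i in range(n)]
    mix_r = [sum(r[i] if i < len(r) else 0.0 for _, r in rendered) for i in range(n)]
    return mix_l, mix_r
```

A seven-talker display like the one in the abstract could then be built by calling `mix_scene` with seven signals at seven distinct azimuths; the open experimental question is which set of azimuths maximizes intelligibility of the target talker.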
Without structure, the vast array of information arriving from our senses would be bewildering. Fortunately, there are many statistical regularities in sounds (and visual input) from the world, and these are used by sensory systems to perceptually organize the scene into streams or objects, from amongst which we then select. This presentation will be in two parts. In the first, I briefly review recent evidence that perceptual organization is affected by the current focus of attention. What constitutes a stream or object is not defined by the physical stimulus, but is dependent on the current task requirements. I argue that this may not be due to a capacity limitation in perceptual organization, but is an adaptive response to complex listening environments, and propose the hierarchical decomposition model. In the second part, I will describe a model of the neural architecture underlying perceptual organization, and suggest how it might interact with mechanisms responsible for attention. A large body of evidence supports the model. It provides an architecture that is highly suitable for machine implementations of auditory scene analysis.