What Kinds of Knowledge about Humans (Behavior and Physiology) Are Useful for Designing Systems?
Chair: Alain de Cheveigné
Participants: Hynek Hermansky, Douglas Brungart, Rhodri Cusack
Hynek Hermansky
We show that engineering optimizations in automatic speech recognition can yield processing modules that are consistent with the psychophysics and physiology of hearing, supporting the notion that speech evolved to be heard.
Douglas S. Brungart and Brian D. Simpson
Air Force Research Laboratory, WPAFB, OH
Although many researchers have shown that multitalker speech
intelligibility can be improved by spatially separating the apparent
locations of the competing speech signals, relatively little effort has
been made to identify the spatial configurations that optimize
performance in cocktail party listening tasks. In this presentation, we
review some of the factors that have been shown to influence performance
in multitalker speech displays, and describe the development of a spatial
configuration that we have found to provide optimal performance in a
listening environment with seven simultaneous talkers. We also present
the results of an experiment that examined two other factors that can
influence performance in multitalker listening tasks: the amount of
information the listener has about the location of the target talker, and
the use of real-time headtracking to update the locations of the
competing talkers in response to the listener's head movements. These
results show that headtracking can improve performance in cocktail-party listening
tasks, but only when the competing talkers are spaced relatively far
apart and the target talker location remains constant over long periods
of time.
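The headtracking update the abstract describes amounts to re-expressing each talker's fixed world position relative to the listener's current head orientation, so the spatial separation between talkers is preserved as the head turns. A minimal sketch of that geometry (the seven-talker layout below is a hypothetical evenly spaced arrangement, not the optimized configuration from the paper):

```python
def head_relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a source relative to the listener's nose,
    wrapped to the interval (-180, 180]."""
    rel = (source_az_deg - head_yaw_deg) % 360.0
    if rel > 180.0:
        rel -= 360.0
    return rel

# Hypothetical seven-talker layout, evenly spaced across the frontal field.
talkers = [-90, -60, -30, 0, 30, 60, 90]
head_yaw = 20.0  # listener has turned 20 degrees to the right
print([head_relative_azimuth(a, head_yaw) for a in talkers])
# -> [-110.0, -80.0, -50.0, -20.0, 10.0, 40.0, 70.0]
```

In a headtracked display this remapping runs every frame, so each talker stays anchored in world coordinates; without it, all talkers rotate with the head and the spatial cues collapse.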
References:
- Douglas Brungart, Brian Simpson, Alexander Kordik, Richard McKinley,
The Impact of Headtracking on Intelligibility in a Multitalker Display.
- Douglas Brungart, Brian Simpson,
Optimizing the Spatial Configuration of a Seven-Talker Speech Display, Proc. ICAD-2003, Boston.
- Mark Ericson, Douglas Brungart, Brian Simpson,
Factors that Influence Intelligibility in Multitalker Displays, Int. J. Aviation Psychol. 14(3), 313-334, 2004.
Rhodri Cusack
Without structure, the vast array of information arriving from our senses would be bewildering. Fortunately, there are many statistical regularities in sounds (and visual input) from the world, and these are used by sensory systems to perceptually organize the scene into streams or objects, from amongst which we then select. This presentation will be in two parts. In the first, I briefly review recent evidence that perceptual organization is affected by the current focus of attention. What constitutes a stream or object is not defined by the physical stimulus, but is dependent on the current task requirements. I argue that this may not be due to a capacity limitation in perceptual organization, but is an adaptive response to complex listening environments, and propose the hierarchical decomposition model. In the second part, I will describe a model of the neural architecture underlying perceptual organization, and suggest how it might interact with mechanisms responsible for attention. A large body of evidence supports the model. It provides an architecture that is highly suitable for machine implementations of auditory scene analysis.
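One of the statistical regularities exploited in primitive stream segregation is frequency proximity: successive sounds close in frequency tend to be grouped into one stream, while large jumps start a new one. A toy sketch of that grouping rule (illustrative only; the threshold and the greedy assignment are assumptions, and this is not the hierarchical decomposition model described in the talk):

```python
import math

def segregate(tone_freqs, threshold_semitones=4.0):
    """Assign each tone to an existing stream whose most recent
    tone lies within `threshold_semitones`; otherwise start a new
    stream. A toy proximity-grouping rule, not a full ASA model."""
    streams = []  # each stream is a list of tone frequencies (Hz)
    for f in tone_freqs:
        for s in streams:
            if abs(12 * math.log2(f / s[-1])) <= threshold_semitones:
                s.append(f)
                break
        else:
            streams.append([f])
    return streams

# An alternating low/high sequence splits into two streams when
# the frequency gap is large, but fuses when it is small:
print(segregate([400, 800, 400, 800]))  # -> [[400, 400], [800, 800]]
print(segregate([400, 450, 400, 450]))  # -> [[400, 450, 400, 450]]
```

The point of the abstract is that no fixed rule like this suffices: what counts as a stream also depends on attention and task demands, which is what a machine implementation of auditory scene analysis would need to capture.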
Relevant Publications:
- Cusack, R., Deeks, J., Aikman, G. & Carlyon, R.P. (2004) Effects of focussing attention in space, frequency and time on the build up and maintenance of auditory stream segregation. Journal of Experimental Psychology: Human Perception and Performance 30(4):643-56.
- Carlyon, R.P., Cusack, R., Foxton, J. & Robertson, I.H. (2001) Effects of attention and unilateral neglect on auditory stream segregation. Journal of Experimental Psychology: Human Perception and Performance 27, 115-127.