[AFRL logo]

Speech Separation and Comprehension in Complex Acoustic Environments
Thu Nov 4 - Sun Nov 7, 2004
Montreal, Quebec
sponsored by the Air Force Office of Scientific Research and the National Science Foundation

[EBIRE logo]

This is the second in a series of workshops bringing together diverse perspectives on the problem of speech separation. See also the 2003 Workshop Web site and the book that resulted.


Speech heard against a background of multiple, competing speech sources, such as crowd noise or even a single talker in a reverberant environment, has long been recognized as perhaps the acoustic setting most detrimental to verbal communication. Psychological and audiological data collected over the last 25 years have better defined the processes a human listener needs in order to perform this difficult task. The same data have also motivated the development of models that predict and explain human performance in multi-talker settings with increasing accuracy. However, because the data also indicated the limits of performance under these difficult listening conditions, it became clear that significant improvement of speech understanding in speech noise will likely come only from yet-to-be-developed devices that automatically separate speech sources, filter out the unwanted sources, and enhance the target source. The last 10-15 years have seen an unprecedented rush toward different computational schemes aimed at achieving this goal. Even a cursory survey of computational separation of speech from other acoustic signals, mainly other speech, strongly suggests that the field as a whole is in flux: there are a number of initiatives, each based on an even larger number of theories, models, and assumptions. Despite commendable efforts and achievements by many researchers, it is not clear where the field is going. One possible problem is that investigators working in separate areas seldom interact.

To foster such interaction, we organized an interdisciplinary international workshop last year, to our knowledge the first of its kind. We invited experimental psychologists, neuroscientists, and computer scientists working on different aspects of the speech separation problem and using different techniques. The workshop, held in Montreal, Canada, over the weekend of October 31 to November 2, 2003, was sponsored by the National Science Foundation's Directorate for Computer and Information Science and Engineering, Division of Intelligent Information Systems, Program of Human Language and Communication. It was attended by twenty active presenters who constituted a representative sample of speech separation researchers from the various fields. Interspersed with presentations of the experts' work were periods of planned discussion that stimulated an intensive exchange of ideas and points of view. A book with contributions by all presenters, published by Kluwer Academic Publishers, is about to appear (publication date: October 15, 2004). But we must also note that the field is changing rapidly. For one, the question of how the separated signals are interpreted, by humans and machines, is increasingly being thrust to the forefront. To keep abreast of these changes, experts working on speech separation and comprehension by humans and machines will need to meet again at another workshop to exchange data and ideas.

Like the first, this workshop will be small: it is our desire to give each invited attendee the opportunity to present his or her views, as a presenter, a discussant, or both. The format will be presentations followed by both topical and general discussion periods. As a departure from last year's format, we want to open the workshop to a select group of ten graduate students and postdoctoral attendees, whose posters will be on display throughout the workshop. These young participants will also be encouraged to take an active part in the discussion periods.

In addition to the established and young presenters, representatives from U.S. funding agencies and funding agencies from Europe, Canada, and Japan have also been invited as observers. The reason for inviting these representatives is to stimulate interest in the interdisciplinary problem area of speech separation and comprehension.

Objectives of the workshop

  1. To assess where the field of speech separation and comprehension currently stands, through talks by experts presenting a general panorama and recent findings in sensory and cognitive psychology, neurobiology, and computer science;
  2. To promote an open interdisciplinary exchange of views among the invited experts through discussion periods focused on designated problem areas;
  3. To transfer expert knowledge to a new generation of young scientists by opening the workshop to a group of advanced graduate students and recent doctoral graduates.

Topics to be covered

  • Separation of, and selective attention to, multiple speech sources in complex acoustic environments by human listeners
  • Comprehension and perceptual learning of separated speech sources by human listeners
  • Informational masking and unmasking of multiple speech sources
  • Separation of spatially distributed multiple speech sources by humans and machines
  • Neurophysiology and neuroimaging of speech separation and comprehension
  • Computational auditory scene analysis
  • Statistical (blind) computational source separation
  • Machine learning at the service of computational speech separation

Organizing Committee:

Pierre Divenyi, EBIRE, Martinez, CA (Chair)
Nathaniel Durlach, Boston University
DeLiang Wang, Ohio State University
Dan Ellis, Columbia University