Glimpsing speech
Martin Cooke
(Sheffield University)

When listeners are confronted with speech in everyday noise, they can draw on a range of strategies to recover the speech from the mixture. One possibility is to segregate-then-recognise. Here, the segregation step identifies those acoustic components likely to have originated from the same source, using, for example, principles of auditory organisation. This strategy has formed the basis for most computational models of speech segregation to date, and it places a significant burden on bottom-up grouping processes. An alternative strategy is to detect-then-recognise, in which the incoming noisy mixture is searched for regions – glimpses – which might represent fragments of speech. The subsequent recognition stage is then based on the partial information provided by the glimpses. In the glimpsing approach, the burden falls instead on the availability of good top-down models of speech. This talk will evaluate the evidence for both types of strategy, and present the results of recent experiments comparing listeners' VCV identification performance with a computational model of glimpsing.
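
To make the detection step concrete, here is a minimal sketch in Python of one common way of defining glimpses: spectro-temporal cells of an energy representation in which the speech level exceeds the noise level by some local SNR threshold. The function name, the 3 dB threshold and the use of separately available speech and noise spectrograms are illustrative assumptions rather than details of the talk; in practice, candidate glimpses must be detected from the mixture alone, without access to the clean signals.

    import numpy as np

    def glimpse_mask(speech_energy, noise_energy, lsnr_threshold_db=3.0):
        """Binary time-frequency mask of glimpses: cells where speech
        exceeds noise by the local SNR threshold (illustrative only)."""
        eps = 1e-12  # guard against division by zero and log of zero
        local_snr_db = 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))
        return local_snr_db > lsnr_threshold_db

    # Toy illustration with random "energy" spectrograms
    # (frequency channels x time frames); real inputs would be
    # auditory spectrograms of the speech and noise signals.
    rng = np.random.default_rng(0)
    mask = glimpse_mask(rng.random((32, 100)), rng.random((32, 100)))
    print(f"{mask.mean():.1%} of time-frequency cells are glimpses")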

Relevant material:

Martin Cooke (2003). A glimpsing model of speech perception. Proc. ICPhS 2003. http://www.dcs.shef.ac.uk/~martin/icphs_0817.doc
Jon Barker, Martin Cooke and Dan Ellis. Decoding speech in the presence of other sources. Accepted for Speech Communication. http://www.dcs.shef.ac.uk/~martin/barker_crac_2002.pdf
Martin Cooke, Phil Green, Ljubomir Josifovski and Ascension Vizinho (2001). Robust automatic speech recognition with missing and uncertain acoustic data. Speech Communication, 34, 267-285.