Recurrent timing nets for F0-based sound separation
Peter Cariani
(Harvard University/Massachusetts Eye and Ear Infirmary)

We have recently proposed recurrent timing nets that operate on the temporal fine structure of their inputs to build up and separate periodic signals with different fundamental periods. The motivation for developing such temporal processing strategies has come from our investigations of the temporal coding of pitch in the auditory system. In early stages of auditory processing, superabundant phase-locked temporal discharge patterns provide precise and robust frequency and phase information that subserves source localization and pitch perception. Such fine timing information may also provide a basis for the formation and separation of auditory objects, if neural architectures can be devised that build up auditory images from temporal pattern invariances. Simple recurrent nets consist of arrays of coincidence elements fed by common input lines and by delay loops of different recurrence times. Each coincidence element compares the direct, incoming signal with the (delayed) signal arriving from its delay loop and, by means of a processing rule, adjusts the signal that is fed back into the loop. Early processing rules simply amplified the signal slightly when coincidences occurred; more recently, linear error-correction rules have been employed that handle analog signals more gracefully, with less distortion. The processing is related to running autocorrelation, comb filtering, and adaptive linear prediction. When a repeating temporal pattern (an arbitrary periodic signal) is presented to the network, the pattern builds up most quickly in the delay loop whose recurrence time matches the duration of the pattern. When multiple repeating patterns with different periods are combined and presented to the network, the individual constituent patterns build up in the corresponding delay loops. The separated waveforms of the constituent signals can then be recovered from the different delay loops.
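The build-up mechanism described above can be sketched as a bank of circulating buffers updated by a linear error-correction rule. This is a minimal illustrative implementation, not the authors' actual model: the function name, the discrete-sample formulation, the learning-rate parameter `eta`, and the demo periods (40 and 63 samples) are all invented for illustration.

```python
import numpy as np

def timing_net(x, delays, eta=0.1):
    """Bank of delay loops with a linear error-correction coincidence rule.

    Each loop of length d is a circulating buffer; at every sample the
    coincidence element nudges the stored value toward the incoming signal:
        buf[t % d] += eta * (x[t] - buf[t % d])
    A component of x whose period equals d reinforces the same buffer slots
    on every pass and builds up; components with other periods average out.
    Returns {delay: buffer}, each buffer holding one period of the pattern
    that built up in that loop.
    """
    loops = {d: np.zeros(d) for d in delays}
    for t, xt in enumerate(x):
        for d, buf in loops.items():
            i = t % d
            buf[i] += eta * (xt - buf[i])
    return loops

# Demo: two zero-mean periodic patterns with coprime periods, summed.
rng = np.random.default_rng(0)
a = rng.standard_normal(40); a -= a.mean()   # period-40 constituent
b = rng.standard_normal(63); b -= b.mean()   # period-63 constituent
n = 40 * 63 * 8
mix = np.resize(a, n) + np.resize(b, n)      # np.resize tiles the pattern

loops = timing_net(mix, [40, 63])
print(np.corrcoef(loops[40], a)[0, 1],       # loop 40 recovers pattern a
      np.corrcoef(loops[63], b)[0, 1])       # loop 63 recovers pattern b
```

With these settings, each loop's buffer ends up highly correlated with its corresponding constituent pattern, which is the sense in which the separated waveforms are "recovered in the different delay loops."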
We have tested 1-D timing networks using concurrently presented pairs and triplets of synthetic three-formant vowels with different fundamentals. Multichannel recurrent timing nets have also been implemented that process the output of a simulated auditory-nerve front end (24 fibers). In both cases, nearly perfect separations are achieved when fundamentals are separated by more than 10%. The networks also enhance vowels in noise, improving S/N ratios by 4-10 dB. Such networks demonstrate in principle how auditory objects can be formed from temporal pattern coherences (phase coherences among sets of components). They also yield insights into how recurrent networks of spiking neurons, whose synapses are transiently facilitated by temporal coincidences, might support reverberating circuits that propagate temporal memory traces. Timing nets illustrate a new, general strategy for scene analysis that builds up correlational invariances rather than relying on feature-based labeling, segregation, and binding of channels.
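The reported S/N improvement can be illustrated with a toy single-loop experiment. This sketch does not reproduce the multichannel networks or vowel stimuli; the period (50 samples), the learning rate, and the random "vowel-like" pattern are invented for illustration, and the exact gain depends on those choices.

```python
import numpy as np

rng = np.random.default_rng(0)
period = 50
pattern = rng.standard_normal(period)    # one period of a toy periodic signal
clean = np.resize(pattern, period * 200) # 200 repetitions
noise = rng.standard_normal(clean.size)
x = clean + noise                        # roughly 0 dB input S/N

# Single delay loop matched to the signal's period, linear error correction.
eta = 0.1
buf = np.zeros(period)
for t, xt in enumerate(x):
    i = t % period
    buf[i] += eta * (xt - buf[i])

# Compare the built-up pattern against the true one-period waveform.
residual = buf - pattern
snr_out_db = 10 * np.log10(np.sum(pattern**2) / np.sum(residual**2))
print(round(snr_out_db, 1))
```

Because the periodic component reinforces the same buffer slots on every pass while the noise averages toward zero, the loop's output S/N is well above the 0 dB input S/N, in the same spirit as the 4-10 dB gains reported for the full networks.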

Relevant material:

Cariani, P. Recurrent timing nets for auditory scene analysis.
Cariani, P. (2001). Neural timing nets. Neural Networks, 14, 737-753.