RENOISER - Utility to decompose and recompose noisy speech files

renoiser is a Matlab script that can be used to separate out the linear component of a clean file in a filtered, noisy mixture. It can then be used to recompose the mixture with the target at a modified relative level, or to introduce a new target, filtered to resemble the original, at a specified SNR.

renoiser defines a number of concepts: CLEAN is the original, clean speech; TARGET is a noise-free signal that has been filtered by a channel (defined by an FIR filter, FILTER); NOISE is an additional noise component; and MIX is the combination of TARGET and NOISE.
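The relationship between these signals can be illustrated with a small synthetic sketch. Everything here is an assumption made for illustration: the signals are random stand-ins rather than speech, the four-tap filter is hypothetical, and the SNR is a plain power ratio rather than the P.56 active-level measure that renoiser uses when mixing.

```matlab
% Toy illustration of how CLEAN, FILTER, TARGET, NOISE and MIX relate.
% All signals are synthetic stand-ins, not renoiser outputs.
sr = 8000;                          % sample rate in Hz (arbitrary choice)
n = 2 * sr;                         % two seconds of samples
clean = randn(n, 1);                % stand-in for CLEAN speech
filt = [1; 0.5; -0.25; 0.125];      % hypothetical FIR channel (FILTER)
target = filter(filt, 1, clean);    % TARGET = FILTER applied to CLEAN
noise = 0.1 * randn(n, 1);          % NOISE, independent of CLEAN
mix = target + noise;               % MIX = TARGET + NOISE

% Plain power-ratio SNR in dB (an assumption; not the P.56 active level)
snr_db = 10 * log10(sum(target.^2) / sum(noise.^2));
fprintf('Toy mix SNR = %.2f dB\n', snr_db);
```

Decomposition runs this model in reverse: given MIX and CLEAN, it estimates FILTER, and from that TARGET and NOISE.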

Signal decomposition
Resynthesis from clean speech
Resynthesis at new SNR from extracted target
Command line options
Bulk processing
Direct functions
Installation
Notes
Changelog
Acknowledgment

Signal decomposition

The first usage example breaks a MIX into a TARGET (matching a CLEAN) and a NOISE. The command below performs the decomposition and saves the components. The input MIX equals the sum of the output NOISE and TARGET, and the output TARGET is approximately the output FILTER applied to the input CLEAN (only approximately, because the estimated filter can be slowly time-varying, and there is an internal timing skew adjustment that would not be preserved). Because renoiser is designed to be called from the command line, it reads all of its inputs from, and writes all of its outputs to, sound files:

renoiser -mix arabic_400mhz.wav -clean arabic_source.wav -noiseout noise.wav -targetout targ.wav -filterout filt.wav
% actually look at the results in Matlab
[mix,sr] = wavread('arabic_400mhz.wav');
[clean,sr] = wavread('arabic_source.wav');
[noise,sr] = wavread('noise.wav');
[targ,sr] = wavread('targ.wav');
[filt,sr] = wavread('filt.wav');
nfft = 512;
subplot(411)
specgram(mix,nfft,sr);
caxis(max(caxis)+[-80 0]);
cax = caxis;
axis([0 20 0 4000]);
title('Mix')
subplot(412)
specgram(targ,nfft,sr);
caxis(cax);
axis([0 20 0 4000]);
title('Target speech')
subplot(413)
specgram(noise,nfft,sr);
caxis(cax);
axis([0 20 0 4000]);
title('Residual noise')
subplot(414)
plot(filt);
title('Coupling filter FIR impulse response');
% filt is just the impulse response of (an example of) the inferred filter

++++++ renoiser v0.2 ++++++
Reading CLEAN from arabic_source.wav ...
Reading MIX from arabic_400mhz.wav ...
Identifying CLEAN in MIX...
skewmaxsec=2.5 Tfilt=0.04
Mix delay= -0.235 s
FILTER saved to filt.wav
NOISE saved to noise.wav
TARGET saved to targ.wav
Input mix SNR= 14.09 dB

ans =

     []
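To give a feel for what the decomposition involves, here is a minimal sketch on synthetic data. It is not renoiser's implementation: it substitutes a single cross-correlation peak for the skew search and one ordinary least-squares FIR fit for the windowed, slowly time-varying estimate. The signal length, delay, tap count, and filter coefficients are all hypothetical, and the toy only searches non-negative lags.

```matlab
% Toy decomposition: rough alignment, then a least-squares FIR fit.
% Synthetic data and a fixed (not time-varying) filter; renoiser's own
% find_skew / decomp_lin_win / decomp_lin / whiten steps may work differently.
n = 4000;
clean = randn(n, 1);                       % stand-in for CLEAN
h_true = [0.9; 0.4; -0.2; 0.1];            % hypothetical channel
delay = 25;                                % hypothetical skew, in samples
delayed = [zeros(delay, 1); clean(1:n-delay)];
mix = filter(h_true, 1, delayed) + 0.05 * randn(n, 1);

% 1. Rough alignment: lag of the cross-correlation peak (computed via FFT).
nfft = 2^nextpow2(2 * n);
xc = real(ifft(fft(mix, nfft) .* conj(fft(clean, nfft))));
[~, imax] = max(abs(xc(1:n)));             % this toy searches lags >= 0 only
skew = imax - 1;
aligned = [zeros(skew, 1); clean(1:n-skew)];

% 2. Least-squares FIR estimate: each column of X is a delayed copy of aligned.
ntaps = 8;
X = toeplitz(aligned, [aligned(1), zeros(1, ntaps - 1)]);
filt = X \ mix;

% 3. TARGET is the part of MIX explained by the filter; NOISE is the rest.
targ = X * filt;
noise = mix - targ;
fprintf('skew = %d samples, residual SNR = %.1f dB\n', skew, ...
        10 * log10(sum(targ.^2) / sum(noise.^2)));
```

By construction, targ + noise reproduces mix exactly, which mirrors the property described above for the real outputs.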

Resynthesis from clean speech

The command below reconstructs a new signal using the components separated above. The FILTER and NOISE extracted in the first invocation are used to build a new MIX at the specified SNR (as determined by P.56 active level estimation, thanks to Mike Brookes' Voicebox toolbox):

renoiser -clean arabic_source.wav -filter filt.wav -noise noise.wav -mixout mix.wav -SNR 6.0

++++++ renoiser v0.2 ++++++
Reading CLEAN from arabic_source.wav ...
Reading FILTER from filt.wav ...
Reading NOISE from noise.wav ...
Filtering CLEAN to produce target...
Creating new output mix at SNR 6 dB ...
MIX saved to mix.wav

ans =

     []
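The level adjustment behind a requested SNR can be sketched as follows. This is a simplified illustration that measures level as mean power over the whole signal, whereas renoiser uses P.56 active levels. Scaling the noise rather than the target is also an assumption, and the signals are synthetic.

```matlab
% Mix a target and a noise at a requested SNR using mean power.
% Simplification: renoiser measures P.56 active levels, not plain power.
sr = 8000;
n = sr;                                    % one second of samples
target = sin(2 * pi * 200 * (0:n-1)' / sr); % stand-in for TARGET
noise = randn(n, 1);                       % stand-in for NOISE
snr_db = 6.0;                              % requested SNR, as in -SNR 6.0

p_target = mean(target.^2);
p_noise = mean(noise.^2);

% Choose the noise gain so that p_target / (gain^2 * p_noise) = 10^(snr_db/10)
gain = sqrt(p_target / (p_noise * 10^(snr_db / 10)));
mix = target + gain * noise;

achieved = 10 * log10(p_target / mean((gain * noise).^2));
fprintf('Requested %.1f dB, achieved %.1f dB\n', snr_db, achieved);
```

For speech with long pauses, a plain power measure and an active-level measure can differ noticeably, which is one reason to prefer the latter for real material.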

Resynthesis at new SNR from extracted target

Note that any nonlinear distortion components related to the original CLEAN will remain in NOISE. In order to have these line up as well as possible in the reconstructed mix, it's better to use the TARGET output, in which case FILTER and CLEAN are not needed:

renoiser -target targ.wav -noise noise.wav -mixout mix.wav -SNR 0.0
% The analysis and this last style of recombination can be done in
% a single step:
renoiser -mix arabic_400mhz.wav -clean arabic_source.wav -mixout mix.wav -SNR 0.0
% However, since identifying CLEAN in MIX is relatively computationally
% expensive, it's often preferable to break these steps apart.

++++++ renoiser v0.2 ++++++
No FILTER - just copying CLEAN to TARGET
Reading NOISE from noise.wav ...
Reading TARGET from targ.wav ...
Creating new output mix at SNR 0 dB ...
MIX saved to mix.wav

ans =

     []

++++++ renoiser v0.2 ++++++
Reading CLEAN from arabic_source.wav ...
Reading MIX from arabic_400mhz.wav ...
Identifying CLEAN in MIX...
skewmaxsec=2.5 Tfilt=0.04
Mix delay= -0.235 s
Input mix SNR= 14.09 dB
Creating new output mix at SNR 0 dB ...
MIX saved to mix.wav

ans =

     []

Command line options

All parameters to renoiser are specified on the command line via "-optionname value" pairs. The full set of options is:

   -clean <filename>  The name of the clean (reference) sound file
   -target <filename> Clean, channel-filtered speech to insert
   -targetout <filename>  Where to save extracted target speech
   -mix <filename>    Input mixture of noise and channel-filtered target
   -mixout <filename> Where to write recombined target + noise
   -noise <filename>  Input background noise signal
   -noiseout <filename> Where to write the separated noise residual
   -filter <filename> Input channel impulse response
   -filterout <filename> Where to save the estimated channel response
   -start <time_secs> Start processing at this time in the files
   -end <time_secs>   Finish processing at this point in the files
   -SNR <val_dB>      Target signal-to-noise ratio when mixing
   -disp <bool>       1 to plot spectrograms, 0 for no graphics (default)
   -targetsr <rate_Hz>  If specified, resample signals to this rate
   -cleanlist <listfile>  A list of filenames to be taken as clean
   -mixoutdir <dirname>   Output mixes will be written into this directory
   -laundernoise <win_sec> If >0, noise is LPC analysis-synthesized over this window
   -noisefloor <level_dB> Stabilize CLEAN by adding noise at SNR (-60)
   -fshift <freq_Hz>   Frequency shift (+ or -, e.g. SSB) for output (0)
   -checkfshift <bool> Whether to check for frequency shift on analysis (0)
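As an illustration of the "-optionname value" convention, the sketch below turns such a list into a struct. It is a hypothetical parser written for this example, not renoiser's own argument handling. It treats every argument as a string and converts values that look numeric.

```matlab
% Hypothetical parser for "-optionname value" pairs (not renoiser's code).
args = {'-mix', 'arabic_400mhz.wav', '-clean', 'arabic_source.wav', ...
        '-SNR', '6.0', '-disp', '0'};

if mod(numel(args), 2) ~= 0
    error('Options must come in "-optionname value" pairs');
end

opts = struct();
for k = 1:2:numel(args)
    name = args{k};
    if name(1) ~= '-'
        error('Expected an option name starting with "-", got "%s"', name);
    end
    value = args{k + 1};
    num = str2double(value);
    if ~isnan(num)
        value = num;                % numeric options such as -SNR or -disp
    end
    opts.(name(2:end)) = value;     % e.g. opts.SNR, opts.mix
end
disp(opts)
```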

Bulk processing

You can use the -cleanlist and -mixoutdir options to "renoise" a collection of files in a single invocation. See create_wsj.html for an example of copying channels from clean examples, then applying them to new signals.
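A bulk run might be prepared along the following lines. The file names are hypothetical, and the list format (one filename per line) is an assumption. The combination of options in the commented invocation is likewise a guess at a plausible pairing of the documented flags, so check it against create_wsj.html before relying on it.

```matlab
% Write a list of clean files for use with -cleanlist.
% Assumption: the list file holds one filename per line.
clean_files = {'clean_001.wav', 'clean_002.wav', 'clean_003.wav'};  % hypothetical
listfile = 'cleanlist.txt';

fid = fopen(listfile, 'w');
if fid < 0
    error('Could not open %s for writing', listfile);
end
fprintf(fid, '%s\n', clean_files{:});   % one name per line
fclose(fid);

% Possible invocation, reusing a FILTER and NOISE saved by an earlier
% decomposition (left as a comment because it needs renoiser and audio files):
% renoiser -cleanlist cleanlist.txt -filter filt.wav -noise noise.wav -mixoutdir renoised -SNR 6.0
```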

Direct functions

The renoiser script is mainly concerned with handling file input and output, and with deciding which operations (separation, remixing, etc.) to perform. For use within Matlab, you can call the following functions directly:

[noise, targ, filt, SNR] = find_in_mix(mix, clean, sr) - takes waveform vectors for MIX and CLEAN and extracts the NOISE, TARG, and FILT vectors, also returning the effective SNR. find_in_mix relies on find_skew to make a rough alignment between MIX and CLEAN, then on decomp_lin_win and decomp_lin (which in turn uses whiten) to perform the actual decomposition.
[mix] = mix_noise(targ, noise, sr, SNR) - uses activlev (from Mike Brookes' VoiceBox) to measure "active levels" of TARG and NOISE, then mixes them to achieve the specified final SNR. NOISE is replicated (looped, with crossfade) if necessary.
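The looping of NOISE mentioned above can be pictured with the sketch below, which extends a short noise vector to a required length by overlapping each repeat with the end of the previous one. It is an illustration only: the linear fade shape and the 100-sample overlap are assumptions, and mix_noise may crossfade differently.

```matlab
% Extend a short noise vector by looping it with a linear crossfade.
% Fade shape and overlap length are assumptions, not taken from mix_noise.
n_out = 2500;                       % required output length in samples
noise = randn(1000, 1);             % stand-in for a short NOISE signal
xf = 100;                           % crossfade (overlap) length in samples

fade_in = (0:xf-1)' / xf;           % ramps up from 0 towards 1
fade_out = 1 - fade_in;             % complementary ramp down

looped = noise;
while length(looped) < n_out
    tail = looped(end-xf+1:end) .* fade_out;   % fade out the current ending
    head = noise(1:xf) .* fade_in;             % fade in the next repeat
    looped = [looped(1:end-xf); tail + head; noise(xf+1:end)];
end
looped = looped(1:n_out);           % trim to the exact length needed
```

Each pass adds length(noise) - xf samples. A linear crossfade slightly lowers the power of uncorrelated noise in the overlap; an equal-power (square-root) fade would avoid that dip.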

Installation

This package has been compiled for several targets using the Matlab compiler. You will also need to download and install the Matlab Compiler Runtime (MCR) Installer. Please see the table below:

Architecture    Compiled package        MCR Installer
32 bit Linux    renoiser_GLNX86.zip     Linux MCR Installer
64 bit Linux    renoiser_GLNXA64.zip    Linux 64 bit MCR Installer
64 bit MacOS    renoiser_MACI64.zip     MACI64 MCR Installer

The original Matlab code used to build this compiled target is available at

 <http://labrosa.ee.columbia.edu/projects/renoiser/>

All sources are in the package renoiser.zip.

Feel free to contact me with any problems.

Notes

audioread is able to read a wide range of sound file types, but relies on a number of other packages and/or support functions being installed. Most obscure of these is ReadSound, a MEX wrapper I wrote for the dpwelib sound file interface. This, along with an installation of shorten, is required to read the *.wv2 files of the original WSJ distribution (among several other LDC data sets).

Changelog

v0.1 2011-02-11

v0.2 2011-08-03 Added version number to text output

Acknowledgment

This work was supported by DARPA under the RATS program via a subcontract from the SRI-led team SCENIC. My work was on behalf of ICSI.

Last updated: $Date: 2011/08/04 01:33:55 $ Dan Ellis dpwe@ee.columbia.edu