RENOISER - Utility to decompose and recompose noisy speech files
renoiser is a Matlab script that separates out the part of a filtered, noisy mixture that is a linearly filtered version of a known clean file. It can then recompose the mixture with the target at a modified relative level, or introduce a new target, filtered to resemble the original, at a specified SNR.
renoiser defines a number of concepts: CLEAN is the original, clean speech; TARGET is a noise-free signal that has been filtered by a channel (defined by an FIR filter, FILTER); NOISE is an additional noise component; and MIX is the sum of TARGET and NOISE.
Signal decomposition
The first usage example is to break a MIX into a TARGET (matching a CLEAN) and a NOISE. The command below performs the decomposition, and saves out the components. The input MIX is the output NOISE and TARGET summed together, and the output TARGET is approximately the output FILTER applied to the input CLEAN (only approximately because the estimated filter can be slowly time-varying, and there is an internal timing skew adjustment that would not be preserved). The renoiser command is designed to be called from the command line, so it reads and writes all data to/from sound files:
    renoiser -mix arabic_400mhz.wav -clean arabic_source.wav -noiseout noise.wav -targetout targ.wav -filterout filt.wav
    % actually look at the results in Matlab
    [mix,sr] = wavread('arabic_400mhz.wav');
    [clean,sr] = wavread('arabic_source.wav');
    [noise,sr] = wavread('noise.wav');
    [targ,sr] = wavread('targ.wav');
    [filt,sr] = wavread('filt.wav');
    nfft = 512;
    subplot(411)
    specgram(mix,nfft,sr);
    caxis(max(caxis)+[-80 0]);
    cax = caxis;
    axis([0 20 0 4000]);
    title('Mix')
    subplot(412)
    specgram(targ,nfft,sr);
    caxis(cax);
    axis([0 20 0 4000]);
    title('Target speech')
    subplot(413)
    specgram(noise,nfft,sr);
    caxis(cax);
    axis([0 20 0 4000]);
    title('Residual noise')
    subplot(414)
    plot(filt);
    title('Coupling filter FIR impulse response');
    % filt is just the impulse response of (an example of) the inferred filter
    ++++++ renoiser v0.2 ++++++
    Reading CLEAN from arabic_source.wav ...
    Reading MIX from arabic_400mhz.wav ...
    Identifying CLEAN in MIX...
    skewmaxsec=2.5 Tfilt=0.04
    Mix delay= -0.235 s
    FILTER saved to filt.wav
    NOISE saved to noise.wav
    TARGET saved to targ.wav
    Input mix SNR= 14.09 dB
    ans = []
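The relationship between the four signals can be written down in a few lines. The sketch below is Python rather than Matlab, and it is not how renoiser estimates anything: it assumes a fixed, already-known FIR filter and perfectly aligned signals, and the names `fir_filter` and `split_mix` are hypothetical. It only illustrates that TARGET is FILTER applied to CLEAN, and that NOISE is what remains of MIX once TARGET is subtracted.

```python
def fir_filter(signal, taps):
    """Apply an FIR filter by direct convolution, truncated to len(signal)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def split_mix(mix, clean, taps):
    """Return (target, noise) such that target[n] + noise[n] == mix[n]."""
    target = fir_filter(clean, taps)
    noise = [m - t for m, t in zip(mix, target)]
    return target, noise


# Toy data: a short "clean" signal, a 2-tap channel, and some additive noise.
clean = [1.0, 0.0, -1.0, 0.5]
taps = [0.5, 0.25]
added = [0.1, -0.1, 0.05, 0.0]
mix = [t + a for t, a in zip(fir_filter(clean, taps), added)]

target, noise = split_mix(mix, clean, taps)
```

In the real tool the filter is unknown and has to be inferred from MIX and CLEAN, and it may vary slowly over time, which is why the saved TARGET only approximately equals a single FILTER applied to CLEAN.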
Resynthesis from clean speech
The command below builds a new signal from the components separated above. The FILTER and NOISE extracted in the first invocation are used to build a new MIX at the specified SNR (as determined by ITU-T P.56 active level estimation, thanks to Mike Brookes' Voicebox toolbox):
    renoiser -clean arabic_source.wav -filter filt.wav -noise noise.wav -mixout mix.wav -SNR 6.0
    ++++++ renoiser v0.2 ++++++
    Reading CLEAN from arabic_source.wav ...
    Reading FILTER from filt.wav ...
    Reading NOISE from noise.wav ...
    Filtering CLEAN to produce target...
    Creating new output mix at SNR 6 dB ...
    MIX saved to mix.wav
    ans = []
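The level adjustment behind `-SNR` can be sketched as follows. This is Python, not the Matlab source, and it makes two loudly stated assumptions: it substitutes plain mean power for the P.56 active-level measurement (so results will differ for real speech containing pauses), and it scales NOISE while leaving TARGET untouched. The names `mean_power` and `mix_at_snr` are hypothetical.

```python
import math


def mean_power(x):
    """Mean squared amplitude (a simple stand-in for P.56 active level)."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(target, noise, snr_db):
    """Scale noise so that 10*log10(P_target / P_noise) == snr_db, then add.

    Assumes target and noise have the same length.
    """
    gain = math.sqrt(mean_power(target) / (mean_power(noise) * 10 ** (snr_db / 10)))
    scaled = [gain * v for v in noise]
    return [t + n for t, n in zip(target, scaled)], scaled


target = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mix, scaled_noise = mix_at_snr(target, noise, 6.0)

# Check the SNR actually achieved between target and the rescaled noise.
achieved_db = 10 * math.log10(mean_power(target) / mean_power(scaled_noise))
```

An active-level measure ignores silent stretches when computing the speech level, so for real recordings it generally reports a higher speech level than the mean power used here.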
Resynthesis at new SNR from extracted target
Note that any nonlinear distortion components related to the original CLEAN will remain in NOISE. In order to have these line up as well as possible in the reconstructed mix, it's better to use the TARGET output, in which case FILTER and CLEAN are not needed:
    renoiser -target targ.wav -noise noise.wav -mixout mix.wav -SNR 0.0
    % The analysis and this last style of recombination can be done in
    % a single step:
    renoiser -mix arabic_400mhz.wav -clean arabic_source.wav -mixout mix.wav -SNR 0.0
    % However, since identifying CLEAN in MIX is relatively computationally
    % expensive, it's often preferable to break these steps apart.
    ++++++ renoiser v0.2 ++++++
    No FILTER - just copying CLEAN to TARGET
    Reading NOISE from noise.wav ...
    Reading TARGET from targ.wav ...
    Creating new output mix at SNR 0 dB ...
    MIX saved to mix.wav
    ans = []
    ++++++ renoiser v0.2 ++++++
    Reading CLEAN from arabic_source.wav ...
    Reading MIX from arabic_400mhz.wav ...
    Identifying CLEAN in MIX...
    skewmaxsec=2.5 Tfilt=0.04
    Mix delay= -0.235 s
    Input mix SNR= 14.09 dB
    Creating new output mix at SNR 0 dB ...
    MIX saved to mix.wav
    ans = []
Command line options
All parameters to renoiser are specified on the command line via "-optionname value" pairs. The full set of options is:
    -clean <filename>         The name of the clean (reference) sound file
    -target <filename>        Clean, channel-filtered speech to insert
    -targetout <filename>     Where to save extracted target speech
    -mix <filename>           Input mixture of noise and channel-filtered target
    -mixout <filename>        Where to write recombined target + noise
    -noise <filename>         Input background noise signal
    -noiseout <filename>      Where to write the separated noise residual
    -filter <filename>        Input channel impulse response
    -filterout <filename>     Where to save the estimated channel response
    -start <time_secs>        Start processing at this time in the files
    -end <time_secs>          Finish processing at this point in the files
    -SNR <val_dB>             Target signal-to-noise ratio when mixing
    -disp <bool>              1 to plot spectrograms, 0 for no graphics (default)
    -targetsr <rate_Hz>       If specified, resample signals to this rate
    -cleanlist <listfile>     A list of filenames to be taken as clean
    -mixoutdir <dirname>      Output mixes will be written into this directory
    -laundernoise <win_sec>   If >0, noise is LPC an-synth'd over this win
    -noisefloor <level_dB>    Stabilize CLEAN by adding noise at SNR (-60)
    -fshift <freq_Hz>         Frequency shift (+ or -, e.g. SSB) for output (0)
    -checkfshift <bool>       Whether to check for frequency shift on analysis (0)
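Because every option is a "-optionname value" pair, a command line is easy to assemble from a script. The helper below is a hypothetical Python sketch (`renoiser_argv` is not part of the package). It only builds the argument list from options in the table above, and it assumes the compiled executable is called `renoiser` and is on the path, which may not match your installation.

```python
def renoiser_argv(options, executable="renoiser"):
    """Turn a dict of option names and values into an argument list.

    Keys are option names without the leading dash; values are converted
    with str(). Insertion order of the dict is preserved.
    """
    argv = [executable]
    for name, value in options.items():
        argv.append("-" + name)
        argv.append(str(value))
    return argv


# Options taken from the "Resynthesis at new SNR from extracted target" example.
argv = renoiser_argv({
    "target": "targ.wav",
    "noise": "noise.wav",
    "mixout": "mix.wav",
    "SNR": 0.0,
})
```

The resulting list can be handed to `subprocess.run(argv, check=True)` to launch the tool from Python.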
Bulk processing
You can use the -cleanlist and -mixoutdir options to "renoise" a collection of files in a single invocation. See create_wsj.html for an example of copying channels from clean examples, then applying them to new signals.
Direct functions
The renoiser script is mainly concerned with handling file input and output, and with deciding which operations (separation, remixing, etc.) to perform. For use within Matlab, you can call the following functions directly to perform these operations:
- [noise, targ, filt, SNR] = find_in_mix(mix, clean, sr) - takes waveform vectors for MIX and CLEAN, and extracts NOISE, TARG, and FILT vectors, as well as returning effective SNR. find_in_mix relies on find_skew to make a rough alignment between MIX and CLEAN, then decomp_lin_win and decomp_lin, which further uses whiten, to perform the actual decomposition.
- [mix] = mix_noise(targ, noise, sr, SNR) - uses activlev (from Mike Brookes' VoiceBox) to measure "active levels" of TARG and NOISE, then mixes them to achieve the specified final SNR. NOISE is replicated (looped, with crossfade) if necessary.
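The looping of NOISE mentioned for mix_noise can be pictured as follows. This Python sketch is an illustration only: the linear crossfade shape and the `fade_len` parameter are assumptions, and the crossfade actually used by mix_noise is not described here. The name `loop_with_crossfade` is hypothetical.

```python
def loop_with_crossfade(noise, out_len, fade_len):
    """Extend noise to out_len samples by repeating it.

    Each repetition overlaps the tail of the signal built so far by
    fade_len samples, blended with a linear crossfade.
    """
    if not 0 < fade_len < len(noise):
        raise ValueError("fade_len must be between 1 and len(noise) - 1")
    out = list(noise)
    while len(out) < out_len:
        start = len(out) - fade_len          # where the next copy begins
        for i in range(fade_len):
            w = (i + 1) / (fade_len + 1)     # fade-in weight for the new copy
            out[start + i] = (1 - w) * out[start + i] + w * noise[i]
        out.extend(noise[fade_len:])         # remainder of the copy, unmodified
    return out[:out_len]


# A constant signal should stay constant through every crossfade.
noise = [1.0] * 8
looped = loop_with_crossfade(noise, 20, 4)
```

Overlapping the copies avoids the click that a hard splice between the end and the start of the noise file would otherwise produce.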
Installation
This package has been compiled for several targets using the Matlab compiler. You will also need to download and install the Matlab Compiler Runtime (MCR) Installer. Please see the table below:
Architecture | Compiled package | MCR Installer |
---|---|---|
32 bit Linux | renoiser_GLNX86.zip | Linux MCR Installer |
64 bit Linux | renoiser_GLNXA64.zip | Linux 64 bit MCR Installer |
64 bit MacOS | renoiser_MACI64.zip | MACI64 MCR Installer |
The original Matlab code used to build this compiled target is available at
<http://labrosa.ee.columbia.edu/projects/renoiser/>
All sources are in the package renoiser.zip.
Feel free to contact me with any problems.
Notes
audioread is able to read a wide range of sound file types, but relies on a number of other packages and/or support functions being installed. Most obscure of these is ReadSound, a MEX wrapper I wrote for the dpwelib sound file interface. This, along with an installation of shorten, is required to read the *.wv2 files of the original WSJ distribution (among several other LDC data sets).
Changelog
v0.1 2011-02-11
v0.2 2011-08-03 Added version number to text output
Acknowledgment
This work was supported by DARPA under the RATS program via a subcontract from the SRI-led team SCENIC. My work was on behalf of ICSI.
Last updated: 2011/08/04 01:33:55. Dan Ellis, dpwe@ee.columbia.edu