SKEWVIEW - Tool to visualize timing skew between files
skewview is a Matlab script that can be used to visualize the timing skew between two sound files. It breaks both files into a sequence of short windows (10 seconds long by default; see -win) and performs a normalized cross-correlation between corresponding windows. It then plots the lag at which each correlation peaks as a function of time within the file. If the files contain versions of the same signal, the correlation peak will usually indicate the relative timing skew (delay) between the two files, so the plot can be used to check for such a skew or delay.
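The per-window procedure can be sketched in a few lines of pure Python. This is not skewview's Matlab code. It is a minimal illustration that assumes mono sample lists at a common sampling rate. It uses energy-normalized correlation, compares only fully overlapping windows, and reports skew as t_targ - t_ref (an assumed sign convention). The names window_peak and skew_track are hypothetical.

```python
import math


def window_peak(ref, targ, start, wlen, maxlag):
    """Best lag (in samples) of one reference window against the target.

    Hypothetical helper: compares ref[start:start+wlen] with equally long,
    fully populated target windows shifted by -maxlag..+maxlag samples and
    returns (lag, normalized_corr) for the largest correlation.
    """
    r = ref[start:start + wlen]
    r_energy = sum(x * x for x in r)
    best_lag, best_corr = None, -math.inf
    for lag in range(-maxlag, maxlag + 1):
        s = start + lag
        if s < 0 or s + wlen > len(targ):
            continue  # skip partially populated target windows
        t = targ[s:s + wlen]
        den = math.sqrt(r_energy * sum(x * x for x in t))
        corr = sum(a * b for a, b in zip(r, t)) / den if den > 0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


def skew_track(ref, targ, sr, win=10.0, hop=2.0, maxlag=None):
    """List of (window_start_sec, skew_sec, corr); skew = t_targ - t_ref."""
    if maxlag is None:
        maxlag = win  # assumption: mirrors "-maxlag ... (dlft win)"
    wlen, mlag = round(win * sr), round(maxlag * sr)
    hlen = max(1, round(hop * sr))
    track = []
    start = 0
    while start + wlen <= len(ref):
        lag, corr = window_peak(ref, targ, start, wlen, mlag)
        if lag is not None:
            track.append((start / sr, lag / sr, corr))
        start += hlen
    return track


# Synthetic test: 3 s of noise at 100 Hz, target delayed by 3 samples (30 ms)
import random
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(300)]
targ = [0.0] * 3 + ref
track = skew_track(ref, targ, sr=100, win=0.5, hop=0.25, maxlag=0.1)
```

Every window in this synthetic case peaks at a skew of 0.03 s with a correlation of essentially 1. A real recording pair gives a noisier track, which is why skewview also applies a peak threshold (-peakth).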
skewview supports a range of input sound file formats, including WAV and FLAC (the latter provided via an external flac binary).
Example usage
In the code below, we plot the timing skew between two excerpted files: 20110221_1452+60.xr_lre.xxx.clean.flac (a clean source signal) and 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac (the same signal recorded after transmission across a radio link). The plot clearly shows the piecewise-constant time skew between the recordings. The optional trailing arguments here set the start and end times (in seconds) of the analysis.
skewview('20110221_1452+60.xr_lre.xxx.clean.flac','20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac','-start',0,'-end',60);
Reading ref 20110221_1452+60.xr_lre.xxx.clean.flac ...
Reading targ 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.555 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac ref=20110221_1452+60.xr_lre.xxx.clean.flac times start=0 end=60 win=10 hop=2 maxlag=10 peakth=0.2
Lin fit stats: sd = 0.040035 prop pts = 0.280
Lin fit: t_targ = (1 - 0.021935) t_ref + 0.706
MEDIAN LAG = 0.071 s, STDDEV = 0.266 s, ABVTHRESH = 0.960
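The "Lin fit" line in this report summarizes the fitted mapping between the two timebases. To use it in a script, a small parser is enough. This sketch assumes the line always has exactly the form shown above: a signed deviation from 1, then a signed offset in seconds. parse_lin_fit is a hypothetical helper, not part of skewview.

```python
import re

# Assumed format, e.g. "Lin fit: t_targ = (1 - 0.021935) t_ref + 0.706"
LIN_FIT = re.compile(
    r"Lin fit: t_targ = \(1 ([+-]) ([0-9.]+)\) t_ref ([+-]) ([0-9.]+)"
)


def parse_lin_fit(report):
    """Return (slope, offset) so that t_targ = slope * t_ref + offset."""
    m = LIN_FIT.search(report)
    if m is None:
        raise ValueError("no 'Lin fit:' line found")
    sign_a, a, sign_b, b = m.groups()
    slope = 1.0 + (float(a) if sign_a == "+" else -float(a))
    offset = float(b) if sign_b == "+" else -float(b)
    return slope, offset


report = "Lin fit: t_targ = (1 - 0.021935) t_ref + 0.706"
slope, offset = parse_lin_fit(report)
# 30 s into the reference corresponds to about 30.048 s into the target
t_targ = slope * 30.0 + offset
```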
Showing spectrograms
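skewview can also display spectrograms of the two audio signals, time-aligned with the cross-correlation plot, for further diagnosis. This example uses the command syntax, with -dur to limit the analysis to 60 seconds and a finer -hop of 0.2 seconds.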
skewview 20110221_1452+60.xr_lre.xxx.clean.flac 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac -dur 60 -plotsgrams 1 -hop 0.2
Reading ref 20110221_1452+60.xr_lre.xxx.clean.flac ...
Reading targ 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.555 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac ref=20110221_1452+60.xr_lre.xxx.clean.flac times start=0 end=60 win=10 hop=0.2 maxlag=10 peakth=0.2
Lin fit stats: sd = 0.039977 prop pts = 0.302
Lin fit: t_targ = (1 - 0.022821) t_ref + 0.737
MEDIAN LAG = 0.071 s, STDDEV = 0.262 s, ABVTHRESH = 0.940
Writing aligned outputs
skewview will attempt to fit a linear relationship between the reference track timebase and the best timing skew between the two tracks. The two files may be related by a simple delay plus, possibly, a small resampling factor (clock drift). In that case, this fit specifies a trim-and-resample operation that brings the target into (almost) correct temporal alignment with the reference. Use -alignout outfilename to write this aligned output.
If the clock drift is significant (e.g., 0.1% or more), limit the length of the correlation window so that drift within a single window does not blur the cross-correlation peaks. Conversely, as the drift gets smaller, you can use longer windows and obtain more accurate alignments. You may therefore want to run the alignment several times, feeding each aligned output back in, to get progressively better alignments.
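The fitting step can be illustrated with an ordinary least-squares line through the per-window (t_ref, skew) points. Assume skew = t_targ - t_ref, and fit skew as a * t_ref + b. Then t_targ = (1 + a) t_ref + b, which is the form skewview prints as "Lin fit". This is only a plain least-squares sketch. skewview's own fit also excludes outliers (see -fitthresh and the v0.73 changelog entry), and that step is not reproduced here. fit_skew_line is a hypothetical name.

```python
def fit_skew_line(points):
    """Least-squares fit of skew = a * t_ref + b to (t_ref, skew) pairs.

    Returns (1 + a, b), the coefficients of t_targ = (1 + a) * t_ref + b,
    assuming skew = t_targ - t_ref. No outlier rejection is performed.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(t for t, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in points)
    if sxx == 0:
        raise ValueError("points need distinct t_ref values")
    a = sxy / sxx
    b = mean_s - a * mean_t
    return 1.0 + a, b


# Synthetic skew track generated from t_targ = 0.99 * t_ref + 0.5
points = [(t, -0.01 * t + 0.5) for t in range(0, 60, 2)]
slope, offset = fit_skew_line(points)
```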
You can also manually drag the green "handles" at each end of the green best-fit line to improve the alignment. skewview then reports the new fit and the corresponding sox command to generate an aligned output. With -alignout, the aligned file is written when the plot is closed.
% warwick-mix.wav is a full mix, warwick-aca.wav is an acapella
% version that includes about 0.6% clock drift
% First pass
subplot(221)
skewview warwick-mix.wav warwick-aca.wav -win 4 -hop 0.5 -alignout warwick-aca-to-mix.wav
% Second pass - double window length
subplot(222)
skewview warwick-mix.wav warwick-aca-to-mix.wav -win 8 -hop 1 -alignout warwick-aca-to-mix-2.wav
% Third pass
subplot(223)
skewview warwick-mix.wav warwick-aca-to-mix-2.wav -win 8 -hop 1 -alignout warwick-aca-to-mix-3.wav
% By now, the alignment is pretty good
subplot(224)
skewview warwick-mix.wav warwick-aca-to-mix-3.wav -win 8 -hop 1
[dm,sr] = wavread('warwick-mix.wav');
[da,sr] = wavread('warwick-aca-to-mix-3.wav');
ll = min([20*sr, length(dm), length(da)]);
soundsc([dm(1:ll),10*da(1:ll)], sr) % good synchronization
Reading ref warwick-mix.wav ...
Reading targ warwick-aca.wav ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = -9.919 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for warwick-aca.wav ref=warwick-mix.wav times start=0 end=0 win=4 hop=0.5 maxlag=4 peakth=0.2
Lin fit stats: sd = 0.009417 prop pts = 0.290
Lin fit: t_targ = (1 - 0.008521) t_ref - 9.693
MEDIAN LAG = -9.977 s, STDDEV = 0.082 s, ABVTHRESH = 1.000

Reading ref warwick-mix.wav ...
Reading targ warwick-aca-to-mix.wav ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.028 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for warwick-aca-to-mix.wav ref=warwick-mix.wav times start=0 end=0 win=8 hop=1 maxlag=8 peakth=0.2
Lin fit stats: sd = 0.004202 prop pts = 0.340
Lin fit: t_targ = (1 + 0.002261) t_ref - 0.043
MEDIAN LAG = 0.030 s, STDDEV = 1.488 s, ABVTHRESH = 0.943

Reading ref warwick-mix.wav ...
Reading targ warwick-aca-to-mix-2.wav ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.009 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for warwick-aca-to-mix-2.wav ref=warwick-mix.wav times start=0 end=0 win=8 hop=1 maxlag=8 peakth=0.2
Lin fit stats: sd = 0.000515 prop pts = 0.302
Lin fit: t_targ = (1 + 0.000364) t_ref - 0.017
MEDIAN LAG = -0.006 s, STDDEV = 0.003 s, ABVTHRESH = 0.849

Reading ref warwick-mix.wav ...
Reading targ warwick-aca-to-mix-3.wav ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.002 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for warwick-aca-to-mix-3.wav ref=warwick-mix.wav times start=0 end=0 win=8 hop=1 maxlag=8 peakth=0.2
Lin fit stats: sd = 0.000068 prop pts = 0.717
Lin fit: t_targ = (1 + 0.000041) t_ref + 0.000
MEDIAN LAG = 0.002 s, STDDEV = 0.000 s, ABVTHRESH = 0.925
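These logs show why the first pass uses a short window. Let d be the slope's deviation from 1 in the "Lin fit" line. The true lag then changes by about |d| times the window length across a single window, and that change is what blurs the correlation peak. Below is a quick calculation using the slopes reported above. drift_smear is a hypothetical helper, and the linear-drift model is an approximation.

```python
def drift_smear(slope_dev, win):
    """Approximate change in lag (seconds) across one window of win seconds."""
    return abs(slope_dev) * win


# (slope deviation from each pass's "Lin fit" line, -win used for that pass)
passes = [(-0.008521, 4), (0.002261, 8), (0.000364, 8), (0.000041, 8)]
for dev, win in passes:
    print(f"win={win} s, drift={dev:+.6f}: smear = {1000 * drift_smear(dev, win):.1f} ms")
```

The first pass with -win 4 gives roughly 34 ms of smear. It would be about 85 ms with the default 10 s window. Later passes drop to about 18 ms and then under 3 ms, even though the window doubles, because most of the drift has already been removed.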
Optional arguments
Behavior is controlled by optional arguments specified as parameter/value pairs, detailed below. From v0.86 onwards, if the first argument starts with "-", all arguments are assumed to be in "-parameter value" format. Otherwise, the first two arguments are taken as the reference and target sound file names.
skewview -help
skewview v0.9 of 20140219
 -ref              reference audio wavfile ()
 -targ             target audio wavfile ()
 -start            start at this point in files (0)
 -end              end analysis at this point (0)
 -dur              limit analysis to this much audio (-1)
 -win              xcorr analysis window in sec (10)
 -hop              hop between success windows in sec (2)
 -maxlag           largest lag to consider (dlft win) (0)
 -peakth           threshold of max to count as peak (0.2)
 -initialdelay     center around this t_targ (NaN)
 -estinitialdelay  estimate targ-ref by global xcorr (1)
 -initialdownsamp  downsample by before initial xcorr (4)
 -samplerate       resample to this before comparison (0)
 -fitthresh        controls inclusion of outliers in lin fit (2)
 -alignout         name for time-warped target audio output ()
 -pngout           name for PNG-format screen dump ()
 -textout          name for text-format <time skew> pairs ()
 -corrout          include normalized corr vals in textout (0)
 -minspread        minimum spread of Y axis (sec) (0.1)
 -plotsgrams       add specgram plots above xcorr (0)
 -disp             enable (disable) graphic display (1)
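The argument convention and the numeric defaults in this help text can be sketched as follows. split_args and resolve are hypothetical helpers written for illustration, not skewview functions. Two rules in resolve are inferred rather than documented. First, a -maxlag of 0 is taken to mean "same as -win", based on "(dlft win)" and the v0.85 changelog. Second, a positive -dur is taken to set end = start + dur. Both rules agree with the parameter echoes in the example logs; for instance, -dur 60 -hop 0.2 reports end=60 win=10 hop=0.2 maxlag=10.

```python
def split_args(argv):
    """Return (ref, targ, options) following skewview's argument convention.

    If the first argument starts with "-", every argument is part of a
    "-parameter value" pair; otherwise the first two are the reference
    and target file names.
    """
    ref = targ = None
    if argv and not argv[0].startswith("-"):
        if len(argv) < 2:
            raise ValueError("need both reference and target file names")
        ref, targ, argv = argv[0], argv[1], argv[2:]
    if len(argv) % 2:
        raise ValueError("options must come in '-parameter value' pairs")
    opts = {}
    for name, value in zip(argv[0::2], argv[1::2]):
        if not name.startswith("-"):
            raise ValueError(f"expected '-parameter', got {name!r}")
        opts[name[1:]] = value
    # -ref and -targ may also be given explicitly as options
    return opts.pop("ref", ref), opts.pop("targ", targ), opts


# A subset of the numeric defaults, copied from the help text
DEFAULTS = {"start": 0.0, "end": 0.0, "dur": -1.0, "win": 10.0,
            "hop": 2.0, "maxlag": 0.0, "peakth": 0.2}


def resolve(opts):
    """Fill in defaults; unknown or non-numeric options are ignored here."""
    p = dict(DEFAULTS)
    p.update({k: float(v) for k, v in opts.items() if k in DEFAULTS})
    if p["maxlag"] == 0:
        p["maxlag"] = p["win"]  # assumption: maxlag 0 means "same as win"
    if p["dur"] > 0:
        p["end"] = p["start"] + p["dur"]  # assumption: -dur sets the end time
    return p


ref, targ, opts = split_args(
    ("20110221_1452+60.xr_lre.xxx.clean.flac "
     "20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac "
     "-dur 60 -plotsgrams 1 -hop 0.2").split())
params = resolve(opts)
```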
Compiled target usage
Invoking the compiled target works the same way, using plain space-separated arguments without MATLAB's quotes, commas, or parentheses, e.g.:
./run_skewview_prj.sh 20110221_1452+60.xr_lre.xxx.clean.flac 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac -start 0 -end 60
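When driving the compiled target from another program, passing the arguments as a list rather than as one shell string avoids quoting problems with spaces or other shell-special characters in file names. Below is a minimal Python sketch. compiled_cmd and run_compiled are hypothetical helpers. The default script path assumes you run from the directory containing run_skewview_prj.sh.

```python
import subprocess


def compiled_cmd(ref, targ, script="./run_skewview_prj.sh", **opts):
    """Build the argv list for the compiled skewview wrapper script."""
    cmd = [script, ref, targ]
    for name, value in opts.items():
        cmd += ["-" + name, str(value)]  # e.g. start=0 -> "-start", "0"
    return cmd


def run_compiled(cmd):
    """Run the wrapper; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)


cmd = compiled_cmd("20110221_1452+60.xr_lre.xxx.clean.flac",
                   "20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac",
                   start=0, end=60)
# run_compiled(cmd)  # only once the compiled package and MCR are installed
```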
Installation
This package has been compiled for several targets using the Matlab Compiler. To run it, you also need to download and install the Matlab Compiler Runtime (MCR) for your platform, using the MCR Installer listed in the table below:
Architecture | Compiled package | MCR Installer |
---|---|---|
32 bit Linux | skewview_GLNX86.zip | Linux MCR Installer |
64 bit Linux | skewview_GLNXA64.zip | Linux 64 bit MCR Installer |
64 bit MacOS | skewview_MACI64.zip | MACI64 MCR Installer |
The original Matlab code used to build this compiled target is available at <http://labrosa.ee.columbia.edu/projects/skewview/>.
All sources are in the package skewview-v0.90.zip.
Feel free to contact me with any problems.
Changelog
% 2014-02-19 v0.90 - changed final resampling to work in parts
%                    (reading and writing MP3 input and output in
%                    parts using popen()) to speed up -alignout
%                    writing. Saves maybe ~15% on 8GB Macbook for
%                    75 minute, 44.1 kHz stereo file (3:30 -> 3:00).
%
% 2014-02-05 v0.89 - added 'corrout' option to make 'textout'
%                    add actual normalized xcorr peak to <time,
%                    skew> pairs.
%
% 2014-01-27 v0.88 - Fixed the rare bug in new_stxcorr that
%                    crashed if final block had only one frame.
%
% 2014-01-23 v0.87 - Changed when resampled alignout file is
%                    actually written: was written whenever slope
%                    was changed, now only written when plot is
%                    closed (or immediately if no plot).
%
% 2014-01-01 v0.86 - now short-time cross-correlation can have
%                    lags much larger than the actual window, and
%                    the correlations are always between
%                    fully-populated windows.
%                  - better memory management during st_xcorr.
%
% 2013-12-19 v0.85 - cleaned up calculation of best skew/offset
%                  - initial delay is estimated before chopping
%                    durs to shorter of pair
%                  - maxlag now defaults to same as win
%                  - fixed callback to rewrite alignout after adjustment
%                  - added documentation for -alignout usage
%
% 2013-07-09 v0.84 - added -plotsgrams option to plot synchronized
%                    spectrograms. Changes to find_skew algo.
%                    Added -dur as alternative to -end.
%                    Reports "lin fit stats" including SD relative
%                    to best linear fit over selected points
%                    only.
%
% 2013-07-02 v0.83 - resampling/trimming now done internally when
%                    -alignout filename is specified.
%                  - minor changes to audioread to handle ~,
%                    pathless files.
%                  - default is now -estinitialdelay 1
%
% 2013-05-15 v0.82 - fixed bug where maxlag > win caused crash.
%                  - took out check where mac version used slower xcorr.
%
% 2013-05-14 v0.81 - fixed bug where perfect time alignment caused crash
%                  - -initialxcorr renamed -estinitialdelay
%                  - fixed bug that gave incorrect offsets when
%                    new initialdelay was positive (with -estinitialdelay 1)
%
% 2013-05-05 v0.8  - Now STDDEV is reported relative to the best-fit
%                    line, so it can be very small even for tracks
%                    with a significant (but systematic) clock
%                    skew.
%                  - New flag -initialxcorr 1 will estimate a
%                    global time skew for the whole track,
%                    obviating the need for -initialdelay.
%
% 2013-04-08 v0.75 Added -minspread option to force a minimum
%                  y-axis range (rather than having it collapse to
%                  very small range for near-synchronous fits).
%
% 2013-03-07 v0.74 Interactive fixup of best-fit line! Grab
%                  points at end to adjust the line; reports new
%                  lin fit parameters & sox command on mouse-up.
%
% 2013-01-24 v0.73 -alignout now works for both advance (via trim)
%                  and delay of output. Shell script now handles
%                  filenames with spaces and special characters.
%                  Linear fitting in linfit.m now also excludes
%                  points with the top and bottom 10% of slopes to
%                  adjacent points (i.e., aiming for
%                  middle-80%-median slope).
%
% 2012-09-26 v0.72 Added -alignout, which causes it to report a
%                  sox command that generates a version of TARG
%                  that aligns to REF.
%
% 2012-09-24 v0.71 Added linear fit to report best offset and skew.
%                  Optimized calculation of cross-correlation.
%
% 2011-09-09 v0.7  Incorporated new audioread to allow efficient
%                  access to parts of very large files; added help message in program.
%
% 2011-09-06 v0.6  Modified audio reading code to better handle
%                  large/high SR files.
%
% 2011-08-03 v0.5  Added version number in text report output.
%
% 2011-07-19 v0.4  Added -textout option to allow raw text file dump
%                  of local skew times.
%
% 2011-05-03 v0.3  Added -initialdelay to handle files with large
%                  default skews, and -samplerate to specify an optional lower
%                  sampling rate at which to perform analysis, to accommodate much
%                  larger maximum lags and total file durations without exhausting
%                  memory.
%
% 2011-04-19 v0.2  Added text report of mean and SD of skew, and
%                  multiple command-line options to control the various internal
%                  parameters
%
% 2011-03-15 v0.1  Initial release
%
Acknowledgment
This work was supported by DARPA under the RATS program via a subcontract from the SRI-led team SCENIC. My work was on behalf of ICSI.
Last updated: $Date: 2011/08/04 01:34:37 $ Dan Ellis dpwe@ee.columbia.edu