Beat Tracking by Dynamic Programming

This page illustrates how to use the beat_* functions to build a simple beat tracker for music audio based on dynamic programming, as described in: D. Ellis, "Beat Tracking by Dynamic Programming," J. New Music Research 36(1): 51-60, March 2007. DOI: 10.1080/09298210701653344.

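% First, load an example sound file
% (wavread has been removed from newer MATLAB releases; there,
% audioread(wavfilename) is the replacement)
wavfilename = 'train01.wav';
[d,sr] = wavread(wavfilename);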

% Now, calculate the "onset strength" function as the sum of the
% differentiated and half-rectified energy in a Mel-scale
% time-frequency transform:
[onset_fn, osr, sgram, tt, ff] = beat_onset(d,sr);
% osr is the sampling rate of the frames of onset_fn
% sgram is the Mel spectrogram as an array, with tt and ff
% being its time and frequency labels.

% The dynamic programming approach needs an estimate of global
% tempo, so we calculate one by autocorrelating the onset function,
% applying a bias window, and choosing the biggest peak.
% With display set to 1, beat_tempo also plots the windowed
% autocorrelation for us
subplot(211)
display = 1;
tempo = beat_tempo(onset_fn, osr, display);

% Now we can run the dynamic programming beat tracker
beats = beat_simple(onset_fn, osr, tempo);

% We have some helper functions: one to plot the beat times on top
% of the Mel spectrogram (which we saved from beat_onset)
subplot(212)
beat_plot(beats, '-r', tt, ff, sgram);

% And we can listen to the result; the system-found beats are
% marked by little bursts of white noise superimposed on the
% original:
beat_play(beats, d, sr);

% We can go straight from soundfile to beats with beat_track, which
% just provides a wrapper around the steps above:
beats = beat_track(wavfilename);
Warning: The playback thread did not start within one second. 
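
The tempo-estimation and dynamic-programming steps above can be sketched on a synthetic onset envelope. This is only an illustration of the idea, not the code inside beat_tempo or beat_simple. The 100 Hz frame rate, the log-Gaussian bias window centred on 0.5 s, the tightness weight alpha, and the rule for picking the final beat are all assumptions made for the example. The transition penalty is a squared-log penalty on deviation from the ideal beat spacing, in the spirit of the paper cited above.

```matlab
% Synthetic onset-strength envelope: one pulse every 0.5 s (120 BPM)
osr = 100;                          % envelope frame rate in Hz (assumed)
n = 10 * osr;                       % 10 seconds of frames
onset_fn = zeros(1, n);
onset_fn(25:50:n) = 1;

% --- Tempo: autocorrelate, apply a bias window, pick the biggest peak
maxlag = 2 * osr;                   % consider beat periods up to 2 s
ac = zeros(1, maxlag);
for lag = 1:maxlag
    ac(lag) = sum(onset_fn(1:n-lag) .* onset_fn(1+lag:n));
end
lags_sec = (1:maxlag) / osr;
% Bias window: Gaussian on a log2 lag axis, centre 0.5 s, width 1 octave
% (both values are assumptions for this sketch)
bias = exp(-0.5 * (log2(lags_sec / 0.5) / 1.0) .^ 2);
[~, period] = max(ac .* bias);      % beat period in frames
tempo_bpm = 60 * osr / period;

% --- Dynamic programming: best-scoring beat sequence at that period
alpha = 100;                        % tightness weight (assumed value)
prange = round(-2*period):-round(period/2);    % predecessor offsets
txcost = -alpha * log(-prange / period) .^ 2;  % penalty for off-period gaps
cumscore = zeros(1, n);             % best score of a sequence ending at t
backlink = zeros(1, n);             % previous beat on that sequence (0 = none)
for t = 1:n
    prev = t + prange;
    ok = prev >= 1;
    val = 0; src = 0;               % default: start a new sequence at t
    if any(ok)
        okprev = prev(ok);
        [cmax, idx] = max(cumscore(okprev) + txcost(ok));
        if cmax > 0
            val = cmax; src = okprev(idx);
        end
    end
    cumscore(t) = onset_fn(t) + val;
    backlink(t) = src;
end

% --- Backtrace from the best score within the final beat period
[~, idx] = max(cumscore(n-period+1:n));
t = n - period + idx;
beats = [];
while t > 0
    beats = [t beats];
    t = backlink(t);
end
beat_times = (beats - 1) / osr;     % frame index -> seconds
fprintf('tempo = %.1f BPM, %d beats\n', tempo_bpm, length(beats));
```

On this perfectly periodic input the recovered beats fall on the pulses. With real onset envelopes the balance between onset strength and the spacing penalty (alpha here) decides how far beats may drift from the estimated period.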

Ground truth

We can also read in ground-truth tap times from the MIREX 2006 (mirex06) McKinney/Moelants tapping data:

tapfilename = 'train01.txt';
truth = beat_ground_truth(tapfilename);
% By default this only returns the subset of tap records that are
% consistent with the most popular tempo.  Multiple sequences are
% returned in separate cells of a cell array; we can plot and
% listen to them too:
beat_plot(truth{1}, 'xb');
beat_play(truth{1}, d(1:10*sr), sr);

% We can also plot all the individual ground-truth taps in a
% "scatter" format:
subplot(211)
beat_gt_plot(truth)

Scoring results

We can score a beat track against a ground truth by counting how many true beats are missed (deletions), and how many system-generated beats don't correspond to true beats (insertions):

collar = 0.2; % accept a beat within +/-20% of the tempo period
verbose = 1;
[err,ins,del,tru,hh,dd] = beat_score(beats,truth{1},collar,verbose);
% In this case, only the first beat is 'wrong', because the human
% was late to pick up the beat.
% We can score against all the ..
length(truth)
% .. 35 different tap records for this tempo by passing them all to
% the scoring function:
[err,ins,del,tru,hh,dd] = beat_score(beats,truth,collar,verbose);
% We don't line up with all the human taps, but then no single beat
% track ever could because the humans have too much spread.
Overall error=   3.2% (   1 ins,    1 del,   63 true)

ans =

    40

Overall error=  25.1% ( 330 ins,   89 del, 2279 true)
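
The insertion/deletion counting can be sketched as follows. This is a hypothetical greedy matcher, not the beat_score implementation: the beat times are made up, and taking the tolerance as collar times the median reference inter-beat interval is an assumption about how "within +/-20% of the tempo period" is applied. For a single reference track the error is (ins + del) / true, which agrees with the 3.2% printed above for 1 insertion and 1 deletion against 63 true beats.

```matlab
% Made-up example data: the reference tapper starts late on the first beat
sysbeats  = [0.50 1.00 1.50 2.00 2.50 3.00 3.50];   % system beat times (s)
truebeats = [0.75 1.02 1.49 2.03 2.48 3.01 3.52];   % reference taps (s)

collar = 0.2;                        % fraction of the beat period
period = median(diff(truebeats));    % reference beat period (s), assumed definition
tol = collar * period;               % absolute tolerance (s)

used = false(size(sysbeats));        % system beats already matched
del = 0;                             % true beats with no system beat nearby
for k = 1:length(truebeats)
    dist = abs(sysbeats - truebeats(k));
    dist(used) = Inf;                % each system beat can match only once
    [dmin, j] = min(dist);
    if dmin <= tol
        used(j) = true;              % hit
    else
        del = del + 1;               % missed true beat (deletion)
    end
end
ins = sum(~used);                    % unmatched system beats (insertions)
err = (ins + del) / length(truebeats);
fprintf('error=%5.1f%% (%d ins, %d del, %d true)\n', ...
        100*err, ins, del, length(truebeats));
```

Here the late first tap counts twice, once as a deletion (the tap has no system beat within tolerance) and once as an insertion (the system's first beat has no tap), the same pattern as the single-track result above.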

Testing against a set of examples

We can evaluate the beat tracker against a whole set of examples:

dirname = 'mirex06examples';
files = dir(fullfile(dirname,'*.wav'));
for i = 1:length(files); fnames{i} = fullfile(dirname,files(i).name); end
beat_test(fnames);
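% Overall average error rate is high, but it's highly variable
% across examples: roughly 10-22% where the tempo estimate agrees
% with the listeners' tempo, and above 60% where the estimate is
% off by a factor of roughly 1/2, 2/3, 2 or 3.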
Error for train01 (tempo est=129.2, users=129.3 BPM, 35 true tracks): Overall error=  11.6% ( 159 ins,   84 del, 2130 true)
Error for train02 (tempo est=168.1, users= 83.9 BPM, 26 true tracks): Overall error= 113.3% (1129 ins,   13 del, 1016 true)
Error for train03 (tempo est=101.8, users=153.7 BPM, 26 true tracks): Overall error= 110.1% ( 740 ins, 1291 del, 1851 true)
Error for train04 (tempo est=127.6, users= 42.0 BPM, 28 true tracks): Overall error= 223.6% (1179 ins,    4 del,  538 true)
Error for train05 (tempo est= 68.4, users= 68.5 BPM, 34 true tracks): Overall error=  11.0% (  71 ins,   36 del, 1087 true)
Error for train06 (tempo est=164.1, users= 82.0 BPM, 25 true tracks): Overall error= 164.6% (1128 ins,   22 del,  920 true)
Error for train07 (tempo est=103.4, users= 56.5 BPM, 34 true tracks): Overall error= 143.4% (1035 ins,  214 del,  879 true)
Error for train08 (tempo est=147.7, users=147.8 BPM, 26 true tracks): Overall error=  21.5% ( 238 ins,  155 del, 1815 true)
Error for train09 (tempo est=128.4, users=127.7 BPM, 25 true tracks): Overall error=  17.0% ( 165 ins,   88 del, 1498 true)
Error for train10 (tempo est= 61.2, users= 61.2 BPM, 20 true tracks): Overall error=  10.3% (  41 ins,   15 del,  574 true)
Error for train11 (tempo est=139.2, users=139.9 BPM, 26 true tracks): Overall error=  20.0% ( 202 ins,  133 del, 1725 true)
Error for train12 (tempo est=122.0, users= 54.1 BPM, 27 true tracks): Overall error= 161.4% ( 999 ins,   81 del,  675 true)
Error for train13 (tempo est=119.5, users=181.3 BPM, 30 true tracks): Overall error= 112.3% (1039 ins, 1810 del, 2541 true)
Error for train14 (tempo est=130.0, users=130.7 BPM, 29 true tracks): Overall error=  12.2% ( 126 ins,   94 del, 1824 true)
Error for train15 (tempo est=185.4, users= 62.0 BPM, 28 true tracks): Overall error= 207.2% (1709 ins,    4 del,  831 true)
Error for train16 (tempo est=182.9, users= 90.7 BPM, 20 true tracks): Overall error= 131.3% (1022 ins,   28 del,  806 true)
Error for train17 (tempo est= 93.1, users= 46.1 BPM, 24 true tracks): Overall error= 127.0% ( 611 ins,   66 del,  535 true)
Error for train18 (tempo est=129.6, users= 61.2 BPM, 34 true tracks): Overall error= 151.1% (1320 ins,  177 del,  996 true)
Error for train19 (tempo est= 93.5, users=188.1 BPM, 34 true tracks): Overall error=  63.5% ( 221 ins, 1730 del, 3073 true)
Error for train20 (tempo est=110.0, users=219.8 BPM, 38 true tracks): Overall error=  76.9% ( 560 ins, 2551 del, 4043 true)
Overall average error: 94.5% (13694 ins,  8596 del, 29357 true)
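
The summary line can be checked against the per-file lines. Using only the numbers printed above, the 94.5% figure matches the unweighted mean of the 20 per-file error rates, whereas pooling all insertions and deletions over all true beats would give about 75.9%. That beat_test averages per file is an inference from these numbers, not something read from its source. The same numbers show which files have a tempo estimate that agrees with the listeners' tempo; the 0.1-octave agreement threshold below is an arbitrary choice for this illustration.

```matlab
% Per-file values copied from the beat_test output above (train01..train20)
est   = [129.2 168.1 101.8 127.6  68.4 164.1 103.4 147.7 128.4  61.2 ...
         139.2 122.0 119.5 130.0 185.4 182.9  93.1 129.6  93.5 110.0];
users = [129.3  83.9 153.7  42.0  68.5  82.0  56.5 147.8 127.7  61.2 ...
         139.9  54.1 181.3 130.7  62.0  90.7  46.1  61.2 188.1 219.8];
err   = [ 11.6 113.3 110.1 223.6  11.0 164.6 143.4  21.5  17.0  10.3 ...
          20.0 161.4 112.3  12.2 207.2 131.3 127.0 151.1  63.5  76.9];

mean_err   = mean(err);                      % unweighted mean over files
pooled_err = 100 * (13694 + 8596) / 29357;   % totals from the summary line

octaves = log2(est ./ users);                % tempo mismatch in octaves
matched = abs(octaves) < 0.1;                % assumed agreement threshold

fprintf('mean of per-file errors: %.3f%%\n', mean_err);
fprintf('pooled (ins+del)/true:   %.3f%%\n', pooled_err);
fprintf('%d files with matching tempo, mean error %.1f%%\n', ...
        sum(matched), mean(err(matched)));
fprintf('%d files with mismatched tempo, mean error %.1f%%\n', ...
        sum(~matched), mean(err(~matched)));
```

Splitting the files this way makes the variability concrete: the high overall figure is dominated by the files where the tracker locks onto a different metrical level from the one the listeners tapped.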

Acknowledgements

This material was developed for my course ELEN E4896 Music Signal Processing, under partial support from the NSF under project IIS-1117015.

2012-03-28 Dan Ellis dpwe@ee.columbia.edu