
A Practical Investigation of Singing Detection:
6. Final Evaluation

This secret page contains the second evaluation data set and an example of a (slightly?) better classifier evaluated on it.

Here's how the features and frame labels in that data set were calculated:


>> % Calculate features
>> [d,sr] = wavread('dont_you_want_me-human_league.wav');
>> length(d)/sr
ans =
  239.7516
>> tt = 0.02:0.02:239.7416;
>> size(tt)
ans =
           1       11987
>> cc = mfcc(d,sr,1/0.020);
>> cc = [cc; deltas(cc); deltas(deltas(cc,5),5)];
>> size(cc)
ans =
       39          11987
>> % Construct sampled labels
>> [stt,dur,lab] = textread(['dont_you_want_me-human_league.lab'], '%f %f %s','commentstyle','shell');
>> ll = zeros(length(lab),1);
>> ll(strmatch('vox',lab)) = 1;
>> lsamp = labsamplabs(tt,[stt,dur],ll)';
>> save evaldata.mat cc lsamp tt
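
The label-sampling step above turns a list of labelled segments into one 0/1 value per 20 ms feature frame. The source of labsamplabs is not shown on this page, so the following is only a rough Python sketch of that idea, not a port of the MATLAB routine. It assumes the label file's columns are start time, duration and label name, and that a frame takes the label of the segment containing its time. The function name sample_labels and the example rows are made up for illustration.

```python
from bisect import bisect_right

def sample_labels(frame_times, segments):
    """Return one 0/1 label per frame time.

    segments: (start_sec, duration_sec, label) tuples sorted by start time.
    A frame takes the label of the segment whose [start, start + duration)
    interval contains it; frames outside every segment get 0.
    (Assumed behaviour; the original labsamplabs may differ in detail.)
    """
    starts = [start for start, _, _ in segments]
    labels = []
    for t in frame_times:
        i = bisect_right(starts, t) - 1  # last segment starting at or before t
        if i >= 0:
            start, dur, lab = segments[i]
            labels.append(lab if t < start + dur else 0)
        else:
            labels.append(0)
    return labels

# Hypothetical label rows: start, duration, label name
rows = [(0.0, 0.05, "mus"), (0.05, 0.04, "vox"), (0.09, 0.04, "mus")]
# Mark 'vox' segments with 1 and everything else with 0
segments = [(s, d, 1 if name == "vox" else 0) for s, d, name in rows]
frame_times = [0.02 * k for k in range(1, 7)]  # 0.02, 0.04, ..., 0.12 s
frame_labels = sample_labels(frame_times, segments)
print(frame_labels)
```

Only the two frames that fall inside the 'vox' segment (0.06 s and 0.08 s) come out as 1; the rest are 0.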

Here's an example of a slightly better classifier, evaluated on this data:


>> % Train GMMs using the first 4 MFCCs, plus their deltas and d-deltas (12 dims), use 20 mix components
>> gmS = gmm(12,20,'diag');
>> gmS = gmminit(gmS, ddS(:,[[1:4] [14:17] [27:30]]), options);
Warning: Maximum number of iterations has been exceeded
>> gmS = gmmem(gmS, ddS(:,[[1:4] [14:17] [27:30]]), options);
Warning: Maximum number of iterations has been exceeded
>> gmM = gmm(12,20,'diag');
>> gmM = gmminit(gmM, ddM(:,[[1:4] [14:17] [27:30]]), options);
Warning: Maximum number of iterations has been exceeded
>> gmM = gmmem(gmM, ddM(:,[[1:4] [14:17] [27:30]]), options);
Warning: Maximum number of iterations has been exceeded
>> % Calculate log l/hood ratio over test data
>> LRa = log(gmmprob(gmS,cc([[1:4] [14:17] [27:30]],:)')./gmmprob(gmM,cc([[1:4] [14:17] [27:30]],:)'));
>> % Don't mess with trying to optimize threshold - use default
>> mean((LRa>0)==lsamp)
ans =
    0.6839
>> % Smoothing will help
>> LRaS = conv(hanning(51)/sum(hanning(51)), LRa);
>> LRaS = LRaS(25 + [1:length(LRa)]);
>> mean((LRaS>0)==lsamp)
ans =
    0.7310
>> % See what it looks like
>> subplot(311)
>> specgram(resample(d,8000,sr),256,8000)
>> caxis([-55 25])
>> title('Don''t you want me - Human League')
>> subplot(312)
>> plot(tt,lsamp)
>> axis([0 239.7 -.2 1.2])
>> title('Ground truth labels');
>> subplot(313)
>> plot(tt,LRaS,tt,(LRaS>0),'-r')
>> axis([0 239.7 -2.5 2.5])
>> title('cep4+d+dd model, default thresh')
>> % Really looks like we want a threshold *above* zero
>> % Based on dev data, maybe tracking mean of l/hood ratio is a good idea
>> % (but completely unprincipled).  Exclude final tail-off
>> thr = mean(LRaS(100:(length(LRaS)-1000)));
>> mean((LRaS>thr)==lsamp)
ans =
    0.7667
>> % OK, a little better
[Results on final eval data]
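
The smoothing and thresholding steps in the session above can be tried out on toy data. The following is a minimal Python sketch, not the code used for the results on this page. It assumes the same 51-point Hann window normalized to unit sum, the same centred trimming of the convolution output, and a threshold set to the mean of the smoothed scores (here taken over all frames, without excluding the start or the tail-off). The scores and labels are invented: a ±1 "log likelihood ratio" with every 7th frame flipped and a constant bias of 0.5 added.

```python
import math

def hann(n):
    # n-tap Hann window without zero end points
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * (k + 1) / (n + 1)))
            for k in range(n)]

def smooth(x, n=51):
    """Convolve x with a unit-sum Hann window, keeping the centred part."""
    w = hann(n)
    total = sum(w)
    w = [v / total for v in w]
    half = (n - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            j = i + half - k  # input index seen by tap k
            if 0 <= j < len(x):
                acc += wk * x[j]
        out.append(acc)
    return out

def accuracy(scores, labels, threshold=0.0):
    """Fraction of frames where (score > threshold) matches the 0/1 label."""
    hits = sum((s > threshold) == bool(l) for s, l in zip(scores, labels))
    return hits / len(labels)

# Toy data (made up): 100 non-singing, 100 singing, 100 non-singing frames
labels = [0] * 100 + [1] * 100 + [0] * 100
scores = []
for i, lab in enumerate(labels):
    s = 1.0 if lab else -1.0
    if i % 7 == 0:
        s = -s              # isolated frame-level errors
    scores.append(s + 0.5)  # constant bias toward "singing"

smoothed = smooth(scores)
acc_raw = accuracy(scores, labels)            # default threshold, no smoothing
acc_smooth = accuracy(smoothed, labels)       # default threshold, smoothed
thr = sum(smoothed) / len(smoothed)           # mean-tracking threshold
acc_mean = accuracy(smoothed, labels, thr)
print(acc_raw, acc_smooth, acc_mean)
```

On this toy signal, smoothing removes the isolated frame errors, and the bias pushes the mean of the smoothed scores above zero, so the mean-based threshold trims the false positives around the segment boundaries. As in the session, the mean threshold is a heuristic and has no principled basis.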

Last updated: 2003/07/02 15:40:09

Dan Ellis <dpwe@ee.columbia.edu>