This secret page contains the second evaluation data set, and an example of a (slightly?) better classifier evaluated with it.
Here's how the values in that data set (the features cc, the sampled labels lsamp, and the frame times tt) were calculated:
>> % Calculate features
>> [d,sr] = wavread('dont_you_want_me-human_league.wav');
>> length(d)/sr
ans =
  239.7516
>> tt = 0.02:0.02:239.7416;
>> size(tt)
ans =
     1       11987
>> cc = mfcc(d,sr,1/0.020);
>> cc = [cc; deltas(cc); deltas(deltas(cc,5),5)];
>> size(cc)
ans =
    39       11987
>> % Construct sampled labels
>> [stt,dur,lab] = textread(['dont_you_want_me-human_league.lab'], '%f %f %s','commentstyle','shell');
>> ll = zeros(length(lab),1);
>> ll(strmatch('vox',lab)) = 1;
>> lsamp = labsamplabs(tt,[stt,dur],ll)';
>> save evaldata.mat cc lsamp tt
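As a rough illustration of the label-sampling step above, the sketch below gives each frame time the label of the segment it falls in. It is not the labsamplabs routine used above, whose internals are not shown on this page. The segment times and labels are made up, and giving label 0 to frames outside every segment is an assumption.

```matlab
% Hypothetical illustration of sampling segment labels onto a frame grid.
% This is NOT labsamplabs; it only sketches the general idea.
tt  = 0.02:0.02:1.0;          % frame times in seconds (20 ms hop, as above)
stt = [0.00; 0.31; 0.69];     % made-up segment start times (s)
dur = [0.31; 0.38; 0.31];     % made-up segment durations (s)
ll  = [0; 1; 0];              % 1 = 'vox' segment, 0 = anything else

% Assumption: frames that fall outside every segment keep label 0
lsamp = zeros(length(tt), 1);
for k = 1:length(stt)
    % frames whose time lies in [start, start+duration) take segment k's label
    inseg = (tt >= stt(k)) & (tt < stt(k) + dur(k));
    lsamp(inseg) = ll(k);
end
```

With these made-up segments, the frames from 0.32 s to 0.68 s end up labelled 1 and all the others 0.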
Here's an example of a slightly better classifier, evaluated on this data:
>> % Train GMMs using the first 4 MFCCs, plus their deltas and d-deltas (12 dims), use 20 mix components
>> gmS = gmm(12,20,'diag');
>> gmS = gmminit(gmS, ddS(:,[[1:4] [14:17] [27:30]]), options);
Warning: Maximum number of iterations has been exceeded
>> gmS = gmmem(gmS, ddS(:,[[1:4] [14:17] [27:30]]), options);
Warning: Maximum number of iterations has been exceeded
>> gmM = gmm(12,20,'diag');
>> gmM = gmminit(gmM, ddM(:,[[1:4] [14:17] [27:30]]), options);
Warning: Maximum number of iterations has been exceeded
>> gmM = gmmem(gmM, ddM(:,[[1:4] [14:17] [27:30]]), options);
Warning: Maximum number of iterations has been exceeded
>> % Calculate log l/hood ratio over test data
>> LRa = log(gmmprob(gmS,cc([[1:4] [14:17] [27:30]],:)')./gmmprob(gmM,cc([[1:4] [14:17] [27:30]],:)'));
>> % Don't mess with trying to optimize threshold - use default
>> mean((LRa>0)==lsamp)
ans =
    0.6839
>> % Smoothing will help
>> LRaS = conv(hanning(51)/sum(hanning(51)), LRa);
>> LRaS = LRaS(25 + [1:length(LRa)]);
>> mean((LRaS>0)==lsamp)
ans =
    0.7310
>> % See what it looks like
>> subplot(311)
>> specgram(resample(d,800,22050),256,8000)
>> caxis([-55 25])
>> title('Dont you want me - Human League')
>> subplot(312)
>> plot(tt,lsamp)
>> axis([0 239.7 -.2 1.2])
>> title('Ground truth labels');
>> subplot(313)
>> plot(tt,LRaS,tt,(LRaS>0),'-r')
>> axis([0 239.7 -2.5 2.5])
>> title('cep4+d+dd model, default thresh')
>> % Really looks like we want a threshold *above* zero
>> % Based on dev data, maybe tracking mean of l/hood ratio is a good idea
>> % (but completely unprincipled). Exclude final tail-off
>> thr = mean(LRaS(100:(length(LRaS)-1000)));
>> mean((LRaS>thr)==lsamp)
ans =
    0.7667
>> % OK, a little better
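To see the smoothing and threshold steps in isolation, here is a small sketch on synthetic data that needs no audio, GMMs, or toolboxes. Everything in it is made up for illustration: the labels, the upward bias of 0.6, and the alternating "noise" standing in for frame-to-frame jitter in the log-likelihood ratio. The window is written out by hand and is intended to match hanning(51). Unlike the session above, the threshold here is the mean over all frames, with no tail excluded.

```matlab
% Synthetic stand-in for the per-frame log-likelihood ratio (all values made up)
lsamp = [zeros(100,1); ones(100,1); zeros(100,1)];   % ground-truth labels
n     = (1:length(lsamp))';
LR    = (2*lsamp - 1) + 0.6 + 1.5 * (-1).^n;         % signal + bias + jitter

% Frame accuracy with the default threshold of zero, no smoothing
accRaw = mean((LR > 0) == lsamp);

% 51-point Hann window normalised to unit sum (meant to match hanning(51))
w = 0.5 * (1 - cos(2*pi*(1:51)'/52));
w = w / sum(w);

% Smooth, then drop the 25-sample delay so the output lines up with the input
LRs = conv(w, LR);
LRs = LRs(25 + (1:length(LR)));
accSmooth = mean((LRs > 0) == lsamp);

% Unprincipled alternative: use the mean of the smoothed ratio as the threshold
thr = mean(LRs);
accThr = mean((LRs > thr) == lsamp);
```

On this toy data the three accuracies increase in the same order as in the session above (raw, smoothed, smoothed with the mean threshold). That happens only because the synthetic scores were given an upward bias, so it says nothing about whether mean tracking helps in general.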