
pt_mkPseudoGt - Matlab pitch trackers and pseudo-ground-truth creation

pt_mkPseudoGt is a package consisting of several independent pitch trackers:

1. WU & WANG: This is our own implementation of the algorithm published by Wu & Wang in IEEE TASLP in 2003. It is taken from The Wu Pitch Tracking System, and is supposed to approximate the C-code made available by the original authors.

2. YIN: This is Alain De Cheveigne's YIN pitch tracker from .

3. SWIPE-PRIME: This is the pitch tracker written by Arthur Camacho, downloaded from .

4. GET_F0: This is David Talkin's original pitch tracker, in C, but modified by us to read more audio file types, and to write output as a simple ascii file. The unmodified source is part of the ESPS source package at .

5. YAAPT: Zahorian and Hu's pitch tracker including fundamental restoration from .

pt_mkPseudoGt provides wrappers to access all these pitch trackers, as well as our SAcC pitch tracker, from a common interface. It also provides pt_eval.m to evaluate pitch trackers against ground truth using our Pitch Tracking Error (PTE) measure (which balances gross pitch errors and voicing errors), as well as the traditional GPE measure.

Finally, the package includes pt_mkPseudoGt.m which will run a number of pitch trackers over a collection of audio files, do majority voting to find a consensus pitch where they agree, and write a new "pseudo-ground-truth" pitch file, providing the majority pitch (or unvoiced) value where the pitch trackers agree, as well as marking the frames where no agreement is reached. These outputs can be used to train the SAcC pitch tracker on in-domain data where no ground truth is available.

This package relies on audioread to allow it to read any kind of soundfile.


Running one pitch tracker

Here, we run the YIN pitch tracker on a single example using our wrapper. Instead of ptrack_yin.m, we could have substituted ptrack_wu.m, ptrack_swipe.m, ptrack_getf0.m, ptrack_yaapt.m, or ptrack_sacc.m. Each of these functions provides a minimal, common interface to the underlying pitch tracker, returning the times of the samples, the f_0 estimates in Hz, and a posterior probability of voicing (between 0 and 1, which may be binary for pitch trackers that don't provide finer distinctions). Note that f_0 estimates will be 0 during frames judged to be unvoiced (pvx < 0.5).

% Load the audio file
infile = 'mpgr1_sx419.wav';
[d,sr] = audioread(infile);

% Run YIN on it
[ty,f0y,pvy] = ptrack_yin(d,sr);

% Plot results
plot(ty, f0y, ty, pvy*100)
axis([0 max(ty) 0 300])
legend('YIN pitch track','YIN prob(vx)');
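
The voicing convention described above (f_0 reported as 0 wherever the voicing probability falls below 0.5) can be illustrated with a few made-up frames. This is only a sketch of the convention, not code from the package, and the values are invented for the example.

```matlab
% Toy illustration with made-up values; not output from any tracker.
t   = (0:4)' * 0.010;             % frame times in seconds (10 ms hop)
f0  = [110; 112; 95; 220; 118];   % raw pitch candidates in Hz
pvx = [0.9; 0.8; 0.2; 0.4; 1.0];  % posterior probability of voicing

% Frames judged unvoiced (pvx < 0.5) report an f0 of 0
f0(pvx < 0.5) = 0;

% Logical mask of the frames that remain voiced
voiced = pvx >= 0.5;

% Show time, f0 and voicing probability side by side
disp([t, f0, pvx]);
```

Downstream code can then treat `f0 > 0` and `pvx >= 0.5` as equivalent tests for voicing.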

Using pt_wrap to handle file IO and timebase resampling

pt_wrap.m is a wrapper that provides a bunch of common housekeeping functions applicable to any of the pitch trackers. It will accept a filename instead of a vector of time samples; it will write the pitch tracker output to file if given an output file name; it will accept lists of input and output filenames (in cell arrays) to process a set of files at once; and it will perform resampling to provide a consistent time base regardless of the underlying pitch tracker's output. pt_wrap.m accepts the underlying pitch tracker as a function handle.

% Yin actually uses a very fine time step of 2ms
% pt_wrap will run a pitch tracker and resample the output to a
% given frame rate, e.g. 10ms.
outfile = 'tmppt.txt';
dt = 0.010; % 10ms hop

% Run the pitch tracker via pt_wrap
[ty, f0y, pvy] = pt_wrap(infile, outfile, dt, @ptrack_yin);

% check the new frame rate
% results look the same
plot(ty, f0y, ty, pvy*100)
axis([0 max(ty) 0 300])
legend('YIN pitch track','YIN prob(vx)');
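
How pt_wrap.m resamples internally is not described here. As a rough sketch of the idea, the following moves a 2 ms pitch track onto a 10 ms time base using nearest-frame lookup. The choice of nearest-neighbour interpolation is an assumption made for this example: it avoids inventing intermediate pitch values at voiced/unvoiced boundaries, which linear interpolation between 0 and a real pitch would do.

```matlab
% Sketch only: nearest-frame resampling of a pitch track (made-up data).
t_in  = (0:20)' * 0.002;                 % original frame times, 2 ms hop
f0_in = [zeros(10,1); 100*ones(11,1)];   % unvoiced, then a steady 100 Hz
pv_in = double(f0_in > 0);               % binary voicing probability

dt    = 0.010;                           % target hop of 10 ms
t_out = (0:4)' * dt;                     % new frame times

% Pick the value of the closest original frame for each new frame
f0_out = interp1(t_in, f0_in, t_out, 'nearest');
pv_out = interp1(t_in, pv_in, t_out, 'nearest');

% The mean spacing of the new time base equals the requested hop
hop = mean(diff(t_out));
```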

Making pseudo ground truth

pt_mkPseudoGt.m uses the functions and wrapper above to provide a way to run multiple pitch trackers over the same set of examples, storing results on a common time base. It will then go through all the pitch tracks for each file, and do majority voting on each frame; if a majority of the specified pitch trackers agree (to within 20%) on the pitch of a frame, or agree that it is unvoiced, their average will be taken as the consensus ground truth pitch for that frame. Where no consensus is reached, that frame will be marked with an f_0 of -1 and a prob(voicing) of -1 in the pitch track file. This pseudo-ground-truth will be saved and is then suitable for use as input to train a new SAcC classifier.

% List of audio files
wavlist = {'rl001','rl002','sb001','sb002'};
% directory where we can find them (from SAcC package)
wavdir = '../SAcC/audio';
% extension for audio files
wavext = '.wav';
% list of pitch trackers to use (appended to "ptrack_" to make
% function name)
pts = {'yin','wu','getf0','swipe','yaapt'};
% Can also specify as a comma-separated string (good for compiled versions)
%pts = 'yin,wu,getf0,swipe,yaapt';
% Root directory for writing pitch tracker outputs into
% (each pitch tracker will get its own subdirectory)
ptoutdir = 'ptk';
% Directory to write pseudo ground truth pitch tracking files into
pgtoutdir = 'ptk/pgt';

% Run all the pitch trackers, write outputs, calculate and save
% consensus
nagree = 3; % require 3 of 5 to agree
verb = 1;   % verbose progress
pt_mkPseudoGt(wavlist, wavdir, wavext, pgtoutdir, pts, ptoutdir, ...
             nagree, verb);

% spectrogram of first example
[d,sr] = audioread(fullfile(wavdir,[wavlist{1},wavext]));
% (the plotting calls in this listing are reconstructed; the spectrogram
% parameters are assumed, and spectrogram needs the Signal Processing Toolbox)
subplot(211)
spectrogram(d(:,1), 512, 384, 512, sr, 'yaxis');
% Now we have the pitch tracks for all trackers saved to disk, read
% them back and compare them for one example
subplot(212)
c = 'bgrmcyk';  % colors to use
pts = {'yin','wu','getf0','swipe','yaapt'};
for i = 1:length(pts)
  % Read one of the pitch tracking files written by pt_mkPseudoGt
  [t,f0] = pt_read(fullfile(ptoutdir, pts{i}, ...
                            [wavlist{1},'-', pts{i},'.txt']));
  plot(t, f0, c(i));
  hold on
end
% add the pseudo ground truth
[t,f0] = pt_read(fullfile(pgtoutdir, [wavlist{1}, '-pgt.txt']));
valid = find(f0 >= 0);
plot(t(valid), f0(valid), '.k');
% plot red dots to indicate frames without consensus ptrack
novalid = find(f0 < 0);
plot(t(novalid), 0*t(novalid), '.r');
hold off
Running ptrack_yin on ../SAcC/audio/rl001.wav ...
Running ptrack_yin on ../SAcC/audio/rl002.wav ...
Running ptrack_yin on ../SAcC/audio/sb001.wav ...
Running ptrack_yin on ../SAcC/audio/sb002.wav ...
4 files (9.46 sec) processed in 0.5 sec = 0.051 x RT
Running ptrack_wu on ../SAcC/audio/rl001.wav ...
Running ptrack_wu on ../SAcC/audio/rl002.wav ...
Running ptrack_wu on ../SAcC/audio/sb001.wav ...
Running ptrack_wu on ../SAcC/audio/sb002.wav ...
4 files (9.38 sec) processed in 6.3 sec = 0.675 x RT
Running ptrack_getf0 on ../SAcC/audio/rl001.wav ...
Running ptrack_getf0 on ../SAcC/audio/rl002.wav ...
Running ptrack_getf0 on ../SAcC/audio/sb001.wav ...
Running ptrack_getf0 on ../SAcC/audio/sb002.wav ...
4 files (9.3 sec) processed in 0.7 sec = 0.077 x RT
Running ptrack_swipe on ../SAcC/audio/rl001.wav ...
Running ptrack_swipe on ../SAcC/audio/rl002.wav ...
Running ptrack_swipe on ../SAcC/audio/sb001.wav ...
Running ptrack_swipe on ../SAcC/audio/sb002.wav ...
4 files (9.5 sec) processed in 1.9 sec = 0.204 x RT
Running ptrack_yaapt on ../SAcC/audio/rl001.wav ...
Running ptrack_yaapt on ../SAcC/audio/rl002.wav ...
Running ptrack_yaapt on ../SAcC/audio/sb001.wav ...
Running ptrack_yaapt on ../SAcC/audio/sb002.wav ...
4 files (9.38 sec) processed in 3.1 sec = 0.328 x RT
wrote ptk/pgt/rl001-pgt.txt
wrote ptk/pgt/rl002-pgt.txt
wrote ptk/pgt/sb001-pgt.txt
wrote ptk/pgt/sb002-pgt.txt
makePseudoGt v0.23 of 20130824: 954 total frames, 947 (99.3%) have consensus (3 of 5)
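
The per-frame voting described at the start of this section might be sketched as follows. This is one possible reading of "agree to within 20%" (each voiced estimate is tried in turn as a reference, and the largest group within 20% of it is kept); the actual grouping rule inside pt_mkPseudoGt.m may differ, and the pitch values are made up.

```matlab
% Rows are frames, columns are trackers; 0 means unvoiced (made-up data).
F = [100 102  98 200   0;    % three trackers near 100 Hz
       0   0   0 150   0;    % four trackers say unvoiced
     100 150 220   0   0];   % no group of three agrees
nagree = 3;                  % trackers required for consensus
tol    = 0.20;               % relative pitch tolerance

pgt = -ones(size(F,1), 1);   % -1 marks "no consensus"
for n = 1:size(F,1)
  f = F(n,:);
  if sum(f == 0) >= nagree
    pgt(n) = 0;              % consensus: unvoiced
    continue
  end
  v = f(f > 0);              % voiced estimates for this frame
  best = [];
  for k = 1:numel(v)
    % estimates lying within tol of the k-th estimate
    grp = v(abs(v - v(k)) <= tol * v(k));
    if numel(grp) >= nagree && numel(grp) > numel(best)
      best = grp;
    end
  end
  if ~isempty(best)
    pgt(n) = mean(best);     % consensus pitch is the group average
  end
end

% Voicing column: 1 voiced, 0 unvoiced, -1 no consensus
pv = double(pgt > 0);
pv(pgt < 0) = -1;
```

With these inputs the three frames come out as 100 Hz, unvoiced, and no consensus respectively.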

Evaluating pitch trackers

We can use pt_eval.m to evaluate individual pitch trackers against some ground truth. pt_eval_multi.m will evaluate the outputs of several pitch trackers over the same set of utterances. In this case, we can use the pseudo ground truth, and see how well each individual pitch tracker agrees with this consensus. No-consensus frames (negative f0 values in the ground truth) are excluded from the evaluation.

pt_eval_multi(wavlist, pgtoutdir, '-pgt.txt', ptoutdir, pts);
% The GPEs (Gross Pitch Error) are very low because it only
% evaluates when both tracker and ground truth say a pitch is
% present, and for the consensus ground truth this is mostly just
% the "easy" frames.
Algo       yin: PTE= 8.6% (VE=11.8% (VPE= 0.0%, VFR=11.8%) UE= 5.3%) GPE= 0.0% 
Algo        wu: PTE= 8.9% (VE=15.9% (VPE= 0.0%, VFR=15.9%) UE= 1.9%) GPE= 0.0% 
Algo     getf0: PTE= 4.9% (VE= 2.0% (VPE= 1.1%, VFR= 1.0%) UE= 7.7%) GPE= 1.1% 
Algo     swipe: PTE= 4.4% (VE= 3.5% (VPE= 0.0%, VFR= 3.5%) UE= 5.4%) GPE= 0.0% 
Algo     yaapt: PTE= 3.3% (VE= 0.6% (VPE= 0.3%, VFR= 0.4%) UE= 6.0%) GPE= 0.3% 
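
The figures above are consistent with PTE being the mean of VE and UE, with VE = VPE + VFR, and with GPE counting gross errors only over frames that both tracks call voiced. The sketch below computes the measures under those inferred definitions on made-up data. The 20% gross-error threshold is an assumption, and pt_eval.m itself may differ in detail.

```matlab
% Made-up tracks: 0 = unvoiced, negative = no-consensus ground truth.
gt  = [100 100 100 100   0   0   0   0  -1 120];
est = [101 130   0  99   0  90   0   0 100 118];
tol = 0.20;                      % assumed gross pitch error threshold

keep = gt >= 0;                  % exclude no-consensus frames
gt  = gt(keep);
est = est(keep);

gv = gt > 0;                     % ground truth voiced
ev = est > 0;                    % tracker voiced
bad = gv & ev & (abs(est - gt) > tol * gt);   % gross pitch errors

VPE = sum(bad) / sum(gv);        % voiced frames with wrong pitch
VFR = sum(gv & ~ev) / sum(gv);   % voiced frames reported unvoiced
VE  = VPE + VFR;                 % total voiced error
UE  = sum(~gv & ev) / sum(~gv);  % unvoiced frames reported voiced
PTE = (VE + UE) / 2;             % balanced pitch tracking error
GPE = sum(bad) / sum(gv & ev);   % errors among jointly voiced frames

fprintf('PTE=%4.1f%% (VE=%4.1f%% UE=%4.1f%%) GPE=%4.1f%%\n', ...
        100*PTE, 100*VE, 100*UE, 100*GPE);
```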

Testing a single pitch tracker

We can use the same pieces to test a single pitch tracker, without precalculating or storing the results:

keeledir = '../../data/pitch/keele';
ids = listfileread(fullfile(keeledir, 'idlist.txt'));
audiodir = fullfile(keeledir, 'wav');
audioext = '.wav';
gtdir = fullfile(keeledir, 'ptk/gt');
gtext = '-gt.txt';
pt_test('sacc_new', ids, audiodir, audioext, gtdir, gtext);
creating /private/tmp/tpa3e06cf8_7609_4959_a3ab_68c1a0948e51 ... 
using my_autocorr
using my_autocorr
using my_autocorr
using my_autocorr
using my_autocorr
using my_autocorr
using my_autocorr
using my_autocorr
using my_autocorr
using my_autocorr
Algo  sacc_new: PTE=10.4% (VE=17.7% (VPE= 2.1%, VFR=15.7%) UE= 3.0%) GPE= 2.5% 

Compiled version

This package has been compiled for several targets into a binary, pt_mkPseudoGt, which is run from the command line as:

pt_mkPseudoGt idlist audiodir audioext pgtdir ptlist ptdir minagree verbose rf0

where the arguments are:

idlist - file containing the list of file IDs to use.

audiodir, audioext - prefix and suffix to make the audio filenames from the IDs. Specifying audiodir as '' will prevent any pitch trackers actually being run, and instead pt_mkPseudoGt will attempt to create pseudo ground truth from already-calculated pitch track files in ptdir

pgtdir - directory into which pseudo ground truth pitch track files (named <id>-pgt.txt) are written.

ptlist - comma-separated list of pitch trackers to use, e.g. "wu,yin,getf0,yaapt,swipe"

ptdir - root of directory to write (and/or read) individual pitch tracking files. One subdirectory per pitch tracker will be created in this directory.

minagree - the actual number of pitch trackers that need to agree to have a given frame count as consensus with a recorded pitch value (defaults to ceil(n_pitch_trackers/2))

verbose - set to 1 to get diagnostics (default 0)

rf0 - set to 1 to include restoref0 processing (default 0)
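
As an illustration of how the string-valued ptlist and the minagree default fit together, the following sketch splits a comma-separated list, builds the corresponding ptrack_ function names, and applies the ceil(n_pitch_trackers/2) default. It is not the package's own argument-handling code, and the variable names are chosen for this example only.

```matlab
% Sketch only: turn a command-line style tracker list into function names.
ptlist = 'wu,yin,getf0,yaapt,swipe';

% Split on commas to get the individual tracker names
names = strsplit(ptlist, ',');

% Each name is appended to "ptrack_" to make the wrapper function name
fnames = strcat('ptrack_', names);

% Function handles, suitable for passing to a wrapper such as pt_wrap
handles = cellfun(@str2func, fnames, 'UniformOutput', false);

% Default agreement threshold: a simple majority of the trackers
minagree = ceil(numel(names) / 2);
```

With five trackers the default threshold is 3, matching the "3 of 5" setting used in the demo above.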


The original Matlab code used to build this compiled target is available at

All sources and the parameter files are in the package

Since the compiled pt_mkPseudoGt binary was created with the Matlab compiler, you will also need to download and install the Matlab Compiler Runtime (MCR) Installer. Please see the table below:

Architecture    Compiled package    MCR Installer
64 bit Linux                        Linux 64 bit MCR Installer
64 bit MacOS                        MACI64 MCR Installer

Feel free to contact me with any problems.


% 2013-08-24 v0.23 - added pt_test to test a single pitch tracker
%                    without saving the results.
% 2013-03-06 v0.22 - modified ptrack_yaapt to chop audio up into 60
%                    sec chunks with 2 sec overlaps, then splice
%                    resulting pitch tracks together.
% 2013-02-14 v0.21 - changed package name from ptrack to pt_mkPseudoGt;
%                    changed name of main function and binary to
%                    pt_mkPseudoGt too.
% 2013-02-11 v0.2  - added yaapt, added more informative feedback
%                    messages, added handling for arguments as
%                    command-line strings to make compiling easier;
%                    added pt_eval_multi.
% 2013-02-07 v0.1  - Initial release


This work was supported by DARPA under the RATS program via a subcontract from the SRI-led team SCENIC (on behalf of ICSI), and by IARPA under the BABEL program, via a subcontract from the ICSI-led team to Columbia.

$Header: /Users/drspeech/data/RATS/code/pt_mkPseudoGt/RCS/demo_pt_mkPseudoGt.m,v 1.1 2013/02/07 14:46:15 dpwe Exp dpwe $