PITCHFLOW - direct calculation of delta-pitch

Pitch is helpful in speech recognition, but the feature used is rarely the absolute pitch (which varies a lot by speaker); more often it is some locally-normalized pitch contour. In fact, it may be that only the pitch derivative is important. Several researchers have looked at calculating local pitch derivatives directly, without the laborious and error-prone step of first finding a pitch contour. This code takes that approach.

The algorithm works by calculating a spectrogram of the speech, then warping the frequency axis to be logarithmic. In this projection, a change in pitch corresponds to a simple translation of the position of all harmonics. Delta-pitch can therefore be estimated by cross-correlating successive short-time spectra: the lag of the main correlation peak indicates the pitch change between frames.
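
The translation property can be checked on synthetic data. The following is a minimal sketch, not the pitchflow implementation: the bin resolution, frequency range, bump width, and lag range are all assumed values chosen for illustration. It builds two harmonic spectra on a log-frequency axis, the second with its fundamental raised by exactly three bins, and recovers that shift from the peak of the normalized cross-correlation.

```matlab
% Log-frequency axis (all parameters are illustrative assumptions)
binsPerOct = 48;                            % bins per octave
nbins = 6 * binsPerOct;                     % six octaves above fmin
fmin = 50;                                  % lowest bin frequency in Hz
f = fmin * 2.^((0:nbins-1) / binsPerOct);   % bin center frequencies
width = 1.5;                                % harmonic bump width in bins

% Synthetic spectrum: a Gaussian bump (in log-frequency) at each of the
% first 20 harmonics of the fundamental f0
harmspec = @(f0) sum(exp(-0.5 * (bsxfun(@minus, log2(f(:)), ...
                 log2(f0 * (1:20))) * binsPerOct / width).^2), 2);

a = harmspec(120);                          % earlier frame, f0 = 120 Hz
b = harmspec(120 * 2^(3 / binsPerOct));     % later frame, pitch up by 3 bins

% Normalized cross-correlation over a small range of lags
maxlag = 12;
lags = -maxlag:maxlag;
ncc = zeros(size(lags));
for i = 1:numel(lags)
    k = lags(i);
    if k >= 0
        x = a(1:end-k);  y = b(1+k:end);    % compare a(n) with b(n+k)
    else
        x = a(1-k:end);  y = b(1:end+k);
    end
    ncc(i) = (x' * y) / (norm(x) * norm(y) + eps);
end

% A peak at a positive lag means the later spectrum moved up in frequency
[~, imax] = max(ncc);
shift = lags(imax);
```

Because every harmonic moves by the same number of bins, a single lag aligns the whole spectrum, which is why no explicit pitch tracking is needed.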

Example

% Load a sound file
[d,sr] = audioread(['/u/drspeech/data/swordfish/code/ehist/' ...
                    'BABEL_OP1_206_65882_20121201_174526_outLine.sph']);

% Plot its log-frequency spectrogram
subplot(311)
logfsgram(d,256,sr);
caxis([-30 30]);

% Calculate the normalized cross-correlation of adjacent log-f spectra
sxc = pitchflow(d, sr);

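subplot(312)
% Frame times in seconds (the 0.010 factor implies a 10 ms hop)
tt = [0:(size(sxc,2)-1)]*0.010;
% Lag axis in bins, centered on zero
rr = [1:size(sxc,1)] - (size(sxc, 1)+1)/2;
imagesc(tt, rr, sxc); axis('xy');
grid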

% Collapse each excerpt from the cross-correlations into 3 features,
% the first three moments of the exponentiated NCC
dpf = pitchflow_collapse(sxc);

subplot(313)
plot(tt, dpf);

% Overplot the first moment on the xcorr, to show it tracks the
% main peak
subplot(312)
hold on;
plot(tt, dpf(2,:), '-w');
hold off

% Line up and zoom in
linkaxes([subplot(311),subplot(312),subplot(313)], 'x')
axis([196 200 -5 10])
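
The collapse step in the example reduces each cross-correlation column to three moments of the exponentiated NCC. The exact definition used inside pitchflow_collapse is not given here, so the following is only a sketch under stated assumptions: the sharpening factor, the normalization by the zeroth moment, and the use of a central second moment are all hypothetical choices.

```matlab
% One synthetic column of NCC values over lags, peaking at lag +3
lags = (-12:12)';
ncc = exp(-0.5 * ((lags - 3) / 2).^2);

sharpness = 10;                       % hypothetical scaling before exponentiation
w = exp(sharpness * ncc);             % exponentiate so the main peak dominates

m0 = sum(w);                          % zeroth moment: total weight
m1 = sum(lags .* w) / m0;             % first moment: weighted mean lag
m2 = sum((lags - m1).^2 .* w) / m0;   % second central moment: spread about the mean

ftrs = [m0; m1; m2];                  % three features for this frame
```

Under these assumptions the first moment sits close to the location of the main peak, which matches the behavior shown by the white overplot in the example.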

Bulk calculation

Say you want to calculate this feature for a whole directory full of Babel utterances. Here's how:

corpus = 'BABEL_OP1_102_LLP';
babelcorproot = '/u/drspeech/data/swordfish/corpora';
wavdevdir = fullfile(babelcorproot, corpus, 'conversational/dev/audio');
wavtrndir = fullfile(babelcorproot, corpus, 'conversational/training/audio');
ftrdevdir = fullfile(corpus, 'dev/dpitch');
ftrtrndir = fullfile(corpus, 'training/dpitch');
mymkdir(ftrdevdir);
mymkdir(ftrtrndir);
pitchflow_processdir(wavtrndir, ftrtrndir);
pitchflow_processdir(wavdevdir, ftrdevdir);
ftr2devdir = fullfile(corpus, 'dev/dpflow');
ftr2trndir = fullfile(corpus, 'training/dpflow');
mymkdir(ftr2devdir);
mymkdir(ftr2trndir);
pitchflow_reprocessdir(ftrdevdir, ftr2devdir);
pitchflow_reprocessdir(ftrtrndir, ftr2trndir);

t_pad = 3.5 ms
Wrote BABEL_OP1_102_LLP/training/dpitch/BABEL_OP1_102_10713_20120401_204236_inLine.htk
t_pad = 3.5 ms
Wrote BABEL_OP1_102_LLP/training/dpitch/BABEL_OP1_102_10713_20120401_204236_outLine.htk
t_pad = 3.5 ms
Wrote BABEL_OP1_102_LLP/training/dpitch/BABEL_OP1_102_11031_20120926_231829_inLine.htk
t_pad = 3.5 ms
Wrote BABEL_OP1_102_LLP/training/dpitch/BABEL_OP1_102_11031_20120926_231829_outLine.htk
t_pad = 3.5 ms
Wrote BABEL_OP1_102_LLP/dev/dpitch/BABEL_OP1_102_10408_20121105_223454_inLine.htk
t_pad = 3.5 ms
Wrote BABEL_OP1_102_LLP/dev/dpitch/BABEL_OP1_102_10408_20121105_223454_outLine.htk
t_pad = 3.5 ms
Wrote BABEL_OP1_102_LLP/dev/dpitch/BABEL_OP1_102_10925_20120329_192327_inLine.htk
t_pad = 3.5 ms
Wrote BABEL_OP1_102_LLP/dev/dpitch/BABEL_OP1_102_10925_20120329_192327_outLine.htk
Wrote BABEL_OP1_102_LLP/dev/dpflow/BABEL_OP1_102_10408_20121105_223454_inLine.htk
Wrote BABEL_OP1_102_LLP/dev/dpflow/BABEL_OP1_102_10408_20121105_223454_outLine.htk
Wrote BABEL_OP1_102_LLP/dev/dpflow/BABEL_OP1_102_10925_20120329_192327_inLine.htk
Wrote BABEL_OP1_102_LLP/dev/dpflow/BABEL_OP1_102_10925_20120329_192327_outLine.htk
Wrote BABEL_OP1_102_LLP/training/dpflow/BABEL_OP1_102_10713_20120401_204236_inLine.htk
Wrote BABEL_OP1_102_LLP/training/dpflow/BABEL_OP1_102_10713_20120401_204236_outLine.htk
Wrote BABEL_OP1_102_LLP/training/dpflow/BABEL_OP1_102_11031_20120926_231829_inLine.htk
Wrote BABEL_OP1_102_LLP/training/dpflow/BABEL_OP1_102_11031_20120926_231829_outLine.htk
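
The output files above carry an .htk extension. Assuming they follow the standard HTK parameter-file layout (a 12-byte big-endian header followed by float32 frames), a feature matrix could be written and read back as sketched below. This is an illustration only: the USER parameter kind, the 10 ms frame period, and the temporary file name are assumptions, and the files written by pitchflow_processdir may differ.

```matlab
% Synthetic feature matrix: 3 features x 5 frames (illustrative values)
ftrs = [1 2 3 4 5; 0.1 0.2 0.3 0.4 0.5; -1 -2 -3 -4 -5];
hop_s = 0.010;                               % assumed 10 ms frame period
outname = [tempname() '.htk'];               % temporary output file

[ndim, nfrm] = size(ftrs);
fid = fopen(outname, 'w', 'ieee-be');        % HTK files are big-endian by default
fwrite(fid, nfrm, 'int32');                  % nSamples: number of frames
fwrite(fid, round(hop_s * 1e7), 'int32');    % sampPeriod in units of 100 ns
fwrite(fid, 4 * ndim, 'int16');              % sampSize: bytes per frame
fwrite(fid, 9, 'int16');                     % parmKind 9 = USER (assumed)
fwrite(fid, ftrs, 'float32');                % column-major, so frame by frame
fclose(fid);

% Read the file back to check the round trip
fid = fopen(outname, 'r', 'ieee-be');
hdr_n = fread(fid, 1, 'int32');
hdr_p = fread(fid, 1, 'int32');
hdr_s = fread(fid, 1, 'int16');
hdr_k = fread(fid, 1, 'int16');
back = fread(fid, [hdr_s / 4, hdr_n], 'float32');
fclose(fid);
```

Storing features as one column per frame means MATLAB's column-major fwrite emits them in the frame-by-frame order the header describes, with no transpose needed.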

Python Port

The full pitchflow feature calculation pipeline has been ported to Python. See the pitchflow package on GitHub.

Changelog

% 2014-02-11 v0.1  Cleaned up and added full "matlab publish" output
%
% 2014-01-28 v0.0  Initial release
%

Acknowledgment

This work was supported by IARPA under the Babel program via a subcontract from the ICSI-led team Swordfish.