Objective measures of speech quality/SNR

This collection of Matlab functions calculates a set of objective speech quality measures, most of them based on some version of SNR (i.e., the ratio of speech energy to nonspeech energy). The measures are:

NIST STNR - see http://labrosa.ee.columbia.edu/~dpwe/tmp/nist/doc/stnr.txt

WADA SNR - see http://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf

BSS_EVAL - see http://bass-db.gforge.inria.fr/bss_eval/

PESQ - see http://www.utdallas.edu/~loizou/speech/software.htm

SNR_VAD - the "extra" energy in regions designated as speech by some kind of voice activity detection (VAD), relative to the energy of the "gaps" in between, expressed in dB.
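One way to read the SNR_VAD definition is as follows: the mean power inside the VAD regions is treated as gap (noise) power plus speech power, so SNR_VAD = 10*log10((P_speech - P_gap) / P_gap). The Python sketch below implements that reading. The formula, the per-sample power averaging, and the function name snr_vad are assumptions for illustration, not snreval's actual Matlab code.

```python
import math

def snr_vad(x, speech_mask):
    """VAD-based SNR in dB (sketch; the formula is an assumed reading of SNR_VAD).

    x           -- sequence of samples
    speech_mask -- sequence of booleans, True where the VAD marks speech
    """
    speech = [v * v for v, s in zip(x, speech_mask) if s]
    gaps = [v * v for v, s in zip(x, speech_mask) if not s]
    if not speech or not gaps:
        raise ValueError("need at least one speech sample and one gap sample")
    p_speech = sum(speech) / len(speech)  # mean power inside VAD regions
    p_gap = sum(gaps) / len(gaps)         # mean power in the gaps (noise estimate)
    extra = p_speech - p_gap              # power attributed to speech alone
    if extra <= 0:
        return -math.inf                  # speech regions no louder than the gaps
    if p_gap == 0:
        return math.inf                   # perfectly silent gaps
    return 10.0 * math.log10(extra / p_gap)
```

Consider speech regions with ten times more "extra" power than the gaps: the speech samples have power 11 and the gap samples have power 1. Under this reading, that gives 10 dB.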

Example use
Use with automatic VAD
Use with existing VAD
Use with clean reference
Use with specified time range
Bulk/batch usage
Memory issues
Command-line options
Installation
Releases
Acknowledgements

Example use

In the examples below, we evaluate speech quality with different amounts of side information available. In the simplest case, all we have is the noisy speech signal, from which we can still calculate NIST STNR and WADA SNR:

snreval('arabic_400mhz.wav');

#============= SNREVAL v0.54 (20140701) ===
# args: 
No VAD, but guessing is not selected
Target File: arabic_400mhz.wav
 time range: 0.0-60.0 s
NIST STNR = 23.8 dB
WADA SNR  = 19.6 dB
==========================================

Use with automatic VAD

The package includes a quick-and-dirty voice activity detection (VAD) algorithm based on the energy in the 100-1000 Hz band. It works from the clean reference if one is given, otherwise from the noisy file. When it is enabled with the -guessvad flag, its output is used as the basis for SNR_VAD:

snreval('arabic_400mhz.wav','-guessvad',1);

#============= SNREVAL v0.54 (20140701) ===
# args: -guessvad 1 
Guessing VAD from noisy file arabic_400mhz.wav ...
Target File: arabic_400mhz.wav
 time range: 0.0-60.0 s
NIST STNR = 23.8 dB
WADA SNR  = 19.6 dB
   SNRvad = 8.1 dB
==========================================
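The package's own VAD is not described beyond the 100-1000 Hz energy criterion, so this Python sketch only illustrates the general idea:

- Split the signal into frames.
- Measure each frame's energy in the 100-1000 Hz band with a direct DFT.
- Skip all-zero frames, echoing the v0.2 release note about pure-zero frames.
- Call a frame speech if its band energy exceeds a threshold.

The frame length, the halfway-between-extremes threshold rule, and the function names are hypothetical choices.

```python
import cmath
import math

def band_energy_db(frame, sr, lo_hz=100.0, hi_hz=1000.0):
    """Energy (dB) of one frame in the lo_hz..hi_hz band, via a direct DFT.

    Returns None for frames with zero band energy (e.g. digital silence).
    """
    n = len(frame)
    k_lo = math.ceil(lo_hz * n / sr)   # first DFT bin inside the band
    k_hi = math.floor(hi_hz * n / sr)  # last DFT bin inside the band
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        coef = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energy += abs(coef) ** 2
    return 10.0 * math.log10(energy) if energy > 0 else None

def guess_vad(x, sr, frame_len=256, frac=0.5):
    """Label each full frame True (speech) or False (gap).

    The threshold sits `frac` of the way between the quietest and loudest
    non-silent frames (a hypothetical rule, not snreval's own).
    """
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]
    levels = [band_energy_db(f, sr) for f in frames]
    valid = [v for v in levels if v is not None]  # ignore pure-zero frames
    if not valid:
        return [False] * len(frames)
    thresh = min(valid) + frac * (max(valid) - min(valid))
    return [v is not None and v > thresh for v in levels]
```

The direct DFT keeps the example dependency-free; a practical implementation would use an FFT over all frames at once.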

Use with existing VAD

Alternatively, we can use existing VAD labels read from a text file. The file consists of lines of the format start_sec end_sec label, where label is ignored, and each start_sec end_sec pair designates a voice-active region:

snreval('arabic_400mhz.wav','-vad','arabic_400mhz-vad.txt');

#============= SNREVAL v0.54 (20140701) ===
# args: -vad arabic_400mhz-vad.txt 
Target File: arabic_400mhz.wav
 time range: 0.0-60.0 s
NIST STNR = 23.8 dB
WADA SNR  = 19.6 dB
   SNRvad = 12.3 dB
==========================================
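The following minimal Python sketch reads this start_sec end_sec label format and turns it into a per-sample speech mask, which is the kind of input an SNR_VAD calculation needs. The function names are hypothetical. The sketch does not handle the 8-column LDC format accepted with -ldclabels 1.

```python
def read_vad_segments(text):
    """Parse 'start_sec end_sec label' lines into (start, end) pairs.

    The label column is ignored; blank or short lines are skipped.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        segments.append((float(fields[0]), float(fields[1])))
    return segments

def segments_to_mask(segments, n_samples, sr):
    """Return a list of booleans, True for samples inside any voice-active segment."""
    mask = [False] * n_samples
    for start, end in segments:
        i0 = max(0, int(round(start * sr)))       # first sample of the segment
        i1 = min(n_samples, int(round(end * sr)))  # one past its last sample
        for i in range(i0, i1):
            mask[i] = True
    return mask
```

For example, read_vad_segments(open('arabic_400mhz-vad.txt').read()) would return the list of (start, end) pairs from that file.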

Use with clean reference

If a clean reference signal is also available, the PESQ and BSS_EVAL measures can be calculated as well:

snreval('arabic_400mhz.wav','-vad','arabic_400mhz-vad.txt','-clean', ...
        'arabic_source.wav');

#============= SNREVAL v0.54 (20140701) ===
# args: -vad arabic_400mhz-vad.txt -clean arabic_source.wav 
Target File: arabic_400mhz.wav
 time range: 0.0-60.0 s
   Ref File: arabic_source.wav
 Targ delay: -0.237 s
NIST STNR = 23.8 dB
WADA SNR  = 19.6 dB
   SNRvad = 12.3 dB
      SAR = 9.1 dB
 PESQ MOS = 2.6
==========================================
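The report includes a "Targ delay" line, which suggests that the noisy target and the clean reference are time-aligned before they are compared. How snreval estimates this delay is not described here. A common approach is to pick the lag that maximizes the cross-correlation between the two signals; the brute-force Python sketch below does this and is offered as an assumption, not the package's method.

```python
def estimate_delay(ref, target, max_lag):
    """Return the lag (in samples) maximizing sum(ref[n] * target[n + lag]).

    A positive lag means target is a delayed copy of ref. snreval's sign
    convention for its reported "Targ delay" may differ.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(target):  # only overlapping samples contribute
                score += r * target[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sample rate converts it to seconds.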

Use with specified time range

Sometimes it is useful to restrict analysis to part of a file, for instance to exclude particularly bad noise regions. The '-start' and '-end' flags specify the start and end times of the analysis, in seconds:

snreval('arabic_400mhz.wav','-clean','arabic_source.wav', '-start', ...
        8, '-end', 48);

#============= SNREVAL v0.54 (20140701) ===
# args: -clean arabic_source.wav -start 8 -end 48 
No VAD, but guessing is not selected
Target File: arabic_400mhz.wav
 time range: 8.0-48.0 s
   Ref File: arabic_source.wav
 Targ delay: -0.235 s
NIST STNR = 24.0 dB
WADA SNR  = 17.4 dB
      SAR = 13.7 dB
 PESQ MOS = 3.0
==========================================
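Restricting analysis to a window amounts to two steps: crop the samples, and, if VAD labels are used, re-time them relative to the window start. The Python sketch below shows both steps. Whether snreval re-times VAD labels this way is an assumption, and the function names are hypothetical.

```python
def crop_to_range(x, sr, start_sec, end_sec=None):
    """Return the samples between start_sec and end_sec (end defaults to file end)."""
    i0 = int(round(start_sec * sr))
    i1 = len(x) if end_sec is None else min(len(x), int(round(end_sec * sr)))
    return x[i0:i1]

def clip_segments(segments, start_sec, end_sec):
    """Keep the parts of (start, end) segments inside the window, re-timed to its start."""
    out = []
    for s, e in segments:
        s2, e2 = max(s, start_sec), min(e, end_sec)
        if e2 > s2:  # drop segments that fall entirely outside the window
            out.append((s2 - start_sec, e2 - start_sec))
    return out
```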

Bulk/batch usage

snreval can be run over a whole list of files with the -listin 1 flag, which causes the first argument to be treated as a text file listing the noisy file names, one per line. Corresponding VAD and clean files can be supplied in either of two ways:

- -vaddir and -cleandir, if those files share the noisy files' name stems and all sit in one directory.
- -vadlist and -cleanlist, which name list files giving the individual VAD and clean file for each entry.

For example:

snreval('noisylist.txt','-listin',1,'-disp',0, ...
        '-cleanlist','cleanlist.txt','-samplerate',8000,'-end',300);

There's also a -listout 1 flag to report results in a consistently shaped, one-file-per-line output format, for easier machine processing:

snreval('noisylist.txt','-listin',1,'-listout',1, '-disp', 0, ...
        '-cleanlist','cleanlist.txt','-samplerate',8000,'-end',300);

Values that are not calculated are reported as -999.

#============= SNREVAL v0.54 (20140701) ===
# args: -listin 1 -disp 0 -cleanlist cleanlist.txt -samplerate 8000 -end 300 
Target File: /u/drspeech/data/RATS/data/LDC2011E86_v2/data/train/rats-cts-alv/audio/a/20665_20110720_014200_10486_rats-cts-alv_A.flac
 time range: 0.0-300.0 s
   Ref File: /u/drspeech/data/RATS/data/LDC2011E86_v2/data/train/rats-cts-alv/audio/src/20110609_190355_2929_B_10486_rats-cts-alv_src.flac
 Targ delay: 2.345 s
NIST STNR = 13.2 dB
WADA SNR  = 5.3 dB
      SAR = -1.3 dB
 PESQ MOS = 1.9
==========================================
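The pairing rules for batch runs can be sketched in Python as follows:

- -cleanlist / -vadlist style lists are matched line by line.
- -cleandir / -vaddir style lookups reuse the noisy file's name stem.

The clean_ext parameter is hypothetical, since the extension rule is not stated, and so are the function names. The last helper maps the -999 placeholder from -listout output to None.

```python
import os

def read_list(text):
    """One file name per non-blank line, surrounding whitespace removed."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]

def pair_files(noisy_files, clean_files=None, clean_dir=None, clean_ext=".wav"):
    """Pair each noisy file with its clean reference (or None if there is none)."""
    if clean_files is not None:  # -cleanlist style: parallel list
        if len(clean_files) != len(noisy_files):
            raise ValueError("noisy and clean lists differ in length")
        return list(zip(noisy_files, clean_files))
    if clean_dir is not None:    # -cleandir style: same name stem in one directory
        pairs = []
        for f in noisy_files:
            stem = os.path.splitext(os.path.basename(f))[0]
            pairs.append((f, os.path.join(clean_dir, stem + clean_ext)))
        return pairs
    return [(f, None) for f in noisy_files]

def parse_listout_value(s):
    """Convert one -listout field to float, mapping the -999 placeholder to None."""
    v = float(s)
    return None if v == -999 else v
```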

Memory issues

snreval loads the entire sound files (or the specified portions) into memory at once and calculates spectrograms of the whole thing. For signals sampled at (or resampled to) 8 kHz, a 300 s excerpt can comfortably be processed in about 1 GB of memory. Avoid loading files much larger than that unless you want to watch your machine swap to disk for a long time.
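This Python sketch estimates the size of a single complex spectrogram. It shows why memory grows linearly with both duration and sample rate. The FFT size, hop, and 16-byte complex-double storage are assumptions, not snreval's actual analysis parameters.

```python
def spectrogram_bytes(duration_s, sr, n_fft=256, hop=128, bytes_per_value=16):
    """Rough size in bytes of one complex STFT held in memory (assumed parameters)."""
    n_samples = int(duration_s * sr)
    n_frames = 1 + max(0, (n_samples - n_fft) // hop)  # full frames only
    n_bins = n_fft // 2 + 1                            # non-negative frequencies
    return n_frames * n_bins * bytes_per_value
```

For 300 s at 8 kHz this gives 38,697,936 bytes (about 39 MB) for one spectrogram. The tool also holds the waveforms and presumably other intermediate arrays, so treat this as a lower bound rather than a prediction of the figure quoted above.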

Command-line options

The full list of flags recognized is given below. The first argument is always the name of the noisy file (or list with -listin), then...

-vad <vadfile> gives the name of the provided voice activity file
-clean <cleanfile> gives the name of a corresponding clean-speech file
-start <time_sec>
-end <time_sec> specify subsegment to process within noisy & clean
-guessvad 1 try to guess the VAD from CLEAN (or NOISY if no CLEAN).
-disp 0     don't do any graphics.
-listin 1   treat NOISY as a text file listing the actual files to process
-listout 1  write output values in columns instead of text report
-vaddir <dir>  directory containing VAD files named like noisy files
-vadlist <listfile> file containing list of VAD files instead
-cleandir <dir>  directory containing clean files named like noisy files
-cleanlist <listfile> file containing list of clean files instead
-ldclabels 1   treat VAD file as 8-column LDC format (instead of 3-col)
-samplerate <SR>  resample data to this SR before processing
-checkfshift 1    try SSB-style freq shift to match CLEAN to TARG
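Both the Matlab and command-line forms take the noisy file name first, followed by -flag value pairs. The Python sketch below parses that convention for the flags listed above only. The type assigned to each flag is inferred from the descriptions and is an assumption, and the names CONVERTERS and parse_args are hypothetical.

```python
# Assumed value type for each documented flag.
CONVERTERS = {
    "vad": str, "clean": str, "vaddir": str, "vadlist": str,
    "cleandir": str, "cleanlist": str,
    "start": float, "end": float, "samplerate": int,
    "guessvad": int, "disp": int, "listin": int, "listout": int,
    "ldclabels": int, "checkfshift": int,
}

def parse_args(args):
    """Split [noisy, '-flag', value, ...] into (noisy, {flag: value})."""
    if not args:
        raise ValueError("missing noisy file (or list) name")
    noisy, rest = args[0], list(args[1:])
    if len(rest) % 2:
        raise ValueError("options must come in -flag value pairs")
    opts = {}
    for flag, value in zip(rest[0::2], rest[1::2]):
        name = str(flag).lstrip("-")
        if not str(flag).startswith("-") or name not in CONVERTERS:
            raise ValueError(f"unrecognized flag: {flag!r}")
        opts[name] = CONVERTERS[name](value)  # strings and numbers both convert
    return noisy, opts
```

Every converter accepts strings, so the same function handles command-line arguments, which are all strings. It also handles Matlab-style mixed values such as '-end', 300.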

Installation

This package has been compiled for several targets using the Matlab Compiler. To run a compiled version, you will also need to download and install the Matlab Compiler Runtime (MCR) for your platform. Please see the table below:

Architecture	Compiled package	MCR Installer
64 bit Linux	snreval_GLNXA64.zip	Linux 64 bit MCR Installer
64 bit MacOS	snreval_MACI64.zip	MACI64 MCR Installer

README.txt has more instructions on installing the command-line version. Its syntax is essentially identical to the Matlab examples above, but without the parentheses, quotes, or commas.

The Matlab source can be downloaded in the following ZIP file: snreval.zip

Releases

2014-07-01 v0.54 * fixed problem with empty lists; improved
                   finding binaries.

2013-10-02 v0.53 * added -preemph for pre-emphasizing, to affect
                   SNR calcs.

2013-10-01 v0.52 * new version of audioread handles a-law wavs
                 * added -hpf option to high-pass filter to remove
                   LF noise
                 * added -my_stnr to avoid running nist binary
                 * "fixed" help message to be the help message

2013-08-01 v0.51 * Updated to use latest version of audioread.

2012-01-03 v0.5  * Better error messages, version reporting
                 * Uses new audioread, flacread
                 * avoids LUT overflow in wada_snr??

2011-10-29 v0.4  * Folded eval_snr into snreval.
                 * Added batch processing options (-listin/-listout).
                 * Updated with newest SAR calculation from renoiser.
                 * Added support for LDC-format S/NS/NT VAD files.

2011-08-02 v0.3  include nist_stnr_m to approximate NIST stnr if
                 binary not available

2011-05-24 v0.2  modified guess_vad to ignore pure-zero frames in thresh.

Acknowledgements

This work was supported by the DARPA RATS program, team SCENIC. The PESQ calculation uses code by Philip Loizou of UT Dallas.

Last updated: 2011/08/04 01:35:05. Dan Ellis dpwe@ee.columbia.edu