Dan Ellis : Resources : Matlab :

# Spectrograms: Constant-Q (Log-frequency) and conventional (linear)

#### Introduction

The spectrogram is a standard sound visualization tool, showing the distribution of energy in both time and frequency. It is simply an image formed by the magnitude of the short-time Fourier transform, normally on a log-intensity axis (e.g. dB).
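As a rough illustration of that definition, here is a minimal sketch that builds a spectrogram from plain `fft` calls in base Matlab. It is not the code of specgram or myspecgram; the test tone, the Hann window, the 50% overlap and all variable names are assumptions chosen for the example.

```matlab
% Minimal short-time Fourier transform sketch (base Matlab, no toolbox).
% All parameter values and names below are illustrative assumptions.
sr   = 8000;                         % assumed sampling rate in Hz
t    = (0:sr-1)'/sr;                 % one second of sample times
d    = cos(2*pi*1000*t);             % test signal: a 1 kHz tone
nfft = 256;                          % frame length and FFT size
hop  = nfft/2;                       % advance between frames (50% overlap)
win  = 0.5*(1 - cos(2*pi*(0:nfft-1)'/nfft));   % periodic Hann window

nframes = 1 + floor((length(d) - nfft)/hop);
D = zeros(nfft/2+1, nframes);        % keep the non-negative frequency bins
for k = 1:nframes
    seg = d((k-1)*hop + (1:nfft)) .* win;      % windowed frame
    X   = fft(seg);
    D(:,k) = X(1:nfft/2+1);
end

SdB = 20*log10(abs(D) + eps);        % log-intensity (dB); eps avoids log(0)
tt  = (0:nframes-1)*hop/sr;          % frame start times in seconds
ff  = (0:nfft/2)'*sr/nfft;           % bin center frequencies in Hz

[~, peakbin] = max(mean(abs(D), 2)); % strongest bin, averaged over time
peakhz = ff(peakbin)                 % 1000 for this test tone
```

Calling `imagesc(tt, ff, SdB); axis xy` afterwards displays the result as an image with low frequencies at the bottom.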

Matlab's Signal Processing Toolbox has a built-in specgram function, but to support students who had not purchased that toolbox, I wrote a drop-in replacement.

The return value of specgram is a complex array that contains enough information to fully reconstruct the original signal. I've written an inverse routine, ispecgram, that will recreate sound from a (possibly modified) output of specgram.
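To see why the complex array is enough to get the signal back, the sketch below inverts a short-time Fourier transform by overlap-add. This is only an illustration under stated assumptions (periodic Hann window, 50% overlap, no synthesis window); it is not the implementation of ispecgram, and every name in it is made up for the example.

```matlab
% Overlap-add inversion sketch (base Matlab). Assumes a periodic Hann
% analysis window at 50% overlap, whose shifted copies sum to exactly 1.
nfft = 256;
hop  = nfft/2;
win  = 0.5*(1 - cos(2*pi*(0:nfft-1)'/nfft));
n    = (0:4095)';
d    = sin(2*pi*0.013*n) + 0.5*cos(2*pi*0.11*n);   % arbitrary test signal

% Forward transform: complex array with nfft/2+1 rows
nframes = 1 + floor((length(d) - nfft)/hop);
D = zeros(nfft/2+1, nframes);
for k = 1:nframes
    X = fft(d((k-1)*hop + (1:nfft)) .* win);
    D(:,k) = X(1:nfft/2+1);
end

% Inverse: rebuild each full spectrum, inverse-FFT, and overlap-add
dr = zeros(length(d), 1);
for k = 1:nframes
    half = D(:,k);
    full = [half; conj(half(end-1:-1:2))];  % restore negative-frequency bins
    seg  = real(ifft(full));                % windowed frame in the time domain
    idx  = (k-1)*hop + (1:nfft);
    dr(idx) = dr(idx) + seg;                % overlapping windows add up to 1
end

% Samples covered by two frames are recovered; the first and last
% half-frames are covered only once, so they stay tapered.
interior = (hop+1):(nframes*hop);
maxerr = max(abs(dr(interior) - d(interior)))
```

Modifying `D` between the two loops (for example, zeroing some rows) gives a filtered version of the signal, which is the "possibly modified" use case mentioned above.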

For contrast, I've also written a function to plot spectrograms on a log-frequency axis. This isn't a very sophisticated version - it just performs a mapping of the linear-frequency bins from the FFT (rather than, say, varying the time window at different frequencies). But it shows the differences in this kind of display well enough.
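One way to picture such a mapping is as a matrix that pools linear-frequency FFT bins into log-spaced bands. The sketch below is a guess at the general idea, not the code of logfsgram: the Gaussian weighting, the values of `fmin` and `bpo`, and all variable names are assumptions.

```matlab
% Sketch of a linear-to-log frequency mapping matrix (base Matlab).
% fmin, bpo and the Gaussian weighting are illustrative assumptions.
sr   = 8000;                         % assumed sampling rate in Hz
nfft = 1024;                         % FFT size of the linear spectrogram
fmin = 125;                          % lowest log-frequency bin center, Hz
bpo  = 12;                           % log-frequency bins per octave
fratio = 2^(1/bpo);                  % frequency ratio of adjacent log bins

nbins = floor(log((sr/2)/fmin)/log(fratio));
fc    = fmin * fratio.^(0:nbins-1)'; % log-spaced center frequencies
flin  = (0:nfft/2)*sr/nfft;          % linear FFT bin frequencies (row)

M = zeros(nbins, nfft/2+1);
for i = 1:nbins
    % Bandwidth proportional to center frequency (constant Q), but never
    % narrower than one FFT bin, since the time window is not varied.
    bw = max(fc(i)*(fratio - 1), sr/nfft);
    w  = exp(-0.5*((flin - fc(i))/bw).^2);
    M(i,:) = w / sum(w);             % each row sums to 1
end

% An octave step in frequency becomes a fixed shift of bpo rows:
[~, row500]  = max(M(:, 1 + 500*nfft/sr));   % linear bin at 500 Hz
[~, row1000] = max(M(:, 1 + 1000*nfft/sr));  % linear bin at 1000 Hz
shift = row1000 - row500                      % equals bpo
```

With a power spectrogram `P` of size `(nfft/2+1)`-by-frames, `sqrt(M*P)` would then give a log-frequency magnitude image under these assumptions.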

I've included a second version of this log-frequency mapping, also known as constant-Q (i.e. the bandwidth-to-center-frequency ratio is constant). This version attempts to preserve enough information in the log-frequency domain to reconstruct the linear-frequency spectrogram with minimal distortion, to allow iterative mapping between the two spectral axes.

The code fragment below makes a simple comparison of linear and log-frequency spectrograms.

#### Code

The routines provided here are:

• myspecgram.m - drop-in replacement for the specgram function in Matlab's signal processing toolbox, for use by those who do not have the toolbox.
• ispecgram.m - converts the complex short-time Fourier transform array generated by specgram back into an audio waveform; takes the same parameters as specgram (except the first, of course).
• logfsgram.m - just like specgram, but using a logarithmic frequency axis so that sets of harmonics shift vertically rather than stretching as the fundamental changes.
• [M,N] = logfmap(I,L,H) - returns a matrix M that can be used to premultiply a linear-frequency spectrogram matrix (e.g. the output of myspecgram) to generate a log-frequency spectrogram, where the number of bins and their sampling is designed to avoid information loss (through blurring) in the log-frequency domain. It takes three arguments: the number of rows in the original linear-frequency spectrogram (I), and the lowest and highest bins to attempt to preserve in the log space (L and H). L must be larger than 1, since bin 1 in the linear-frequency spectrogram corresponds to 0 Hz, which cannot be represented in the log-frequency domain. In general, H sets the resolution of each step: the frequency ratio of bin H to bin H-1 sets the ratio of all adjacent bands in the log-F domain, so a higher H gives a finer sampling. As L gets smaller, more and more octaves need to be included in the log-F domain, so the number of rows in the mapped data grows roughly as log(H/L). Sacrificing a few bins at both frequency extremes leads to reasonable-size log-F mappings (see the example below). Also, trying to get too close to the 0 Hz bin leads to a more rapid build-up of artefacts in iterative processing. N is returned as the inverse matrix, so that N*M is approximately an identity transformation, at least within the range of bins L to H.
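The dependence of the mapped size on L and H can be estimated with a little arithmetic. The sketch below is a back-of-envelope reconstruction from the description above, not taken from logfmap itself: it assumes that bin k sits at a frequency proportional to k-1 (since bin 1 is 0 Hz) and that the log-F bins are spaced by the ratio of bin H to bin H-1. The helper `estrows` is hypothetical.

```matlab
% Rough estimate of the number of log-F rows for given L and H.
% Assumption: bin k has frequency proportional to (k-1), so the step
% ratio is (H-1)/(H-2) and the span to cover is (H-1)/(L-1).
estrows = @(L, H) log((H-1)./(L-1)) ./ log((H-1)./(H-2));

% Settings matching a 257-bin spectrogram, keeping bins 6 to 257
r = estrows(6, 257)          % about 1005.6

% Lowering L adds octaves at the bottom, so the estimate grows
rL = [estrows(2, 257), estrows(6, 257), estrows(20, 257)]

% Lowering H coarsens the step ratio, so the estimate shrinks
rH = [estrows(6, 257), estrows(6, 200), estrows(6, 129)]
```

Under these assumptions the first estimate comes out close to the 1006 rows that logfmap(257,6,257) reports in the example further down the page, which suggests the reading is about right, although the exact rounding used by logfmap is not something this sketch can confirm.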

#### Example

An example use is shown below:

```
>> % Load a speech waveform (samples in d, sampling rate in sr)
>>
>> % Conventional (linear-frequency) spectrogram
>> subplot(311)
>> specgram(d,1024,sr);
>> % Log-frequency spectrogram
>> subplot(312)
>> logfsgram(d,1024,sr);
>> % Recover approx to lin-F from log-F
>> [Y,MX]=logfsgram(d,1024,sr);
>> DR = sqrt(MX'*(Y.^2));
>> subplot(313)
>> imagesc(20*log10(DR))
>> caxis([-100 30])
```

Notice how the bottom quarter of the lin-freq spectrogram is expanded to almost all of the log-freq spectrogram, and how the sets of harmonic partials that are equally-spaced but stretching apart on the left become a pattern of unequally-spaced features moving in parallel on the right. Also notice how mapping the log-resolution spectrogram back to the lin-freq bins (with the MX mapping matrix returned by logfsgram) results in blurring in the higher frequency bins.

Here's an example of using the logfmap matrix:

```
>> % Start with the basic (linear-freq) spectrogram matrix
>> D = log(abs(specgram(d,512)));
>> % We're going to do the mapping in the log-magnitude domain
>> % so let's shift D so that a value of zero means something.
>> minD = min(min(D));
>> D = D - minD;
>> subplot(311)
>> imagesc(D); axis xy
>> c = caxis;
>>
>> % Design the mapping matrix to lose no bins at the top but 5 at the bottom
>> [M,N] = logfmap(257,6,257);
>> size(M)
ans =
1006         257
>> % Our 257 bin FFT expands to 1006 log-F bins
>> % Perform the mapping:
>> MD = M*D;
>> subplot(312)
>> imagesc(MD); axis xy
>> caxis(c);
>> % Map back to the original axis space, just to check that we can
>> NMD = N*MD;
>> subplot(313)
>> imagesc(NMD); axis xy
>> caxis(c)
>> % Most bins look the same, except for the band that we lost at the bottom
``` 