Beat-aligned chromas

We explain how to get beat-aligned features from The Echo Nest data. This representation is often used for tasks such as cover song recognition, since the beat is the most natural time step in music.

The actual code exists here; in the following we focus on the concepts behind The Echo Nest analysis.

The analysis is done at the segment level. Intuitively, segments are musical events, mostly note onsets. For each segment, we have its start time, the chroma vector at that moment, and also the timbre and loudness.
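For concreteness, here is a minimal sketch that reads these segment-level features from a song file, using the hdf5_getters module shown at the end of this page (getter names other than get_segments_pitches and get_beats_start are our assumption of the relevant ones from that module):

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')
seg_starts = hdf5_getters.get_segments_start(h5)            # segment start times, in seconds
seg_chromas = hdf5_getters.get_segments_pitches(h5)         # one 12-dimensional chroma vector per segment
seg_timbres = hdf5_getters.get_segments_timbre(h5)          # one 12-dimensional timbre vector per segment
seg_loudness = hdf5_getters.get_segments_loudness_max(h5)   # per-segment loudness, in dB
h5.close()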

Beat tracking has also been run on the song. Therefore, we have the information needed to rescale the segment-level chroma vectors to the beat level.

Below is an example where, for the sake of clarity, we assume the chroma vector is of size 1 (instead of 12). We have the following pairs (segment start time in seconds, chroma value):

(.2, 1.0) (.3, 1.0) (.35, 1.2) (.5, 1.3)

And we assume that the beat start times are:

.2 and .35

We expect the beat-aligned features to look like:

(.2, 1.0)  (.35, 1.25)

Each beat thus gets the average of the chroma values of the segments it covers: the first beat averages the 1.0 and 1.0 of the first two segments, and the second averages the 1.2 and 1.3 of the last two.
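The idea can be reproduced with a short Python sketch. This is not the actual beat_aligned_feats code, just a toy illustration: each beat averages the chroma values of the segments whose start time falls within it.

import numpy as np

def toy_beat_align(seg_starts, seg_chromas, beat_starts):
    # Average the segment-level chroma values over each beat (toy 1-D version).
    seg_starts = np.asarray(seg_starts)
    seg_chromas = np.asarray(seg_chromas)
    beat_starts = np.asarray(beat_starts)
    bt_chromas = []
    for i, beat_start in enumerate(beat_starts):
        # A beat spans from its start to the next beat's start (or the end of the song).
        beat_end = beat_starts[i + 1] if i + 1 < len(beat_starts) else np.inf
        in_beat = (seg_starts >= beat_start) & (seg_starts < beat_end)
        bt_chromas.append(seg_chromas[in_beat].mean())
    return np.array(bt_chromas)

print(toy_beat_align([.2, .3, .35, .5], [1.0, 1.0, 1.2, 1.3], [.2, .35]))
# -> [1.   1.25]

The real code also has to deal with segments that straddle beat boundaries and with full 12-dimensional chroma vectors, but the principle is the same.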

In IPython, the following lines extract beat-aligned chromas from a given HDF5 song file.

In [5]: import beat_aligned_feats
In [6]: btchromas = beat_aligned_feats.get_btchromas('TRAXLZU12903D05F94.h5')
In [7]: btchromas.shape
Out[7]: (12, 397)

Below (after also importing hdf5_getters) you can see that the original number of segments was 935, and that there are indeed 397 beats.

In [8]: h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')
In [9]: hdf5_getters.get_segments_pitches(h5).shape
Out[9]: (935, 12)
In [10]: hdf5_getters.get_beats_start(h5).shape
Out[10]: (397,)
In [11]: h5.close()
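As a quick usage example (assuming matplotlib is installed; this is not part of the original code), the beat-aligned chromagram can be displayed as an image, with one column per beat and one row per pitch class:

import matplotlib.pyplot as plt
import beat_aligned_feats

btchromas = beat_aligned_feats.get_btchromas('TRAXLZU12903D05F94.h5')
plt.imshow(btchromas, aspect='auto', origin='lower', interpolation='nearest')
plt.xlabel('beat index')
plt.ylabel('pitch class')
plt.title('beat-aligned chromas')
plt.show()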