We could not put a million files in one folder. Even a few thousand files in one directory can slow disk accesses significantly. We based the directory structure on The Echo Nest track IDs, which are a kind of hash code. Echo Nest track IDs always take the form TR+LETTERS+LETTERS&NUMBERS. The directory path within the Million Song Dataset is given by the 3rd, 4th, and 5th letters of the track ID, one directory level per letter, and the file itself is named after its track ID plus the extension ".h5". For example, track TRADHRX12903CD3866 is stored at MillionSong/data/A/D/H/TRADHRX12903CD3866.h5.
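The mapping from a track ID to its file location can be sketched in a few lines of Python. This is only an illustration of the rule described above, not code shipped with the dataset: the function name track_path and the default root "MillionSong/data" (taken from the example path) are assumptions, and the root should be adjusted to wherever your copy of the data lives.

```python
import posixpath


def track_path(track_id, root="MillionSong/data"):
    """Build the path of the .h5 file for an Echo Nest track ID.

    Hypothetical helper: the name and the default root are assumptions
    made for this example.
    """
    # Track IDs start with "TR"; the 3rd, 4th and 5th characters
    # (indices 2, 3 and 4) name the three nested directories.
    if len(track_id) < 5 or not track_id.startswith("TR"):
        raise ValueError("not an Echo Nest track ID: %r" % (track_id,))
    first, second, third = track_id[2:5]
    # posixpath keeps forward slashes regardless of the host OS.
    return posixpath.join(root, first, second, third, track_id + ".h5")


print(track_path("TRADHRX12903CD3866"))
# prints MillionSong/data/A/D/H/TRADHRX12903CD3866.h5
```

Because the three directory levels come from the ID itself, a file can be located directly without listing any directory.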
