MusicSeer Survey Data

Welcome! From this page, you can download the raw data gathered from the MusicSeer similarity survey. The links below are snapshots of the data at various points in time. Unless you want exactly the data we used in the ISMIR 2002 paper, we recommend downloading the most recent snapshot.

Date        Triplets  Size     Comments
2002-10-15  224793    6178424
2002-04-29  138338    3753666  Used in the ISMIR paper.

The data format

The data looks like this:
JudgmentID Survey/Game User Target Chosen Other
1 S QzTXJ 3802 2737 61
1 S QzTXJ 3802 2737 2564
2 S GBu2m 4325 4201 5612
2 S GBu2m 4325 4201 1886
2 S GBu2m 4325 4201 3140
2 S GBu2m 4325 4201 68
3 S GBu2m 2656 3572 855
3 S GBu2m 2656 3572 5523
Each line represents a triplet, which means that the user judged the Chosen artist to be more similar to the Target than the Other artist. In practice, the user is presented with a list of artists to select from (ten in the survey), so a single judgment produces several triplets. The JudgmentID column groups together the triplets that come from a single user judgment. For example, in the snippet above, user GBu2m was presented with the target 4325 and the list (4201, 5612, 1886, 3140, 68), and selected 4201 as most similar to the target. The Survey/Game column is 'S' if the judgment came from the survey and 'G' if it came from the game.
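The grouping described above can be sketched in a few lines of Python. This is only an illustration, not part of the original tools: it assumes the columns are whitespace-separated, that the header line is the only line whose first field is not a number, and that a JudgmentID identifies a single judgment within one file. The names `Judgment`, `group_triplets`, and `SAMPLE` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Judgment(NamedTuple):
    """One user judgment, rebuilt from the triplets that share a JudgmentID."""
    source: str        # 'S' for survey, 'G' for game
    user: str
    target: int
    chosen: int
    others: List[int]  # the artists that were presented but not chosen


def group_triplets(lines: Iterable[str]) -> Dict[int, Judgment]:
    """Group triplet lines by JudgmentID (assumes whitespace-separated columns)."""
    judgments: Dict[int, Judgment] = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header, whose first field is not numeric.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        judgment_id = int(fields[0])
        source, user = fields[1], fields[2]
        target, chosen, other = (int(x) for x in fields[3:])
        if judgment_id not in judgments:
            judgments[judgment_id] = Judgment(source, user, target, chosen, [])
        judgments[judgment_id].others.append(other)
    return judgments


# The snippet shown above, used here as sample input.
SAMPLE = """\
JudgmentID Survey/Game User Target Chosen Other
1 S QzTXJ 3802 2737 61
1 S QzTXJ 3802 2737 2564
2 S GBu2m 4325 4201 5612
2 S GBu2m 4325 4201 1886
2 S GBu2m 4325 4201 3140
2 S GBu2m 4325 4201 68
3 S GBu2m 2656 3572 855
3 S GBu2m 2656 3572 5523
"""

judgments = group_triplets(SAMPLE.splitlines())
print(judgments[2].chosen, judgments[2].others)  # 4201 [5612, 1886, 3140, 68]
```

To process a downloaded snapshot instead of the sample, pass an open file object to `group_triplets`, since iterating over a file yields its lines.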

Artist IDs

This file maps the artist IDs used in the file above to band/artist names.

A word about the artists: The 413 artists used in the survey were chosen for a particular reason (not just our bizarre personal tastes in music...). In August 2001, we crawled OpenNap servers, collecting users' hotlists (the lists of songs they were sharing). We used this data to evaluate recommendation systems. The 413 artists in the MusicSeer survey were the most popular artists on OpenNap at the time.


Please contact us with any questions: music_sims@media.mit.edu

Dan Ellis and Adam Berenzweig at Columbia University, Steve Lawrence at NEC Research Institute, and Brian Whitman at the MIT Media Lab.