MusicSeer Survey Data
Welcome! From this page, you can download the raw data gathered from the MusicSeer similarity survey. The links below are snapshots of the data at various points in time. Unless you want the exact same data we used in the ISMIR 2002 paper, we recommend downloading the most recent snapshot.
Date | Triplets | Size | Comments |
2002-10-15 | 224793 | 6178424 | |
2002-04-29 | 138338 | 3753666 | Used in the ISMIR paper. |
The data format
The data looks like this:
JudgmentID Survey/Game User Target Chosen Other
1 S QzTXJ 3802 2737 61
1 S QzTXJ 3802 2737 2564
2 S GBu2m 4325 4201 5612
2 S GBu2m 4325 4201 1886
2 S GBu2m 4325 4201 3140
2 S GBu2m 4325 4201 68
3 S GBu2m 2656 3572 855
3 S GBu2m 2656 3572 5523
Each line represents a triplet, which means that the user judged the Chosen artist to be more similar to the Target than the Other artist. In practice, the user is presented with a list of artists (ten, in the survey) and selects the one most similar to the target, so a single judgment produces several triplets. The JudgmentID column groups together the triplets that come from a single user judgment. For example, in the snippet above, user GBu2m was presented with the target 4325 and the list (4201, 5612, 1886, 3140, 68), and selected 4201 as the most similar to the target. The Survey/Game column is 'S' if the judgment came from the survey and 'G' if it came from the game.
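To work with whole judgments rather than individual triplets, the rows can be grouped back together by JudgmentID. The Python sketch below assumes the file is whitespace-separated with the six columns shown above and a single header line. The names `Judgment`, `parse_judgments`, and `SAMPLE` are hypothetical, and grouping on (JudgmentID, Survey/Game, User) rather than JudgmentID alone is only a precaution in case IDs are not globally unique.

```python
from typing import List, NamedTuple


class Judgment(NamedTuple):
    """One user judgment reassembled from its triplet rows (hypothetical helper)."""
    judgment_id: int
    source: str          # 'S' = survey, 'G' = game
    user: str
    target: int
    chosen: int
    others: List[int]    # every Other artist paired with this judgment


def parse_judgments(lines):
    """Group triplet rows into Judgments, preserving file order.

    Assumes whitespace-separated columns in the order
    JudgmentID, Survey/Game, User, Target, Chosen, Other.
    The header line and any malformed lines are skipped.
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for line in lines:
        fields = line.split()
        if len(fields) != 6 or not fields[0].isdigit():
            continue  # header, blank, or malformed line
        jid, source, user, target, chosen, other = fields
        # Precaution: key on more than JudgmentID in case IDs repeat
        # across users or across survey/game data.
        key = (int(jid), source, user)
        if key not in groups:
            groups[key] = Judgment(int(jid), source, user,
                                   int(target), int(chosen), [])
        groups[key].others.append(int(other))
    return list(groups.values())


# The sample rows from this page.
SAMPLE = """\
JudgmentID Survey/Game User Target Chosen Other
1 S QzTXJ 3802 2737 61
1 S QzTXJ 3802 2737 2564
2 S GBu2m 4325 4201 5612
2 S GBu2m 4325 4201 1886
2 S GBu2m 4325 4201 3140
2 S GBu2m 4325 4201 68
3 S GBu2m 2656 3572 855
3 S GBu2m 2656 3572 5523
"""

if __name__ == "__main__":
    for j in parse_judgments(SAMPLE.splitlines()):
        # The list shown to the user: the chosen artist plus all the others.
        print(j.user, j.target, [j.chosen] + j.others)
```

Run on the sample rows, this prints the reassembled list for judgment 2 as [4201, 5612, 1886, 3140, 68], matching the example above.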
Artist IDs
This file maps the artist IDs used in the data file above to band/artist names.
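The layout of the artist-ID file is not described here, so the Python sketch below simply assumes one artist per line: a numeric ID, whitespace, then the name (which may itself contain spaces). The helpers `load_artist_names` and `describe_triplet` are hypothetical, and the example names are placeholders, not real entries from the file.

```python
def load_artist_names(lines):
    """Build {artist_id: name} from lines of the artist-ID file.

    ASSUMPTION: each line is "<numeric id> <artist name>". Adjust the
    parsing if the actual file uses a different layout.
    """
    names = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)  # split off the ID only
        if len(parts) == 2 and parts[0].isdigit():
            names[int(parts[0])] = parts[1]
    return names


def describe_triplet(target, chosen, other, names):
    """Render a triplet with artist names, falling back to the raw ID."""
    def name(artist_id):
        return names.get(artist_id, str(artist_id))
    return f"{name(chosen)} is closer to {name(target)} than {name(other)} is"


if __name__ == "__main__":
    # Placeholder names for illustration only.
    names = load_artist_names(["3802 Example Artist A", "2737 Example Artist B"])
    print(describe_triplet(3802, 2737, 61, names))
```

IDs missing from the mapping (here 61) are printed as-is, so a partial or mis-parsed mapping is easy to spot.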
A word about the artists: the 413 artists used in the survey were chosen for a particular reason (not just our bizarre personal tastes in music...). In August 2001, we crawled OpenNap servers, collecting users' hotlists (the lists of songs they were sharing). We used this data to evaluate recommendation systems. The 413 artists in the MusicSeer survey were the most popular artists on OpenNap at the time.
Please contact us with any questions: music_sims@media.mit.edu
Dan Ellis and Adam Berenzweig at Columbia University, Steve Lawrence at NEC Research Institute, and Brian Whitman at the MIT Media Lab.