The musicseer.com web site was used to run a web-based survey to collect subjective judgments about artist similarity. We were interested in collecting some independent ground truth to validate our automatic musical artist similarity metrics. The site ran from March until October 2002. We describe the basic results in our ISMIR-02 paper, The Quest for Ground Truth in Musical Artist Similarity; see also the slides I presented at the conference.
We made the results of this survey available for others to use; you can download them from the local copy of the musicseer.com results page. In particular, this is where you will find topset_to_sqlid, which maps the internal reference numbers to actual band names. (Unfortunately, these names are not canonicalized in the approved manner.) To map directly from the musicseer 'sqlid' reference numbers to indices into the aset400 list, you can use aset400.3-canon-musicseer.ids, a list of 400 sqlids corresponding to the aset400 artists; see the Matlab example on the metrics page.
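As a rough illustration of the direct mapping, the sketch below builds a lookup from sqlid to position in the aset400 list. It assumes that aset400.3-canon-musicseer.ids is plain text with one integer sqlid per line, in aset400 order. That file layout, the 0-based positions (Matlab would use 1-based), and the function name `load_sqlid_index` are assumptions made for illustration, not something documented on this page.

```python
from typing import Dict, Iterable


def load_sqlid_index(lines: Iterable[str]) -> Dict[int, int]:
    """Map each sqlid to its 0-based position in the aset400 list.

    Assumes one integer sqlid per line, in aset400 order (hypothetical format).
    """
    index: Dict[int, int] = {}
    position = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        # If an id were ever repeated, keep its first position.
        index.setdefault(int(text), position)
        position += 1
    return index


# The ids below are made up purely to show the shape of the result.
demo_index = load_sqlid_index(["17\n", "4\n", "\n", "250\n"])

# With the real file, something like:
# with open("aset400.3-canon-musicseer.ids") as f:
#     sqlid_to_index = load_sqlid_index(f)
```

Any triplet whose sqlids are all keys of the resulting dictionary can then be re-expressed in aset400 indices.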
Note that musicseer.com is no longer controlled by us, and is currently run by one of those misdirected-web-page-scavenger organizations.
For interest, and for reference, here are some statistics regarding these data, specifically the musicseer-results-2002-10-15 dataset:
Statistic | Game | Survey | Total | Notes |
---|---|---|---|---|
Raw users | 680 | 713 | 1,032 | overlap between survey and game |
Raw judgments | 11,313 | 10,997 | 22,310 | |
Raw triplets | 114,508 | 98,964 | 213,472 | Click on the count to download the complete set of triplets (with duplicates removed) |
Filtered users | 602 | 541 | 842 | overlap |
Filtered judgments | 9,828 | 7,276 | 17,104 | |
Filtered triplets | 34,764 | 16,449 | 51,213 | Click on counts to download filtered lists in same format as main list |
Known artists/filtered user | 18.89 | 19.41 | 16.18 | knowledgeable users are more likely to be in both subsets? |
Known artists/filtered judgment | 5.54 | 4.26 | 4.99 | includes the target and the chosen artist, i.e. (#triplets / #judgments) + 2 |
"Filtering" is removing all triplets in which the unchosen artist was not ever chosen by that particular user in a different trial - i.e. the cases in which we can't be sure that the user actually knew the unchosen artist, making the choice meaningful.
There are 426 unique artist IDs in this data... It should have been 412, but some extras crept in.
I encountered the following issues constructing these results: