Art of the Mix Playlist Data Statistics
We collected the user-contributed playlists from the Art of the Mix web site in January 2003 to serve as a data source for music similarity: songs that occur in the same playlist are assumed to be related in some sense, although we do not claim they are the most similar.
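A minimal sketch of this co-occurrence idea, assuming each playlist has already been reduced to a list of song or artist IDs: count how many playlists each pair of IDs shares. The function name cooccurrence_counts and the toy data are hypothetical, and raw counts are only one possible relatedness measure (they favor frequently listed artists, so some normalization would usually be applied on top).

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(playlists):
    """Count how many playlists contain each unordered pair of IDs.

    Works for song IDs or artist IDs; an ID repeated within one playlist
    is counted only once for that playlist.
    """
    counts = Counter()
    for items in playlists:
        unique = sorted(set(items))  # sorted so each pair is keyed as (smaller, larger)
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Toy playlists of artist IDs (made-up data, not from the Art of the Mix set).
playlists = [[1, 2, 3], [1, 2], [2, 3, 3]]
counts = cooccurrence_counts(playlists)
print(counts[(1, 2)], counts[(1, 3)], counts[(2, 3)])  # prints "2 1 2"
```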
This page provides some of our normalized data and some statistics. The data were collected and regularized by Adam Berenzweig and Brian Whitman.
Here is the data in processed form:
- aotm_list_ids.txt: all 29,164 playlists in numerical format. Each line defines a playlist in the form #num# artnum: songnum artnum: songnum ... where num is the playlist index (1-29164), artnum is the artist hash-code (from artist.hash, below) and songnum is the song hash-code (from song.hash, below).
- artist.hash: the 60,931 unique artist names encountered, each mapped to one of the 48,169 artist hash-codes used in aotm_list_ids.txt above. Several names can map to a single ID as part of our attempt to regularize misspellings and similar variants. Combination performances (e.g. "bjork feat thom yorke") are treated as individual entities and given distinct IDs.
- song.hash: the 218,271 unique song names encountered, mapped to 218,260 IDs. Each song title is prefixed by the artist ID, so songs with the same name by different artists are distinct.
- aotm_raw_artists.txt: The artist IDs for every track in each nonempty playlist (including repeated artists).
- aotm_artist_lists.txt: Each playlist reduced to a sorted list of unique artist IDs only.
- aotm_aset_lists.txt: The artist lists above, further filtered to keep only the 400 artists in aset400, our list of 400 popular artists used as a representative subset for experiments in music similarity.
- aset400-aotmhash.txt: The 400 lines from artist.hash specifying the aset400 artists.
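A minimal Python sketch of reading aotm_list_ids.txt and deriving the per-playlist artist lists described above. It assumes each line starts with #num# followed by artnum:songnum pairs, tolerating optional whitespace around the colon; the exact spacing and file encoding are assumptions, and the helper names (parse_playlist_line, read_playlists, raw_artists, artist_list, aset_list) are hypothetical.

```python
import re

# Leading "#num#" playlist index (assumed layout, per the description above).
HEADER = re.compile(r"^\s*#(\d+)#")
# One "artnum:songnum" pair; whitespace around the colon is tolerated
# because the exact spacing in the file is an assumption.
PAIR = re.compile(r"(\d+)\s*:\s*(\d+)")


def parse_playlist_line(line):
    """Return (playlist_index, [(artist_id, song_id), ...]), or None if no #num# header."""
    m = HEADER.match(line)
    if m is None:
        return None
    pairs = [(int(a), int(s)) for a, s in PAIR.findall(line[m.end():])]
    return int(m.group(1)), pairs  # pairs may be empty for an empty playlist


def read_playlists(path):
    """Yield parsed playlists from a file such as aotm_list_ids.txt."""
    # Encoding is an assumption; the IDs themselves are plain digits.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            parsed = parse_playlist_line(line)
            if parsed is not None:
                yield parsed


def raw_artists(pairs):
    """Artist ID for every track, repeats kept (cf. aotm_raw_artists.txt)."""
    return [artist for artist, _ in pairs]


def artist_list(pairs):
    """Sorted unique artist IDs (cf. aotm_artist_lists.txt)."""
    return sorted(set(raw_artists(pairs)))


def aset_list(pairs, aset_ids):
    """Sorted unique artist IDs restricted to a subset such as the aset400 IDs."""
    return [artist for artist in artist_list(pairs) if artist in aset_ids]


index, pairs = parse_playlist_line("#7# 12: 345 98: 1001 12: 346")
print(index, raw_artists(pairs), artist_list(pairs), aset_list(pairs, {98, 500}))
# prints "7 [12, 98, 12] [12, 98] [98]"
```

Here aset_ids would be the set of aset400 artist IDs taken from aset400-aotmhash.txt; that file's exact layout is not described on this page, so it is not parsed in the sketch.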
Here are some statistics:
| Total playlists | 29,164 (201 are empty after parsing) |
| Total unique songs | |
| Total artist names | 60,931 |
| Distinct regularized artist names | 48,169 |
| Average songs/nonempty playlist | |
| Average artists/nonempty playlist | |
| Number of playlists after filtering for "aset400" artists | |
| Average number of "aset400" artists per filtered playlist | |
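Averages like those in the statistics table can be computed along the following lines. This is a sketch under stated assumptions, not the procedure used for the table: playlists are lists of (artist_id, song_id) pairs, "songs" counts every track including repeats, "artists" counts distinct artist IDs, and a playlist survives the aset400 filter if it contains at least min_aset distinct aset400 artists. The function playlist_stats and the min_aset parameter are hypothetical, since the exact filtering criterion is not stated here.

```python
def playlist_stats(playlists, aset_ids, min_aset=1):
    """Summary statistics over playlists given as lists of (artist_id, song_id) pairs.

    min_aset is a hypothetical threshold: a playlist counts as "filtered"
    if it contains at least this many distinct artists from aset_ids.
    """
    nonempty = [p for p in playlists if p]
    n = len(nonempty)
    # Every track counts as a song, including repeats (assumption).
    avg_songs = sum(len(p) for p in nonempty) / n if n else 0.0
    # Distinct artist IDs per playlist.
    avg_artists = sum(len({a for a, _ in p}) for p in nonempty) / n if n else 0.0

    aset_counts = []
    for p in nonempty:
        hits = {a for a, _ in p} & aset_ids
        if len(hits) >= min_aset:
            aset_counts.append(len(hits))
    avg_aset = sum(aset_counts) / len(aset_counts) if aset_counts else 0.0

    return {
        "total_playlists": len(playlists),
        "empty_playlists": len(playlists) - n,
        "avg_songs_per_nonempty": avg_songs,
        "avg_artists_per_nonempty": avg_artists,
        "n_filtered": len(aset_counts),
        "avg_aset_per_filtered": avg_aset,
    }


# Toy input (made-up IDs): one playlist with a repeated artist, one empty, one single-track.
stats = playlist_stats([[(1, 10), (2, 20), (1, 11)], [], [(3, 30)]], aset_ids={1, 3})
print(stats)
```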
Last updated: 2003/07/06 14:32:00
Dan Ellis <email@example.com>