For our experiments in music similarity and retrieval, we have been working with a collection of artists. The goal was to have enough similar artists that discriminating between them was realistically difficult, and that a search for artists similar to a given target would turn up a number of interesting candidates. Also, because we were developing this set to obtain subjective ground truth by direct solicitation, we wanted the artists to be familiar to current pop music listeners.
We based our artist list on our trawl of user collections visible over the OpenNap file-sharing network. From the data collected in June 2001, we took the 400 most highly represented artists to define our artist subset.
More precisely, we took the top 414 artists, but that list turned out to contain two duplicates caused by misspellings that had not been caught, leaving the 412 artists used in our ISMIR-2002 paper. When we started looking at Art of the Mix playlist data, we further restricted the list to artists well represented in that set, leaving the 400 artists we currently use, as listed in aset400.txt.
These artist names are normalized according to our name normalization conventions.
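The selection steps described above (rank artists by how many user collections they appear in, merge duplicate spellings, then keep only artists with enough playlist coverage) can be sketched roughly as follows. This is an illustrative sketch only, not the code we actually ran: the function name `select_artists`, the `aliases` mapping, and the `min_playlist_count` threshold are all hypothetical, and the real criterion for "good representation" in the playlist data is not specified here.

```python
from collections import Counter


def select_artists(collections, playlists, top_n, min_playlist_count, aliases=None):
    """Hypothetical sketch of the artist-selection steps.

    collections: list of lists of artist names, one list per user collection.
    playlists:   list of lists of artist names, one list per playlist.
    aliases:     assumed mapping from a misspelled name to its canonical form.
    """
    aliases = aliases or {}

    # Count the number of user collections in which each raw name appears.
    counts = Counter()
    for coll in collections:
        for artist in set(coll):
            counts[artist] += 1

    # Take the most highly represented names (duplicates not yet caught).
    top = [artist for artist, _ in counts.most_common(top_n)]

    # Merge misspellings into canonical names, keeping the first occurrence.
    merged = list(dict.fromkeys(aliases.get(a, a) for a in top))

    # Count playlists per canonical name, then keep well-represented artists.
    playlist_counts = Counter()
    for pl in playlists:
        for artist in {aliases.get(a, a) for a in pl}:
            playlist_counts[artist] += 1

    return [a for a in merged if playlist_counts[a] >= min_playlist_count]
```

With the figures given above, the first stage would correspond to `top_n=414`, the merge step would reduce the list to 412, and the playlist filter would reduce it to 400. The threshold value that produces that last reduction is an assumption we do not state here.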