We mention some other sources of data related to Music Information Retrieval research. THIS IS NOT AN EXHAUSTIVE LIST! If you want your dataset to be included here, send me an email.
G. Tzanetakis maintains two datasets, including the famous GTZAN, which is small by today's standards. Magnatagatune is one of the largest datasets that provides audio and tag information. Also note the very important RWC Music Database. Other tagging / genre datasets include CAL500 and the Latin Music Database.
Regarding recommendation, the new standard is probably Yahoo Music Ratings. Paul Lamere also maintains a 2007 crawl of some Last.fm data.
For metadata, nothing compares to MusicBrainz.
For structure analysis, Chris Harte annotated the Beatles; check his paper and write to him to obtain the data.
Resources that we are less familiar with but that are worth checking out include SALAMI, Codaich, soundsoftware, MusiClef 2011 and OMRAS2. Don't forget The Echo Nest API itself.
On a more general machine learning note, infochimps is an incredible source of data. The UCI repository and mldata are similar. Also, to compare algorithms, websites like MLcomp, TunedIT and Kaggle are extremely useful.
