Thisismyjam was a social music website that allowed users to post their current favorite song, or "jam". Users could follow other users and "like" other users' jams. On September 26, 2015, Thisismyjam was made read-only, and shortly after, a dump of its data was made available for research purposes. To facilitate experiments with this data, we matched it against the Million Song Dataset (MSD).
To perform the matching, we used the Python search engine library Whoosh to do a fuzzy string match of the user-supplied artist and title from every jam against every artist and song title from the MSD. This is more tolerant than a strict string match, but it also introduces some false positives. In particular, it results in remixes and live versions being matched to "original" versions. If you care about omitting those, you can use this fuzzy matching as a starting point and filter out everything that doesn't satisfy a strict string match yourself. The code for performing this matching can be seen here; it takes a few days to run on a reasonably powerful server.
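The strict-match filtering suggested above could look something like the following minimal sketch. It is not the code used to build the matches: the helper names, the normalization steps (case folding, Unicode NFKC, whitespace collapsing), and the candidate pairs are all assumptions made for illustration, and you would need to look up the artist and title strings for each jam and MSD track yourself.

```python
import unicodedata


def normalize(text):
    """Case-fold, apply Unicode NFKC normalization, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(text.split())


def is_strict_match(jam, msd):
    """Return True when artist and title agree exactly after normalization.

    Both arguments are (artist, title) tuples.
    """
    return (normalize(jam[0]) == normalize(msd[0])
            and normalize(jam[1]) == normalize(msd[1]))


# Hypothetical fuzzy-match candidates: (jam metadata, MSD metadata).
candidates = [
    (("Daft Punk", "Around the World"), ("Daft Punk", "Around The World")),
    (("Daft Punk", "Around the World (Live)"), ("Daft Punk", "Around The World")),
]

# Keep only the pairs that survive the stricter comparison.
strict = [pair for pair in candidates if is_strict_match(*pair)]
```

Here the live version is dropped while the pair that differs only in capitalization is kept. How much normalization still counts as "strict" is a judgment call; removing the case folding gives a fully literal comparison.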
Here's the resulting tab-separated values file: jam_to_msd.tsv. Each line of the file is in the format
jam_id\tmsd_track_id
where "jam_id" is one of the unique identifiers assigned to each jam and "msd_track_id" is a track ID, e.g. TRUTYZK128F42482B5. In total, 533,266 unique jams were matched to 130,239 unique MSD entries, for a total of 1,026,210 matches.
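As a rough sketch of how the file might be read, the snippet below groups MSD track IDs by jam and recomputes the three kinds of counts quoted above. It assumes two tab-separated columns in the order jam ID, then MSD track ID, with no header row; the function name `load_matches` and the sample rows (apart from the example track ID) are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_matches(handle):
    """Map each jam ID to the set of MSD track IDs it was matched to."""
    matches = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        jam_id, msd_track_id = row
        matches[jam_id].add(msd_track_id)
    return matches


# Made-up sample in the assumed layout; the jam IDs and the second
# track ID are placeholders, not real identifiers.
sample = io.StringIO(
    "jam_a\tTRUTYZK128F42482B5\n"
    "jam_a\tTRAAAAA000000000001\n"
    "jam_b\tTRUTYZK128F42482B5\n"
)
matches = load_matches(sample)

n_jams = len(matches)                                  # unique jams
n_tracks = len(set().union(*matches.values()))         # unique MSD entries
n_matches = sum(len(ids) for ids in matches.values())  # total matches
```

For the real file, the same function can be called as `load_matches(open("jam_to_msd.tsv", newline=""))`. A set per jam is used because the total number of matches exceeds the number of unique jams, which suggests that a single jam can be matched to more than one MSD entry.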
If you use this data, please cite the following report:
Andreas Jansson, Colin Raffel, and Tillman Weyde. "This is my Jam -- Data Dump", in 16th International Society for Music Information Retrieval Conference Late Breaking and Demo Papers, 2015.