Jongpil Lee (KAIST), Nicholas J. Bryan (Adobe Research), Justin Salamon (Adobe Research), Zeyu Jin (Adobe Research), Juhan Nam (KAIST)
Music similarity search is useful for a variety of creative tasks, such as replacing one music recording with another that has a similar “feel”, a common task in video editing. For this task, it is typically necessary to define a similarity metric to compare one recording to another. Music similarity, however, is hard to define and depends on multiple simultaneous notions of similarity (e.g., genre, mood, instrument, tempo). While prior work ignores this issue, we embrace this idea and introduce the concept of multidimensional similarity, unifying both global and specialized similarity metrics into a single, semantically disentangled multidimensional similarity metric.
The dim-sim dataset is a collection of user-annotated music similarity triplet ratings used to evaluate music similarity search and related algorithms. Our similarity ratings are linked to the Million Song Dataset (MSD).
To collect our data, we randomly sampled 4,000 triplets of 3-second clips (i.e., anchor, song 1, song 2) from the MSD and asked people to annotate which track sounded more similar to the anchor (i.e., song 1 or song 2). Each triplet was annotated by 5-12 people, resulting in 39,440 raw human annotations. We then calculated the annotator agreement per triplet, defined as the ratio between the majority vote and the total number of annotations, and filtered out triplets where the agreement was below 0.9, yielding 879 high-agreement, human-annotated triplets (see the sketch below). We have released both the raw and clean versions of the dataset in multiple formats discussed below.
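For concreteness, here is a minimal sketch of the agreement computation and 0.9 filter described above. This is our illustration, not official dataset tooling; it assumes raw annotations are available as per-triplet vote counts under the `song1_vote`/`song2_vote` keys, and the function names are ours.

```python
def agreement(song1_votes: int, song2_votes: int) -> float:
    """Ratio of the majority vote to the total number of annotations."""
    total = song1_votes + song2_votes
    return max(song1_votes, song2_votes) / total

def clean_triplets(raw_triplets, threshold=0.9):
    """Keep only triplets whose annotator agreement is at least `threshold`."""
    return [
        t for t in raw_triplets
        if agreement(t["song1_vote"], t["song2_vote"]) >= threshold
    ]

# Example: 8 of 9 annotators agree -> agreement of ~0.889, so this
# triplet would be filtered out at the 0.9 threshold.
assert agreement(8, 1) < 0.9
```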
The dataset can be downloaded from Zenodo: https://zenodo.org/record/3889149.
We have released both CSV and JSON versions of the data for both the raw (`raw-dim-sim`) and clean (`clean-dim-sim`) annotations as described above. For a given triplet rating, the following data is provided (an illustrative record follows the list):
- `triplet_id`
- `anchor_id`
- `anchor_start_seconds`
- `anchor_start_samples`
- `song1_id`
- `song1_start_seconds`
- `song1_start_samples`
- `song2_id`
- `song2_start_seconds`
- `song2_start_samples`
- `sampling_rate`
- `clip_lengths_seconds`
- `clip_lengths_samples`
- `song1_vote`
- `song2_vote`
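As an illustrative example, a single clean-annotation record might look like the following. All values below, including the MSD track IDs, offsets, and sampling rate, are placeholders, not real dataset entries.

```python
# Hypothetical clean-dim-sim record; every value is a placeholder.
example_triplet = {
    "triplet_id": 42,
    "anchor_id": "TRAAAAW128F429D538",  # MSD track ID (placeholder)
    "anchor_start_seconds": 30.0,
    "anchor_start_samples": 661500,
    "song1_id": "TRAAABD128F429CF47",   # MSD track ID (placeholder)
    "song1_start_seconds": 12.5,
    "song1_start_samples": 275625,
    "song2_id": "TRAAADZ128F9348C2E",   # MSD track ID (placeholder)
    "song2_start_seconds": 45.0,
    "song2_start_samples": 992250,
    "sampling_rate": 22050,
    "clip_lengths_seconds": 3.0,
    "clip_lengths_samples": 66150,
    "song1_vote": 1,  # clean version: 0 or 1
    "song2_vote": 0,
}
```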
For the raw versions, `song1_vote` and `song2_vote` give the total number of annotators who voted for each song, respectively. For the clean versions, the values of `song1_vote` and `song2_vote` are set to 0 or 1. All clips used were exactly 3 seconds long. The `anchor_id`, `song1_id`, and `song2_id` fields denote the corresponding MSD track IDs.
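As a minimal usage sketch, the clean CSV can be loaded and turned into (anchor, more-similar, less-similar) triplets as shown below. The filename is hypothetical and should be replaced with the actual CSV from the Zenodo archive.

```python
import pandas as pd

# Hypothetical filename; substitute the actual CSV from the Zenodo archive.
df = pd.read_csv("clean-dim-sim.csv")

# In the clean version, song1_vote/song2_vote are 0 or 1, so the
# majority-preferred song can be read off directly.
for _, row in df.iterrows():
    more_similar = row["song1_id"] if row["song1_vote"] == 1 else row["song2_id"]
    less_similar = row["song2_id"] if row["song1_vote"] == 1 else row["song1_id"]
    print(row["anchor_id"], more_similar, less_similar)
```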
The dim-sim dataset is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
When dim-sim is used for academic research, we would greatly appreciate it if scientific publications of works based in part on the dim-sim dataset cite the following publication:
@inproceedings{Lee2019MusicSimilarity,
  title={Disentangled Multidimensional Metric Learning For Music Similarity},
  author={Lee, Jongpil and Bryan, Nicholas J. and Salamon, Justin and Jin, Zeyu and Nam, Juhan},
  booktitle={Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2020},
  organization={IEEE}
}