====== Optimal Integration of Labels for Cal500 Dataset ======

| Authors | Dawen Liang |
| Affiliation | LabROSA, Columbia University |
| Code | [[https://github.com/dawenl/glad_cal500|Github Link]] |

[[http://cosmal.ucsd.edu/cal/projects/AnnRet/|Cal500]] is a widely used dataset for music tagging. Its tags cover instrumentation ("Electric Guitar"), genre ("Jazz"), emotion ("Happy"), usage ("For a Party"), etc. The tags were collected from human annotators and integrated by "majority voting". However, annotators differ in expertise and pieces differ in difficulty, so a statistical model that accounts for both can integrate the labels better: ideally it would infer the true label together with the expertise of each annotator and the difficulty of each song. This work is primarily based on [[http://mplab.ucsd.edu/~jake/OptimalLabeling.pdf|this paper]] in NIPS 2009.

===== - Model =====

==== - Notation and Model specification ====

$i\in\{1,2,\cdots,I\}$ indexes annotators and $j\in\{1,2,\cdots,J\}$ indexes music pieces. $L_{ij}$ represents the label collected from annotator $i$ on piece $j$, while $Z_{j}$ stands for the "true" label of that piece.

For each annotator $i$, $\alpha_i \in (-\infty, +\infty)$ indicates his/her expertise. $\alpha_i = +\infty$ means the annotator always gives the correct label, while $\alpha_i = -\infty$ means the annotator always gives the **opposite** label (maybe intentionally). $\alpha_i = 0$ means the annotator's label carries no information.

For each music piece $j$, $1/\beta_j \in [0, \infty)$ indicates the difficulty of annotating it correctly, i.e. the larger $\beta_j$ is, the easier the piece is to annotate correctly.
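
The limiting cases of the two parameters can be checked numerically. The sketch below assumes the logistic link $\sigma(\alpha_i \beta_j)$ that this model uses for the probability of a correct label; the function name ''p_correct'' and the example values are illustrative and are not taken from the project code.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def p_correct(alpha, beta):
    """P(L_ij = Z_j | alpha_i, beta_j) = sigmoid(alpha_i * beta_j)."""
    return sigmoid(alpha * beta)


# Hypothetical (alpha, beta) pairs chosen only to illustrate the limiting cases.
examples = [
    (3.0, 2.0),    # skilled annotator, easy piece
    (0.0, 2.0),    # uninformative annotator
    (-3.0, 2.0),   # adversarial annotator
    (3.0, 0.01),   # skilled annotator, very difficult piece
]
for alpha, beta in examples:
    print(f"alpha={alpha:+.1f}, beta={beta:.2f} -> P(correct)={p_correct(alpha, beta):.3f}")
```

The first three rows show a skilled, an uninformative and an adversarial annotator on the same piece. The last row shows the skilled annotator on a very difficult piece, where the probability falls back to roughly 0.5.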
Now we write the probability that annotator $i$ correctly labels piece $j$ as:

$P(L_{ij} = Z_j | \alpha_i, \beta_j) = \sigma(\alpha_i \beta_j)$

where $\sigma(x) = 1/(1+e^{-x})$ is the logistic function, shown below:

{{::600px-logistic-curve_svg.png?200|}}

From the shape of the logistic function, we can see that for the same piece (fixed $\beta_j$), an annotator who is better at annotating (larger $\alpha_i$) has a higher probability of giving the right label. However, if the piece is difficult to label correctly ($\beta_j$ close to 0), the probability of a correct label is pulled toward 0.5 for every annotator.

==== - Inference ====

This model can be fit by the classic [[http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm|expectation-maximization (EM) algorithm]]. Put simply:

  - Do the following until convergence:
    - E-step: Treat $Z_j$ as a latent variable and "guess" its value, i.e. compute its posterior given the observed labels and the current $\alpha_i$ and $\beta_j$.
    - M-step: Optimize $\alpha_i$ and $\beta_j$ based on the guess of $Z_j$ from the E-step.

===== - Preliminary results =====

After fitting the model to Cal500, each label yields $I$ values of $\alpha_i$, one per annotator, and averaging them gives an "average" expertise for that label. Here I fit the model to instrument-based labels and genre-based labels, as they are simple and easy to understand.

==== - Solo vs. Instrument ====

One interesting result is how much better the annotators are at labeling "Solo" than at labeling "Instrument" (as background).

{{:comp.png?200|}}

The histogram above shows the distribution of average expertise for labeling an instrument as background and as solo. We can see that there is a large gap, indicating that the annotators are much better at annotating solos.

==== - Difficulty of different instruments ====

We can average the expertise per instrument to see how difficult each one is to label, from the annotators' point of view. Below are the top 5 simplest vs.
the top 5 hardest:

^ Top 5 simplest ^ Top 5 hardest ^
| Ambient sounds | Drum set |
| Harmonica | Male Lead Vocal |
| Saxophone | Electric Guitar (Clean) |
| Horn | Tambourine |
| Violin | Sequencer |

==== - Genre ====

glad.1372644038.txt.gz · Last modified: 2013/06/30 22:00 by dawenl