HAMR 2013 Proceedings

====== Differences ====== This shows you the differences between two versions of the page.

--- latentartists [2013/06/30 16:07]
ben
+++ latentartists [2013/06/30 17:09] (current)
ben
@@ Line 4: / Line 4: @@
-Dataset -
+==== Dataset - ====
-Assume a fixed vocabulary $V$, which in our experiments is a company internal list
+Assume a fixed vocabulary $V$, which in our experiments is a list compiled by the Echonest
 of music related multiword terms.
-Each item $d_i$, from an OO-programming point of view, has the following fields
+Each item $d_i$ in our dataset $D$, from an OO-programming point of view, has the following fields
   * Artist Name
   * Echonest ID
-  * Echonest Genres
+  * Echonest Genres (used for qualitative evaluation)
   * ML unigram model $x_i$, treating a sample of reviews for this artist as a bag of terms $w \in V$
-Approach -
+$|V| = 3368$
+$|D| = 23541$
+==== Modeling Approach - ====
 Using Factor Analysis, each $x_i$ as
@@ Line 22: / Line 28: @@
 $z_i \sim \mathcal{N}(0,\mathbf{I})$
+$x_i \sim \mathcal{N}(Wz_i,\Psi)$
+==== Hypothesis - ====
+Much work that discovers similarity through low-dimensional representations such as PCA or Neural Networks treats
+each data point as a single point in space.  By taking the Bayesian approach described above we can not only embed data in
+a low dimensional space but also quantify our uncertainty about each dimension.
+==== Method - ====
+The above model can be used to predict similar artists based on distance in the latent space.  The traditional
+approach would be to represent artist $d_i$ with its posterior mean $\mathbb{E}[z_i]$, and measure Euclidean distance.
+Our alternative computes distance with KL divergence between full posteriors.  The posterior distribution is given as
+$z_i \mid x_i \sim \mathcal{N}(\mathbb{E}[z_i],G)$
+where
+$G = (I + W^T\Psi^{-1}W)^{-1}$
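As a minimal sketch of the posterior above, the following NumPy function computes $G$ and the posterior means for a batch of artists. It assumes $\Psi$ is diagonal (as is usual in factor analysis) and uses the standard result $\mathbb{E}[z_i] = G W^T \Psi^{-1}(x_i - \mu)$, which is not stated on this page; the function name `fa_posterior` and its arguments are hypothetical.

```python
import numpy as np

def fa_posterior(X, W, psi_diag, mu=None):
    """Posterior means and shared covariance G of z under a factor analysis model.

    X: (n, d) data rows, W: (d, k) loadings, psi_diag: (d,) diagonal of Psi.
    Assumption: Psi is diagonal and the data mean is mu (zero if not given).
    """
    if mu is None:
        mu = np.zeros(X.shape[1])
    k = W.shape[1]
    Wt_psi_inv = W.T / psi_diag                      # W^T Psi^{-1}, shape (k, d)
    G = np.linalg.inv(np.eye(k) + Wt_psi_inv @ W)    # (I + W^T Psi^{-1} W)^{-1}
    # Row i is E[z_i | x_i] = G W^T Psi^{-1} (x_i - mu); G is symmetric.
    means = (X - mu) @ Wt_psi_inv.T @ G
    return means, G
```

Note that $G$ does not depend on $x_i$, so under this model every artist shares the same posterior covariance.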
+Distance between two posteriors can be computed with KL divergence, which for multivariate Gaussians is given as
+$KL(\mathcal{N}_0||\mathcal{N}_1) \propto (\mathbb{E}[z_0] - \mathbb{E}[z_1])^T\Sigma^{-1}(\mathbb{E}[z_0] - \mathbb{E}[z_1]) + C$
+if the covariance matrix $\Sigma$ is the same for both Gaussians.  This shows that if $\Sigma^{-1}$ is a multiple
+of the identity matrix, the ranking retrieved will be the same as that of Euclidean distance between posterior means.
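The ranking claim can be checked numerically. The sketch below uses toy random vectors in place of real posterior means and a hypothetical helper `kl_shared_cov`; it relies on the standard closed form for two Gaussians with equal covariance, where the KL divergence reduces to half the squared Mahalanobis distance between the means.

```python
import numpy as np

def kl_shared_cov(m0, m1, cov):
    """KL(N(m0, cov) || N(m1, cov)) = 0.5 * (m0 - m1)^T cov^{-1} (m0 - m1)."""
    diff = m0 - m1
    return 0.5 * diff @ np.linalg.solve(cov, diff)

def rank_by(query, means, dist):
    """Indices of the rows of `means`, sorted from nearest to farthest from `query`."""
    return np.argsort([dist(query, m) for m in means])

rng = np.random.default_rng(0)
means = rng.normal(size=(5, 3))    # toy stand-ins for posterior means
query = rng.normal(size=3)

iso = 0.25 * np.eye(3)             # covariance that is a multiple of the identity
kl_rank = rank_by(query, means, lambda a, b: kl_shared_cov(a, b, iso))
eu_rank = rank_by(query, means, lambda a, b: np.linalg.norm(a - b))
print(np.array_equal(kl_rank, eu_rank))
```

With an isotropic covariance the two rankings should agree, since the KL divergence is then a monotone function of Euclidean distance; with a general covariance they may differ.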
+We can find the artists similar to an arbitrary artist by calculating its distance to all other artists using one of these
+metrics and applying a threshold.
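A minimal sketch of this retrieval step for the KL-based metric, assuming posterior means stored as rows of an array and a shared posterior covariance; the function name `similar_artists` and the threshold value are hypothetical, not taken from the experiments.

```python
import numpy as np

def similar_artists(i, means, G, threshold):
    """Indices of artists whose KL distance to artist i is at most `threshold`.

    Assumes all posteriors share covariance G, so the KL divergence is
    0.5 * (m_j - m_i)^T G^{-1} (m_j - m_i).
    """
    diffs = means - means[i]
    d = 0.5 * np.einsum("nk,nk->n", diffs @ np.linalg.inv(G), diffs)
    idx = np.flatnonzero(d <= threshold)
    return idx[idx != i]           # exclude the query artist itself
```

For the Euclidean variant, the same function could be called with `G` set to the identity, up to a rescaling of the threshold.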
+==== Evaluation - ====
+We evaluate prediction of similarity on the top 300 artists by Echonest "hotttness", a set we will call $\mathcal{H}$.
+We use the official similar-artist lists from the Echonest database for each artist as the ground truth, provided that these
+similar artists are also in $\mathcal{H}$.  By varying the threshold on KL divergence or Euclidean distance we can trace out
+an ROC curve.
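The threshold sweep can be sketched as follows, assuming a flat array of pairwise distances and a matching boolean array marking which pairs appear in the ground-truth similar lists. The helper names are hypothetical, and tied distances are treated as separate threshold steps for simplicity.

```python
import numpy as np

def roc_from_distances(dist, is_similar):
    """ROC points from sweeping a threshold; pairs with dist <= threshold are predicted similar.

    Requires at least one similar and one non-similar pair.
    """
    dist = np.asarray(dist, dtype=float)
    is_similar = np.asarray(is_similar, dtype=bool)
    order = np.argsort(dist, kind="stable")    # smallest distance first
    hits = is_similar[order]
    tp = np.cumsum(hits)                       # true positives as the threshold grows
    fp = np.cumsum(~hits)                      # false positives as the threshold grows
    tpr = np.concatenate([[0.0], tp / hits.sum()])
    fpr = np.concatenate([[0.0], fp / (~hits).sum()])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Running this once with KL distances and once with Euclidean distances over the same pairs gives two curves whose areas can be compared directly.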
+Our results, shown in the ROC plots below, correspond to training on the full dataset and on only the top 1000 artists by hotttness.
+In both experimental setups the same top 300 artists are used for evaluation; the only difference is the amount of information available
+during training.
+== Hottt 1000 ==
+{{::1000.jpg?600|}}
+== Full Dataset ==
+{{::full.jpg?600|}}
+The results do not support our hypothesis that taking uncertainty into account would create a more robust notion of similarity.
+While both methods capture the information in the Echonest similar-artist lists, the area under the ROC curve is clearly
+greater for the simple Euclidean distance based approach.
+The reason that the experimental results do not match our intuition is unclear.  One possibility is that KL divergence
+is not an appropriate metric for similarity.
