latentartists

====== Differences ====== This shows you the differences between two versions of the page.

latentartists [2013/06/30 16:07]
ben
latentartists [2013/06/30 17:09] (current)
ben
Line 4: Line 4:
  
  
-Dataset - +==== Dataset - ====
  
-Assume a fixed vocabulary $V$, which in our experiments is a company internal ​list+ 
 +Assume a fixed vocabulary $V$, which in our experiments is a list compiled by the Echonest
 of music related multiword terms. of music related multiword terms.
  
-Each item $d_i$, from an OO-programming point of view, has the following fields+Each item $d_i$ in our dataset $D$, from an OO-programming point of view, has the following fields
  
   * Artist Name   * Artist Name
   * Echonest ID   * Echonest ID
-  * Echonest Genres+  * Echonest Genres ​(used for qualitative evaluation)
   * ML unigram model $x_i$, treating a sample of reviews for this artist as a bag of terms $w \in V$   * ML unigram model $x_i$, treating a sample of reviews for this artist as a bag of terms $w \in V$
  
-Approach -+$|V| = 3368$ 
 + 
 +$|D| = 23541$ 
 + 
 + 
 +==== Modeling ​Approach - ====
  
 Using Factor Analysis, each $x_i$ as Using Factor Analysis, each $x_i$ as
Line 22: Line 28:
 $z_i \sim \mathcal{N}(0,​\mathbf{I})$ $z_i \sim \mathcal{N}(0,​\mathbf{I})$
  
 +$x_i \sim \mathcal{N}(Wz_i,\Psi)$
 +
 +==== Hypothesis - ====
 +
 +Much work that discovers similarity through low-dimensional representations, such as PCA or neural networks, treats
 +each data point as a single point in space.  By taking the Bayesian approach described above, we can not only embed data in
 +a low-dimensional space but also quantify our uncertainty about each dimension.
 +
 +==== Method - ====
 +
 +The above model can be used to predict similar artists based on distance in the latent space.  The traditional
 +approach would be to represent artist $d_i$ with its posterior mean $\mathbb{E}[z_i]$ and measure Euclidean distance.
 +Our alternative computes distance with KL divergence between full posteriors.  The posterior distribution is given by
 +
 +$z_i \mid x_i \sim \mathcal{N}(\mathbb{E}[z_i],G)$
 +
 +where
 +
 +$G = (I + W^T\Psi^{-1}W)^{-1}$
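For reference, the posterior mean that pairs with this covariance can be written out explicitly. This is the standard factor analysis result, stated here under the assumption that each $x_i$ is centered (zero mean), as the model $x_i \sim \mathcal{N}(Wz_i,\Psi)$ implies; the page itself does not spell it out.

$\mathbb{E}[z_i] = G W^T \Psi^{-1} x_i$

Note that $G$ does not depend on $i$, so under this model every artist's posterior has the same covariance and differs only in its mean.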
 +
 +The distance between two posteriors can be computed with KL divergence, which for multivariate Gaussians is given by
 +
 +$KL(\mathcal{N}_0||\mathcal{N}_1) \propto (\mathbb{E}[z_0] - \mathbb{E}[z_1])\Sigma^{-1}(\mathbb{E}[z_0] - \mathbb{E}[z_1])^T + C$
 +
 +if the covariance matrix $\Sigma$ is the same for both Gaussians.  This shows that if $\Sigma^{-1}$ is a multiple
 +of the identity matrix, the ranking retrieved will be the same as that of Euclidean distance between posterior means.
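The shared-covariance form can be checked against the general KL divergence between two Gaussians. In the sketch below, $\mu_j = \mathbb{E}[z_j]$ and $k$ (the latent dimension) are notation introduced here for brevity, and means are row vectors as in the expression above.

$KL(\mathcal{N}_0||\mathcal{N}_1) = \frac{1}{2}\left[\mathrm{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_0 - \mu_1)\Sigma_1^{-1}(\mu_0 - \mu_1)^T - k + \ln\frac{\det\Sigma_1}{\det\Sigma_0}\right]$

Setting $\Sigma_0 = \Sigma_1 = \Sigma$ gives $\mathrm{tr}(\Sigma^{-1}\Sigma) = k$ and a log-determinant ratio of zero, leaving

$KL(\mathcal{N}_0||\mathcal{N}_1) = \frac{1}{2}(\mu_0 - \mu_1)\Sigma^{-1}(\mu_0 - \mu_1)^T$

which is a scaled squared Mahalanobis distance between the posterior means, and a scaled squared Euclidean distance when $\Sigma^{-1}$ is a multiple of the identity.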
 +
 +To find the artists similar to a given artist, we compute its distance to all other artists using one of these
 +metrics and apply a threshold.
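The retrieval step can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used for the experiments: the function names, the dictionary of posterior means, and the list-of-rows format for the precision matrix $\Sigma^{-1}$ are all assumptions made for the example.

```python
def euclidean_sq(m0, m1):
    """Squared Euclidean distance between two posterior means."""
    return sum((a - b) ** 2 for a, b in zip(m0, m1))


def kl_shared_cov(m0, m1, precision):
    """KL divergence between two Gaussians that share one covariance.

    `precision` is the inverse covariance as a list of rows (a
    hypothetical input format); the result is 0.5 * d Sigma^{-1} d^T.
    """
    d = [a - b for a, b in zip(m0, m1)]
    n = len(d)
    return 0.5 * sum(d[i] * precision[i][j] * d[j]
                     for i in range(n) for j in range(n))


def similar_artists(query, means, dist, threshold):
    """Names of all other artists whose distance to `query` is <= threshold."""
    q = means[query]
    return [name for name, m in means.items()
            if name != query and dist(q, m) <= threshold]
```

With a precision matrix that is far from a multiple of the identity, the two metrics can retrieve different neighbours for the same query, which is the case the comparison is meant to probe.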
 +
 +==== Evaluation - ====
 +
 +We evaluate prediction of similarity on the top 300 artists by Echonest "hotttness", a set we will call $\mathcal{H}$.
 +For each artist, we use the official similar-artist list from the Echonest database as the ground truth, provided that these
 +similar artists are also in $\mathcal{H}$.  By varying the threshold on KL divergence or Euclidean distance, we can trace out
 +an ROC curve.
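Tracing the curve amounts to sweeping the threshold over the sorted distances. The following is a minimal sketch under stated assumptions: `distances` and `labels` are hypothetical flat lists with one entry per artist pair in $\mathcal{H}$, a label is true when the pair appears in the ground-truth similar lists, and tied distances are not grouped.

```python
def roc_points(distances, labels):
    """Sweep a distance threshold and return a list of (fpr, tpr) points."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    # Sort pairs from closest to farthest; each prefix is one threshold setting.
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Running `auc(roc_points(...))` once per metric gives a single number for comparing the KL-based and Euclidean rankings on the same set of pairs.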
 +
 +Our results, shown in the ROC plots below, correspond to training on the full dataset and on only the top 1000 artists by hotttness.
 +In both experimental setups the same top 300 artists are used for evaluation; the only difference is the amount of information available
 +during training.
 +
 +== Hottt 1000 ==
 +
 +{{::​1000.jpg?​600|}}
 +
 +== Full Dataset ==
 +
 +{{::​full.jpg?​600|}}
  
 +The results do not support our hypothesis that taking uncertainty into account would create a more robust notion of similarity.
 +While both methods capture the information in the Echonest similar-artist lists, the area under the ROC curve is clearly
 +greater for the simple Euclidean distance based approach.
  
 +The reason that the experimental results do not match our intuition is unclear. ​ One possibility is that KL divergence
 +is not an appropriate metric for similarity.  ​
  
  
latentartists.1372622872.txt.gz · Last modified: 2013/06/30 16:07 by ben