latentartists

====== Differences ====== This shows you the differences between two versions of the page.

latentartists [2013/06/30 16:04]
ben created
latentartists [2013/06/30 17:09] (current)
ben
Line 1: Line 1:
-Latent Artists ​+====== ​Latent Artists ​====== 
 by Ben Swanson and Elif Yamangil by Ben Swanson and Elif Yamangil
  
  
-Dataset - +==== Dataset - ====
  
-Assume a fixed vocabulary $V$, which in our experiments is a company internal ​list+ 
 +Assume a fixed vocabulary $V$, which in our experiments is a list compiled by the Echonest
 of music related multiword terms. of music related multiword terms.
  
-Each item $x_i$, from an OO-programming point of view, has the following fields+Each item $d_i$ in our dataset $D$, from an OO-programming point of view, has the following fields
  
   * Artist Name   * Artist Name
   * Echonest ID   * Echonest ID
-  * Echonest Genres +  * Echonest Genres ​(used for qualitative evaluation) 
-  * Treating ​a sample of reviews for this artist as a bag of words $w \in V$+  * ML unigram model $x_i$, treating ​a sample of reviews for this artist as a bag of terms $w \in V$ 
 + 
 +$|V| = 3368$ 
 + 
 +$|D| = 23541$ 
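
The ML unigram model $x_i$ in the field list above can be sketched as relative term frequencies over $V$. This is a minimal illustration, not the code used for the experiments: the function name `ml_unigram`, the `vocab_index` mapping, and the choice to silently skip out-of-vocabulary terms are all assumptions.

```python
import numpy as np

def ml_unigram(terms, vocab_index):
    """Maximum-likelihood unigram estimate over a fixed vocabulary.

    terms: iterable of terms extracted from an artist's reviews.
    vocab_index: dict mapping each term in V to a column index (hypothetical helper).
    Returns a length-|V| vector of relative frequencies.
    """
    counts = np.zeros(len(vocab_index))
    for t in terms:
        j = vocab_index.get(t)
        if j is not None:          # assumption: out-of-vocabulary terms are ignored
            counts[j] += 1
    total = counts.sum()
    # Leave an all-zero vector untouched rather than dividing by zero.
    return counts / total if total > 0 else counts
```

With $|V| = 3368$ and $|D| = 23541$, stacking these vectors would give a $23541 \times 3368$ data matrix.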
 + 
 + 
 +==== Modeling Approach - ==== 
 + 
 +Using Factor Analysis, we model each $x_i$ as 
 + 
 +$z_i \sim \mathcal{N}(0,\mathbf{I})$ 
 + 
 +$x_i \sim \mathcal{N}(Wz_i,\Psi)$ 
 + 
 +==== Hypothesis - ==== 
 + 
 +Much work that discovers similarity through low-dimensional representations such as PCA or Neural Networks treats
 +each data point as a single point in space.  By taking the Bayesian approach described above we can not only embed data in 
 +a low-dimensional space but also quantify our uncertainty about each dimension.
 + 
 +==== Method - ==== 
 + 
 +The above model can be used to predict similar artists based on distance in the latent space.  The traditional 
 +approach would be to represent artist $d_i$ with its posterior mean $\mathbb{E}[z_i]$, and measure Euclidean distance.
 +Our alternative computes distance with KL divergence between full posteriors.  The posterior distribution is given as
 + 
 +$z_i \mid x_i \sim \mathcal{N}(\mathbb{E}[z_i],G)$ 
 + 
 +where 
 + 
 +$G = (I + W^T\Psi^{-1}W)^{-1}$ 
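
The posterior above can be computed in a few lines. This is a sketch under stated assumptions: $\Psi$ is taken to be diagonal (the usual Factor Analysis choice, not stated on this page), and the posterior mean uses the standard zero-mean result $\mathbb{E}[z_i] = G W^T \Psi^{-1} x_i$, which this page does not write out. The function name `fa_posterior` is hypothetical.

```python
import numpy as np

def fa_posterior(X, W, psi_diag):
    """Posterior over z for each row of X under z ~ N(0, I), x ~ N(Wz, Psi).

    X: (n, d) data matrix, one x_i per row.
    W: (d, k) factor loading matrix.
    psi_diag: (d,) diagonal of Psi (assumption: Psi is diagonal).
    Returns (means, G): means is (n, k), G is the (k, k) posterior covariance.
    """
    k = W.shape[1]
    Wt_Pinv = W.T / psi_diag                    # W^T Psi^{-1}, shape (k, d)
    G = np.linalg.inv(np.eye(k) + Wt_Pinv @ W)  # G = (I + W^T Psi^{-1} W)^{-1}
    means = X @ Wt_Pinv.T @ G                   # row i is (G W^T Psi^{-1} x_i)^T; G is symmetric
    return means, G
```

Note that $G$ as written does not depend on $x_i$, so in this sketch every artist shares one posterior covariance and differs only in its posterior mean.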
 + 
 +The distance between two posteriors can be computed with KL divergence, which for multivariate Gaussians is given as
 + 
 +$KL(\mathcal{N}_0||\mathcal{N}_1) = \frac{1}{2}(\mathbb{E}[z_0] - \mathbb{E}[z_1])^T\Sigma^{-1}(\mathbb{E}[z_0] - \mathbb{E}[z_1])$ 
 + 
 +if the covariance matrix $\Sigma$ is the same for both Gaussians.  This shows that if $\Sigma^{-1}$ is a multiple
 +of the identity matrix, the ranking retrieved will be the same as that of Euclidean distance between posterior means. 
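
The ranking claim can be checked numerically. The sketch below assumes a shared covariance $\Sigma$, in which case the KL divergence between two Gaussians reduces to half the squared Mahalanobis distance between their means; the function names are hypothetical.

```python
import numpy as np

def kl_shared_cov(mu0, mu1, Sigma):
    """KL(N(mu0, Sigma) || N(mu1, Sigma)) = 0.5 * d^T Sigma^{-1} d, d = mu0 - mu1."""
    d = np.asarray(mu0, dtype=float) - np.asarray(mu1, dtype=float)
    return 0.5 * float(d @ np.linalg.solve(Sigma, d))

def rank_by_kl(means, query, Sigma):
    """Indices of all items sorted by KL divergence from the query item."""
    scores = np.array([kl_shared_cov(means[query], m, Sigma) for m in means])
    return np.argsort(scores, kind="stable")

def rank_by_euclidean(means, query):
    """Indices of all items sorted by Euclidean distance from the query mean."""
    scores = np.linalg.norm(means - means[query], axis=1)
    return np.argsort(scores, kind="stable")
```

For a covariance such as `2.5 * np.eye(k)` the two rankings coincide, since the KL score is then a monotone function of Euclidean distance. For a general $\Sigma$ they can differ.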
 + 
 +We can find the artists that are similar to an arbitrary artist by calculating its distance to all other artists using one of these 
 +metrics and applying a threshold. 
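
Thresholded retrieval of this kind might look like the following sketch. The function name, the optional `Sigma_inv` argument for switching between the two metrics, and the use of `<=` at the threshold are illustrative assumptions.

```python
import numpy as np

def similar_by_threshold(means, query, threshold, Sigma_inv=None):
    """Return indices of artists whose distance to `query` is at most `threshold`.

    means: (n, k) posterior means, one artist per row.
    Sigma_inv: optional (k, k) inverse of the shared posterior covariance. If given,
        the shared-covariance KL divergence is used; otherwise Euclidean distance.
    """
    diffs = means - means[query]
    if Sigma_inv is None:
        dist = np.sqrt(np.sum(diffs ** 2, axis=1))
    else:
        # 0.5 * d^T Sigma^{-1} d for every row d of diffs
        dist = 0.5 * np.einsum("ij,jk,ik->i", diffs, Sigma_inv, diffs)
    mask = dist <= threshold
    mask[query] = False        # an artist is not reported as similar to itself
    return np.flatnonzero(mask)
```

The two metrics live on different scales, so a threshold chosen for one does not carry over to the other.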
 + 
 +==== Evaluation - ==== 
 + 
 +We evaluate prediction of similarity on the top 300 artists by Echonest "hotttness", a set we will call $\mathcal{H}$.
 +We use the official similar-artist lists from the Echonest database for each artist as the ground truth, provided that these
 +similar artists are also in $\mathcal{H}$.  By varying the threshold on KL divergence or Euclidean distance we can trace out
 +an ROC curve.
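
Tracing the ROC curve by sweeping the threshold can be sketched as follows. This assumes the artist pairs within $\mathcal{H}$ have been flattened into one array of distances with a matching array of ground-truth labels; the function names are hypothetical, and tied distances are not merged into a single operating point.

```python
import numpy as np

def roc_from_distances(dist, is_similar):
    """ROC points obtained by sweeping a threshold over pairwise distances.

    dist: (m,) distances for candidate artist pairs (smaller means more similar).
    is_similar: (m,) booleans, True where the pair is in the ground-truth lists.
    Returns (fpr, tpr), each starting at 0 and ending at 1.
    """
    order = np.argsort(dist, kind="stable")
    labels = np.asarray(is_similar, dtype=bool)[order]
    tp = np.cumsum(labels)       # true positives as the threshold loosens
    fp = np.cumsum(~labels)      # false positives as the threshold loosens
    tpr = np.concatenate([[0.0], tp / labels.sum()])
    fpr = np.concatenate([[0.0], fp / (~labels).sum()])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Running this once with KL-based distances and once with Euclidean distances gives two curves whose areas can be compared directly.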
 + 
 +Our results, contained in the ROC plots below, correspond to training on the full dataset and on only the top 1000 artists by hotttness.
 +In both experimental setups the same top 300 artists are used for evaluation; the only difference is the amount of information available 
 +during training.
 + 
 +== Hottt 1000 == 
 + 
 +{{::​1000.jpg?​600|}} 
 + 
 +== Full Dataset == 
 + 
 +{{::​full.jpg?​600|}} 
 + 
 +The results do not support our hypothesis that taking uncertainty into account would create a more robust notion of similarity. 
 +While both methods capture the information in the Echonest similar-artist lists, the area under the ROC curve is clearly
 +greater for the simple Euclidean distance based approach.
 + 
 +The reason that the experimental results do not match our intuition is unclear. ​ One possibility is that KL divergence 
 +is not an appropriate metric for similarity. ​  
 + 
latentartists.1372622649.txt.gz · Last modified: 2013/06/30 16:04 by ben