


====== Latent Artists ======

by Ben Swanson and Elif Yamangil

Dataset - Assume a fixed vocabulary $V$, which in our experiments is a company-internal list of music-related multiword terms. Viewed as an object in the OO-programming sense, each item $d_i$ in our dataset $D$ has the following fields:

  * Artist Name
  * Echonest ID
  * Echonest Genres (used for qualitative evaluation)
  * ML unigram model $x_i$, treating a sample of reviews for this artist as a bag of terms $w \in V$

$|V| = 3368$

$|D| = 23541$

Modeling Approach - Using Factor Analysis, we model each $x_i$ as

$z_i \sim \mathcal{N}(0,\mathbf{I})$

$x_i \sim \mathcal{N}(Wz_i,\Psi)$

Hypothesis - Much work that discovers similarity through low-dimensional representations, such as PCA or Neural Networks, treats each data point as a single point in space. By taking the Bayesian approach described above, we can not only embed data in a low-dimensional space but also quantify our uncertainty about each dimension.

Method - The above model can be used to predict similar artists based on distance in the latent space. The traditional approach would be to represent artist $d_i$ by its posterior mean $\mathbb{E}[z_i]$ and measure Euclidean distance. Our alternative computes distance as the KL divergence between full posteriors. The posterior is given as

$z_i \mid x_i \sim \mathcal{N}(\mathbb{E}[z_i],G)$

where $G = (\mathbf{I} + W^T\Psi^{-1}W)^{-1}$

Evaluation - We evaluate prediction of similarity on the top 300 artists by Echonest "hotttness", a set we will call $\mathcal{H}$. As ground truth for each artist we use the official similar artists from the Echonest database, provided that these similar artists are also in $\mathcal{H}$. By varying the numeric thesh
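The Method paragraph above compares artists through their full posteriors. A minimal NumPy sketch of that computation follows; it is an illustration, not the authors' code. It assumes a diagonal $\Psi$ and centered data, so that $\mathbb{E}[z_i] = G W^T \Psi^{-1} x_i$. The function names `posterior_params` and `gaussian_kl` and the random toy inputs are hypothetical.

```python
import numpy as np

def posterior_params(X, W, psi_diag):
    """Posterior means E[z_i] (one per row of X) and the covariance G.

    Assumes the zero-mean model x_i ~ N(W z_i, Psi) with diagonal Psi.
    X: (n, d) data, W: (d, k) loadings, psi_diag: (d,) diagonal of Psi.
    """
    k = W.shape[1]
    Wt_Pinv = W.T / psi_diag                      # W^T Psi^{-1}, shape (k, d)
    G = np.linalg.inv(np.eye(k) + Wt_Pinv @ W)    # (I + W^T Psi^{-1} W)^{-1}
    means = X @ Wt_Pinv.T @ G                     # rows: G W^T Psi^{-1} x_i (G is symmetric)
    return means, G

def gaussian_kl(m0, S0, m1, S1):
    """KL( N(m0, S0) || N(m1, S1) ) for multivariate Gaussians."""
    k = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k
                  + logdet1 - logdet0)

# Toy inputs (random, for illustration only; not the Echonest data).
rng = np.random.default_rng(0)
d, k, n = 6, 2, 4
W = rng.normal(size=(d, k))
psi_diag = rng.uniform(0.5, 1.5, size=d)
X = rng.normal(size=(n, d))

means, G = posterior_params(X, W, psi_diag)
kl_01 = gaussian_kl(means[0], G, means[1], G)

# With a shared covariance G the trace and log-determinant terms cancel.
diff = means[0] - means[1]
mahalanobis_01 = 0.5 * diff @ np.linalg.inv(G) @ diff
```

With the posterior exactly as written above, $G$ does not depend on $i$, so the KL divergence between two artists reduces to half the squared distance between their posterior means weighted by $G^{-1}$. In this sketch `kl_01` and `mahalanobis_01` therefore agree up to floating-point error.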

latentartists.1372623808.txt.gz · Last modified: 2013/06/30 16:23 by ben