====== Differences ======

This shows you the differences between two versions of the page.
| deepunlearning [2013/06/30 16:22] bmcfee [Implementation] | deepunlearning [2013/06/30 17:04] (current) craffel | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== Idea ====== | + | ====== Deep Unlearning ====== |
| MIR techniques rely upon accurate representations of acoustic content in order to produce high-quality results. Over the past few decades, most research has operated on hand-crafted features, which work well up to a point, but may discard important information from the representation, thereby degrading performance. | MIR techniques rely upon accurate representations of acoustic content in order to produce high-quality results. Over the past few decades, most research has operated on hand-crafted features, which work well up to a point, but may discard important information from the representation, thereby degrading performance. | ||
| Line 16: | Line 16: | ||
| Our implementation is written in Python, using the LibROSA library for low-level audio analysis, and Theano for feature learning. | Our implementation is written in Python, using the LibROSA library for low-level audio analysis, and Theano for feature learning. | ||
| - | The model architecture is as follows: | + | The model architecture is based upon the ''convolutional_mlp.py'' example from the DeepLearningTutorial, with the following modifications: |
| - | - The input layer operates on a short fragment of audio (~0.5s) represented as a %$64\times 40$-dimensional Mel power spectrum. | + | - The input layer operates on a short fragment of audio (~0.5s) represented as a $64\times 40$-dimensional Mel power spectrum. |
| - Layer 1 consists of a bank of 2-dimensional convolutional filters. Each filter is convolved with the input layer, and the resulting filter responses are downsampled by spatial max-pooling. | - Layer 1 consists of a bank of 2-dimensional convolutional filters. Each filter is convolved with the input layer, and the resulting filter responses are downsampled by spatial max-pooling. | ||
| - Layer 2 consists of a linear transformation of the pooled filter responses, followed by a bank of rectified linear units. | - Layer 2 consists of a linear transformation of the pooled filter responses, followed by a bank of rectified linear units. | ||
| Line 33: | Line 33: | ||
| * [[https://github.com/Theano/Theano|Theano]] | * [[https://github.com/Theano/Theano|Theano]] | ||
| * [[https://github.com/bmcfee/librosa|LibROSA]] | * [[https://github.com/bmcfee/librosa|LibROSA]] | ||
| + | |||
| + | ====== Authors ====== | ||
| + | * Brian McFee | ||
| + | * Nicola Montecchio | ||
| + | |||
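
The forward pass through the three layers listed in the Implementation section can be illustrated with a minimal NumPy sketch. This is not the authors' Theano code. The function names (''conv2d_valid'', ''max_pool'', ''forward''), the filter count and size, the pooling window, and the hidden-layer width are all hypothetical choices made for illustration; only the $64\times 40$ input shape, the convolution and max-pooling in Layer 1, and the linear map followed by rectified linear units in Layer 2 come from the page.

```python
import numpy as np


def conv2d_valid(patch, filters):
    """Convolve one 2-D patch with a bank of 2-D filters ('valid' mode).

    patch:   array of shape (H, W)
    filters: array of shape (n_filters, fh, fw)
    returns: array of shape (n_filters, H - fh + 1, W - fw + 1)
    """
    n_filters, fh, fw = filters.shape
    # Every fh-by-fw window of the patch: shape (H - fh + 1, W - fw + 1, fh, fw).
    windows = np.lib.stride_tricks.sliding_window_view(patch, (fh, fw))
    # True convolution flips each filter along both axes before the dot product.
    flipped = filters[:, ::-1, ::-1]
    return np.einsum("ijkl,fkl->fij", windows, flipped)


def max_pool(responses, pool):
    """Non-overlapping spatial max-pooling; rows/columns that do not fill a window are dropped."""
    ph, pw = pool
    n, h, w = responses.shape
    h, w = h - h % ph, w - w % pw
    blocks = responses[:, :h, :w].reshape(n, h // ph, ph, w // pw, pw)
    return blocks.max(axis=(2, 4))


def forward(patch, filters, pool, W, b):
    """Layer 1 (convolution + max-pooling), then Layer 2 (linear map + ReLU)."""
    pooled = max_pool(conv2d_valid(patch, filters), pool)
    hidden = pooled.reshape(-1) @ W + b
    return np.maximum(hidden, 0.0)


# Hypothetical sizes, chosen only to make the shapes concrete.
rng = np.random.default_rng(0)
patch = rng.random((64, 40))                # stand-in for one Mel power spectrum fragment
filters = rng.standard_normal((8, 5, 5))    # 8 filters of size 5x5 (assumed)
pool = (2, 2)                               # pooling window (assumed)
n_pooled = 8 * ((64 - 5 + 1) // 2) * ((40 - 5 + 1) // 2)
W = 0.01 * rng.standard_normal((n_pooled, 32))  # 32 hidden units (assumed)
b = np.zeros(32)

features = forward(patch, filters, pool, W, b)
print(features.shape)
```

With these assumed sizes, each filter response is $60\times 36$, pooling reduces it to $30\times 18$, and the flattened input to Layer 2 has $8 \cdot 30 \cdot 18 = 4320$ dimensions.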