====== Spoken Lyrics Informed Singing Voice Separation ======
| Authors | Zhuo Chen, Po-Sen Huang, Yi-Hsuan Yang |

The singing voice separation task is to separate the singing voice from its music accompaniment. We propose spoken-lyrics-informed source separation, which uses a spoken (here, synthesized) rendition of the known lyrics as side information for the separation.
===== - Background: Singing Voice Separation =====
**ROBUST PRINCIPAL COMPONENT ANALYSIS (RPCA)**\\ Minimize $||A||_* + \lambda ||E||_1$, subject to $A + E = M$.\\ The music accompaniment can be assumed to lie in a low-rank subspace because of its repetitive structure.\\ The singing voice can be regarded as relatively sparse within a song.\\
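
For reference, a minimal NumPy sketch of this decomposition via the inexact augmented Lagrange multiplier (ALM) method, a standard solver for this objective; the function name and the default $\lambda = 1/\sqrt{\max(m, n)}$ are our choices, not part of the original write-up. Here $M$ would be the mixture magnitude spectrogram, $A$ the low-rank accompaniment estimate, and $E$ the sparse vocal estimate.

<code python>
import numpy as np

def rpca(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into low-rank A + sparse E by inexact ALM."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M, 'fro')
    spec = np.linalg.norm(M, 2)                  # spectral norm
    mu, rho = 1.25 / spec, 1.5
    Y = M / max(spec, np.abs(M).max() / lam)     # dual variable init
    A, E = np.zeros_like(M), np.zeros_like(M)
    for _ in range(max_iter):
        # A-step: singular value thresholding of M - E + Y/mu
        U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        A = U @ np.diag(np.maximum(s - 1.0 / mu, 0)) @ Vt
        # E-step: elementwise soft-thresholding at lam/mu
        G = M - A + Y / mu
        E = np.sign(G) * np.maximum(np.abs(G) - lam / mu, 0)
        R = M - A - E                            # constraint residual
        Y += mu * R
        mu *= rho
        if np.linalg.norm(R, 'fro') / norm_M < tol:
            break
    return A, E
</code>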

**NMF**\\ Factorization algorithms such as non-negative matrix factorization (NMF) can also help source separation in many cases.
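
As a toy illustration (our sketch, not code from the project), NMF with the classic Lee-Seung multiplicative updates for the Euclidean cost: the columns of $W$ act as spectral dictionary elements and the rows of $H$ as their activations over time.

<code python>
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9):
    """Factor a nonnegative V (e.g. a magnitude spectrogram) as V ~ W @ H."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # activation update
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # dictionary update
    return W, H
</code>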

===== - Proposed Spoken Lyrics Informed Source Separation =====
We explore two variants, described below:\\
**HAMR-RPCA**\\
**HAMR-NMF**

==== - HAMR-RPCA ====
**Objective**\\
- Minimize $||A||_* + \lambda ||E||_1 + \gamma ||E - E_0||_F^2$, subject to $A + E = M$, where $E_0$ is a prior estimate of the vocal spectrogram derived from the spoken lyrics\\
**Framework**\\ - Given the mixed signal, run RPCA to obtain $E_{RPCA}$.\\ - Use dynamic time warping to warp the spoken-lyrics spectrogram $E_{spoken}$ onto $E_{RPCA}$, and define $E_0$ as the warped $E_{spoken}$ for HAMR-RPCA. A sketch of both steps follows.
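
A sketch under assumptions of ours: ''build_E0'' (a hypothetical helper, names ours) aligns the spoken-lyrics spectrogram to $E_{RPCA}$ with librosa's DTW, and ''e_step'' shows how the extra $\gamma||E - E_0||_F^2$ term modifies the sparse update of the ALM solver above. Elementwise, minimizing $\lambda|e| + \gamma(e - e_0)^2 + \frac{\mu}{2}(e - g)^2$ gives soft-thresholding of $(\mu g + 2\gamma e_0)/(\mu + 2\gamma)$ at threshold $\lambda/(\mu + 2\gamma)$, i.e. the usual shrinkage, pulled toward $E_0$.

<code python>
import numpy as np
import librosa

def build_E0(speech, E_rpca, n_fft=1024, hop=256):
    """Warp the spoken-lyrics spectrogram onto the RPCA vocal estimate.
    Assumes E_rpca was computed with the same STFT parameters."""
    S = np.abs(librosa.stft(speech, n_fft=n_fft, hop_length=hop))
    # DTW on log-magnitude features; wp is the optimal warping path
    _, wp = librosa.sequence.dtw(X=np.log1p(S), Y=np.log1p(np.abs(E_rpca)))
    E0 = np.zeros_like(E_rpca)
    for i, j in wp:         # speech frame i aligned to mixture frame j
        E0[:, j] = S[:, i]  # later assignments overwrite: fine for a sketch
    return E0

def e_step(G, E0, lam, gamma, mu):
    """Sparse update for the informed objective; a drop-in replacement
    for the plain soft-thresholding E-step in rpca() above."""
    V = (mu * G + 2 * gamma * E0) / (mu + 2 * gamma)
    thr = lam / (mu + 2 * gamma)
    return np.sign(V) * np.maximum(np.abs(V) - thr, 0)
</code>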

**Results** (four conditions)\\ - from plain RPCA\\ - using the ground-truth singing voice as $E_0$\\ - using ground-truth dynamic time warping of the spoken lyrics as $E_0$\\ - using dynamic time warping of the spoken lyrics as $E_0$

==== - HAMR-NMF: Separation with a Codebook ====
**Idea**\\ - Separate the singing voice and the background with the help of a dictionary learnt from speech

**Intuition**\\ - Because the lyrics are assumed to be known in our setting, the hope is that a dictionary learnt from the spoken lyrics encodes the information it shares with the singing voice

**Problems**\\ - Pitch difference between speech and singing\\ - Voice (timbre) difference

**Solutions**\\ - Generate extra dictionary elements\\ - Adaptation of the dictionary

**Objective**\\ - $\min_{W,H,A} \lambda||H||_1 + \beta||A||_* + ||W - W_0||_2^2$, subject to $Y = WH + A$, where $W_0$ is the dictionary learnt from the synthesized speech
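
The write-up does not spell out a solver for this objective, so below is only a rough penalty-form sketch of ours (not the authors' implementation): the constraint $Y = WH + A$ is relaxed to a quadratic penalty with weight $\mu$, the three blocks are updated alternately, and nonnegativity of $W$ is kept by a crude projection.

<code python>
import numpy as np

def hamr_nmf(Y, W0, lam=0.1, beta=1.0, mu=1.0, n_iter=100, eps=1e-9):
    """min lam*||H||_1 + beta*||A||_* + ||W - W0||_F^2
       + (mu/2)*||Y - W @ H - A||_F^2   (relaxed constraint)."""
    k = W0.shape[1]
    W, A = W0.copy(), np.zeros_like(Y)
    H = np.full((k, Y.shape[1]), 0.1)
    I = np.eye(k)
    for _ in range(n_iter):
        R = Y - A                      # part to explain with the dictionary
        # H-step: multiplicative update for (mu/2)||R - WH||^2 + lam||H||_1
        H *= (W.T @ np.maximum(R, 0)) / (W.T @ W @ H + lam / mu + eps)
        # W-step: closed form for ||W - W0||^2 + (mu/2)||R - WH||^2, clipped
        W = np.maximum(
            (2 * W0 + mu * R @ H.T) @ np.linalg.inv(2 * I + mu * H @ H.T), 0)
        # A-step: singular value thresholding of the residual Y - WH
        U, s, Vt = np.linalg.svd(Y - W @ H, full_matrices=False)
        A = U @ np.diag(np.maximum(s - beta / mu, 0)) @ Vt
    return W, H, A
</code>

Here $WH$ would be the vocal estimate and $A$ the low-rank accompaniment, mirroring the RPCA split.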

**Whole process**\\ - Synthesize the lyrics to speech\\ - Extract a dictionary from the synthesized speech\\ - Generate extra dictionary elements\\ - Run the full system

**Result**

**Conclusion**\\ - It converges\\ - Not obviously better (or worse) than the original RPCA

**Future work**\\ - Better adaptation schemes (adaptation, transformation, etc.)