singing_separation

====== Differences ======

This shows you the differences between two versions of the page.


singing_separation [2013/06/30 16:30]
personhuang
singing_separation [2013/06/30 17:32] (current)
personhuang
Line 1: Line 1:
-====== Example Project Title ======
+====== Spoken Lyrics Informed Singing Voice Separation ======
  
-| Authors | Steve Jobs and Bill Grates |
-| Affiliation | Apple and Michaelsoft |
-| Code | [[https://github.com/bmcfee/librosa|Github Link]] |
+| Authors | Zhuo Chen, Po-Sen Huang, Yi-Hsuan Yang |
  
 The singing voice separation task is to separate singing voices from the music accompaniment. We propose spoken lyrics informed source separation.
Line 9: Line 7:
 ===== - Background: Singing Voice Separation =====
  
-  * ROBUST PRINCIPAL COMPONENT ANALYSIS\\\\ minimize $||A||_* + \lambda ||E||_1$, subject to $A+E = M$\\\\ Music accompaniment can be assumed to be in a low-rank subspace, because of its repetition structure.\\\\ Singing voices can be regarded as relatively sparse within songs.
+**ROBUST PRINCIPAL COMPONENT ANALYSIS**\\ Minimize $||A||_* + \lambda ||E||_1$, subject to $A+E = M$\\ Music accompaniment can be assumed to be in a low-rank subspace, because of its repetition structure.\\ Singing voices can be regarded as relatively sparse within songs.\\
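A minimal sketch of the RPCA decomposition described above, assuming the inexact augmented Lagrangian scheme that is commonly used for this problem (the page does not say which solver was used). The names `soft_threshold`, `svd_threshold`, and `rpca`, the default $\lambda = 1/\sqrt{\max(m, n)}$, and the step-size constants are assumptions made for illustration. In the audio setting `M` would typically be a magnitude spectrogram, with `A` read as the accompaniment and `E` as the voice.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value shrinkage: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def rpca(M, lam=None, n_iter=200, tol=1e-7):
    """Split a nonzero matrix M into low-rank A plus sparse E with A + E ~= M.

    Assumed solver: inexact augmented Lagrangian with a growing penalty mu.
    """
    M = np.asarray(M, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))   # common default, an assumption here
    norm_fro = np.linalg.norm(M)            # Frobenius norm of M
    norm_two = np.linalg.norm(M, 2)         # largest singular value of M
    Y = M / max(norm_two, np.abs(M).max() / lam)  # dual variable initialisation
    mu = 1.25 / norm_two
    mu_max = mu * 1e7
    A = np.zeros_like(M)
    E = np.zeros_like(M)
    for _ in range(n_iter):
        E = soft_threshold(M - A + Y / mu, lam / mu)   # sparse part
        A = svd_threshold(M - E + Y / mu, 1.0 / mu)    # low-rank part
        Z = M - A - E                                  # constraint residual
        Y = Y + mu * Z
        mu = min(mu * 1.5, mu_max)
        if np.linalg.norm(Z) <= tol * norm_fro:
            break
    return A, E
```

Calling `A, E = rpca(M)` returns the two components; larger values of `lam` push more energy into `A` and leave `E` sparser.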
  
-  Nonnegative matrix factorization
+**NMF**\\ Factorization algorithms can help source separation in many cases.
  
 ===== - Proposed Spoken Lyrics-informed source separation =====
  
-  * HAMR-RPCA
-
-  * HAMR-NMF
+**HAMR-RPCA\\
+HAMR-NMF**
  
 ==== - HAMR-RPCA ====
+**Objective**\\
+- Minimize $||A||_* + \lambda ||E||_1 + \gamma || E - E_0 ||_F ^2$ subject to $A+E = M$\\
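A hedged sketch of how the extra term $\gamma || E - E_0 ||_F^2$ could change the sparse update, assuming an augmented Lagrangian solver with penalty $\mu$ (an assumption, since the page does not name the solver). With $B = M - A + Y/\mu$, where $Y$ is the dual variable, the two quadratic terms merge into a single one centred at a weighted average of $E_0$ and $B$, so the update is still a soft threshold. The names `hamr_e_update`, `B`, and `mu` are hypothetical.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def hamr_e_update(B, E0, lam, gamma, mu):
    """Minimise lam*||E||_1 + gamma*||E - E0||_F^2 + (mu/2)*||E - B||_F^2 over E.

    Completing the square gives a quadratic with curvature (2*gamma + mu)
    centred at C, so the minimiser is a soft threshold of C.
    """
    C = (2.0 * gamma * E0 + mu * B) / (2.0 * gamma + mu)
    return soft_threshold(C, lam / (2.0 * gamma + mu))
```

Setting `gamma=0` recovers the plain RPCA update `soft_threshold(B, lam / mu)`, while a large `gamma` pulls `E` toward the prior `E0`.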
  
-  * HAMR-RPCA\\\\ minimize $||A||_* + \lambda ||E||_1 + \gamma || E - E_0 ||_F ^2$ subject to $A+E = M$\\
-  * Frame work\\\\ Given mixed signals, run RPCA to obtain $E_{RPCA}$.\\\\ We use dynamic time warping to warp spoken lyrics E_{spoken} to E_{RPCA}.\\\\ Define E_0 as the E_{spoken} for HAMR-RPCA.
+**Framework**\\  - Given mixed signals, run RPCA to obtain $E_{RPCA}$. \\  - We use dynamic time warping to warp spoken lyrics $E_{spoken}$ to $E_{RPCA}$. Define $E_0$ as the $E_{spoken}$ for HAMR-RPCA.
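The warping step in the framework can be sketched as follows, assuming classic dynamic time warping with Euclidean frame distances and unit steps (the page does not specify the features, distance, or step pattern). The names `dtw_path` and `warp_to_target` are hypothetical. Both inputs are feature matrices with one column per time frame, and the result has the same number of frames as the target, so it can serve as $E_0$.

```python
import numpy as np

def dtw_path(X, Y):
    """Align the columns of X (d, n) to the columns of Y (d, m).

    Returns the optimal warping path as a list of (i, j) frame index pairs.
    """
    n, m = X.shape[1], Y.shape[1]
    # Pairwise Euclidean distance between every frame of X and every frame of Y.
    cost = np.linalg.norm(X[:, :, None] - Y[:, None, :], axis=0)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last cell to the first, preferring diagonal steps on ties.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
        path.append((i - 1, j - 1))
    return path[::-1]

def warp_to_target(E_spoken, E_target):
    """Warp E_spoken onto the time axis of E_target.

    Spoken frames mapped to the same target frame are averaged.
    """
    path = dtw_path(E_spoken, E_target)
    E0 = np.zeros(E_target.shape, dtype=float)
    counts = np.zeros(E_target.shape[1])
    for i, j in path:
        E0[:, j] += E_spoken[:, i]
        counts[j] += 1
    return E0 / counts
```

For example, `E0 = warp_to_target(E_spoken, E_rpca)` would give a prior with the same shape as the RPCA estimate.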
-
-(Results -- with ground truth singing voice as E_0)
-
-(Results -- with ground truth singing voice from spoken lyrics)
-
-Conclusion
-
-
-==== - Separation with codebook ====
-Idea:
  
-Separate the singing and background with the help of the dictionary that was learnt from the speech
+**Results**\\ -- from RPCA\\ -- using ground truth singing voice as $E_0$\\ -- using ground truth dynamic time warping of spoken lyrics as $E_0$\\ -- using dynamic time warping of spoken lyrics as $E_0$
  
-Intuition: 
  
-Because the lyric is assumed to be known in our cases, the hope of this model is that the dictionary learnt from the spoken lyrics can encode the information which is the same in the singing voice. ​ 
  
-Problems:
-  Pitch difference
-  Voice difference
+==== - HAMR-NMF Separation with codebook ====
+**Idea**\\ Separate the singing voice and the background with the help of a dictionary learnt from the speech.
  
-Solutions:
-  Generate extra dictionary elements.
-  - Adaptation
+**Intuition**\\ Because the lyrics are assumed to be known in our case, the hope is that the dictionary learnt from the spoken lyrics encodes the information that the speech shares with the singing voice.
  
-So, the overall formulation:
-$\min_{H,A}\lambda||H||_1+\beta||A||_*+||W-W_0||_2^2$ subject to $Y=WH+A$
+**Problems**\\ - Pitch difference\\ - Voice difference
  
-Whole process steps:
+**Solutions**\\ - Generate extra dictionary elements.\\ - Adaptation
  
-  Synthesize the lyric to speech
-  - Extracted the dictionary from the synthesized speech
-  - Generate extra dictionary elements
-  - Running the system
+**Objectives**\\ $\min_{H,A}\lambda||H||_1+\beta||A||_*+||W-W_0||_2^2$ subject to $Y=WH+A$
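To make the three terms of the HAMR-NMF objective concrete, here is a small sketch that only evaluates the cost for given factors and does not solve the minimisation. It assumes that $||W-W_0||_2^2$ is meant as the squared Frobenius norm (sum of squared entries) and that $A$ is obtained from the constraint as $A = Y - WH$. The function name `hamr_nmf_objective` is hypothetical.

```python
import numpy as np

def hamr_nmf_objective(Y, W, H, W0, lam, beta):
    """Evaluate lam*||H||_1 + beta*||A||_* + ||W - W0||^2 with A = Y - W @ H.

    Assumption: the last norm is read as the squared Frobenius norm.
    """
    A = Y - W @ H                                       # residual from Y = WH + A
    sparsity = lam * np.abs(H).sum()                    # sparse activations
    low_rank = beta * np.linalg.norm(A, ord="nuc")      # low-rank background
    adaptation = np.linalg.norm(W - W0, ord="fro") ** 2 # stay close to speech dictionary
    return sparsity + low_rank + adaptation
```

Tracking this value across iterations is one way to check that a solver for the problem is converging.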
  
-Result:
+**Whole process steps**\\ - Synthesize the lyrics into speech\\ - Extract the dictionary from the synthesized speech\\ - Generate extra dictionary elements\\ - Run the system
  
-Conclusion:
-  - It converges
-  - Not obviously better (or worse) than original RPCA
+**Result**
  
-Future work:
-  - Better adaptation schemes (adaptation, transformation etc)
+**Conclusion**\\ - It converges\\ - Not obviously better (or worse) than original RPCA
  
+**Future work**\\ - Better adaptation schemes (adaptation, transformation, etc.)
singing_separation.1372624256.txt.gz · Last modified: 2013/06/30 16:30 by personhuang