HAMR 2013 Proceedings


====== Example Project Title ======

| Authors | Steve Jobs and Bill Grates |
| Affiliation | Apple and Michaelsoft |
| Code | [[https://github.com/bmcfee/librosa|Github Link]] |

The singing voice separation task is to separate singing voices from the music accompaniment. We propose spoken-lyrics-informed source separation.

===== - Background: Singing Voice Separation =====

Robust Principal Component Analysis (RPCA)\\
minimize $||A||_* + \lambda ||E||_1$, subject to $A+E = M$\\
The music accompaniment can be assumed to lie in a low-rank subspace because of its repetitive structure.\\
Singing voices can be regarded as relatively sparse within songs.

===== - Proposed Spoken Lyrics-informed source separation =====

  * HAMR-RPCA
  * HAMR-NMF

==== - HAMR-RPCA ====

HAMR-RPCA\\
minimize $||A||_* + \lambda ||E||_1 + \gamma || E - E_0 ||_F^2$, subject to $A+E = M$

Framework\\
Given the mixed signal, run RPCA to obtain $E_{RPCA}$.\\
Use dynamic time warping to warp the spoken lyrics $E_{spoken}$ to $E_{RPCA}$.\\
Define $E_0$ as the warped $E_{spoken}$ for HAMR-RPCA.

Results -- from RPCA

Results -- using ground truth singing voice as $E_0$

Results -- using ground truth dynamic time warping of spoken lyrics as $E_0$

Results -- using dynamic time warping of spoken lyrics as $E_0$

Conclusion

==== - Separation with codebook ====

Idea: Separate the singing voice and the background with the help of a dictionary learnt from speech.

Intuition: Because the lyrics are assumed to be known in our case, the hope is that the dictionary learnt from the spoken lyrics encodes information that is shared with the singing voice.

Problems:
  - Pitch difference
  - Voice difference

Solutions:
  - Generate extra dictionary elements
  - Adaptation

So, the overall formulation is:

$\min_{H,A} \lambda||H||_1 + \beta||A||_* + ||W-W_0||_2^2$, subject to $Y=WH+A$

Whole process steps:
  - Synthesize speech from the lyrics
  - Extract the dictionary from the synthesized speech
  - Generate extra dictionary elements
  - Run the system

Result:

Conclusion:
  - It converges
  - Not obviously better (or worse) than the original RPCA

Future work:
  - Better adaptation schemes (adaptation, transformation, etc.)
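
As an illustration of the HAMR-RPCA objective stated earlier (minimize $||A||_* + \lambda ||E||_1 + \gamma || E - E_0 ||_F^2$ subject to $A+E = M$), the following is a minimal sketch of one possible solver, assuming a fixed-penalty ADMM scheme. It is not the code used for the project. The function names, the default $\lambda = 1/\sqrt{\max(m,n)}$, and the penalty heuristic for ''mu'' are assumptions borrowed from common RPCA practice. With ''gamma = 0'' the sketch reduces to plain RPCA.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values: the proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def informed_rpca(M, E0=None, lam=None, gamma=0.0, mu=None,
                  n_iter=500, tol=1e-7):
    """Hypothetical ADMM sketch for
        min ||A||_* + lam*||E||_1 + gamma*||E - E0||_F^2  s.t.  A + E = M.
    All defaults are assumptions, not values from the original project.
    """
    M = np.asarray(M, dtype=float)
    if E0 is None:
        # No prior available: fall back to plain RPCA.
        E0, gamma = np.zeros_like(M), 0.0
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))  # assumed common default
    if mu is None:
        # Assumed heuristic penalty; the floor avoids division by zero.
        mu = M.size / (4.0 * max(np.abs(M).sum(), 1e-12))

    E = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for A + E = M
    norm_M = max(np.linalg.norm(M), 1e-12)

    for _ in range(n_iter):
        # Low-rank update (accompaniment).
        A = singular_value_threshold(M - E + Y / mu, 1.0 / mu)
        # Sparse update (voice): merge the two quadratic terms into one
        # target C, then apply shrinkage.
        B = M - A + Y / mu
        C = (2.0 * gamma * E0 + mu * B) / (2.0 * gamma + mu)
        E = soft_threshold(C, lam / (2.0 * gamma + mu))
        # Dual ascent on the constraint residual.
        R = M - A - E
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return A, E
```

Here ''M'' would be a magnitude spectrogram of the mixture and ''E0'' the time-warped spoken-lyrics spectrogram of the same shape. A larger ''gamma'' pulls the sparse part ''E'' more strongly toward ''E0''.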
