//Page: realtime_solo-to-tutti_audio_alignment_separation-by-humming_for_realtime_karaoke_generation — created 2014/10/26 04:13 by maezawa; current revision 2014/10/26 05:21.//
====== Karaoke on Your Favorite Recording... without an SMF! ======

====== Background ======
Automatic accompaniment (e.g. a karaoke that follows your playing) kicks ass.
It further kicks ass when the karaoke track is generated from that favorite recording of yours, with the soloist separated out.

One thing that bugs me, though, is that existing methods require digital score data (e.g. a standard MIDI file, MusicXML, etc.).
Preparing an SMF is annoying, so I want an accompaniment system that does not need SMFs to work.
The SMF is used for two purposes: (1) making a karaoke track from your favorite recording (informed source separation), and (2a) tracking where you are playing in the music (score following) together with (2b) understanding which part of the karaoke track the system should be playing back (offline alignment).

So, my goal is to circumvent the use of the SMF for (1) generating the karaoke track, and (2) synchronizing your playing to the karaoke track.
====== Problem statement ======

I basically want to (1) load a favorite violin concerto, (2) play the violin concerto on my violin, then (3) the track from (1) plays in sync with me, with the violin solo part separated out.
The HMM is left-to-right, allowing the current state to (1) stay in the same state, or (2) advance to the next state.
The key here is that the overlap used for computing **X** is smaller than that used for **U**.
For example, **X** is computed at 10 frames per second, whereas **U** is computed at 50 frames per second.
This way, the left-to-right architecture permits the user to play faster than **X**, and the number of states stays manageable for a moderately long piece of music.

Aside 1: Elaborate schemes using a semi-HMM weren't worth the effort, at least for a simple duration pdf.
Aside 2: I first tried modeling the state dynamics with a particle filter, but it didn't quite work. With finitely many particles, once the filter gets "stuck," a simple proposal distribution is insufficient to recover the right position.
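The left-to-right filtering above can be sketched as a log-domain HMM forward recursion. This is only an illustrative toy, not the system's actual code: the observation scores, the stay/advance probabilities, and the frame counts below are all made-up placeholders; in the real system the scores would come from matching each input frame of **U** against each reference frame of **X**.

```python
import numpy as np

def online_alignment_step(log_alpha, log_obs, log_p_stay, log_p_advance):
    """One forward-filtering step of a left-to-right HMM.

    log_alpha: (N,) log-belief over the reference frames of **X**
               after the previous input frame of **U**.
    log_obs:   (N,) log-likelihood of the current input frame under
               each reference frame (some spectral match score).
    Each state may either stay put or advance to the next state.
    """
    stay = log_alpha + log_p_stay
    advance = np.full_like(log_alpha, -np.inf)
    advance[1:] = log_alpha[:-1] + log_p_advance  # left-to-right: only forward
    log_alpha_new = np.logaddexp(stay, advance) + log_obs
    log_alpha_new -= log_alpha_new.max()  # renormalize to avoid underflow
    return log_alpha_new

# Toy run: 8 reference frames; since **U** runs at ~5x the frame rate of
# **X**, staying is a priori more likely than advancing (0.8 vs 0.2),
# yet advancing every frame would still allow playing up to 5x faster.
N = 8
log_alpha = np.full(N, -np.inf)
log_alpha[0] = 0.0  # start at the beginning of the piece
log_p_stay, log_p_advance = np.log(0.8), np.log(0.2)
for t in range(20):
    # fake observation scores whose peak moves at the nominal tempo (t/5)
    log_obs = -0.5 * (np.arange(N) - t / 5.0) ** 2
    log_alpha = online_alignment_step(log_alpha, log_obs, log_p_stay, log_p_advance)
pos = int(np.argmax(log_alpha))
print(pos)  # estimated current reference frame of **X**
```

The renormalization by the max is what keeps a long piece (thousands of states) numerically stable in the log domain.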
In implementation, I also prepared a few "detuned" versions of **X**, to compensate for small tuning variations.
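One way to produce such detuned copies, assuming **X** lives on a log-frequency axis (e.g. a constant-Q-like representation, which is my assumption here, not something the text specifies): a detune of c cents is then just a fractional bin shift along the frequency axis. The function name, the cent values, and `bins_per_semitone` are all illustrative.

```python
import numpy as np

def detuned_copies(X, cents=(-30, -15, 15, 30), bins_per_semitone=3):
    """Make slightly detuned copies of a log-frequency template matrix.

    X is (n_bins, n_frames) with `bins_per_semitone` bins per semitone;
    a detune of c cents is a shift of c/100 * bins_per_semitone bins
    along axis 0. Returns [X, detuned copy 1, detuned copy 2, ...].
    """
    n_bins = X.shape[0]
    idx = np.arange(n_bins, dtype=float)
    copies = [X]
    for c in cents:
        shift = c / 100.0 * bins_per_semitone
        shifted = np.empty_like(X)
        for j in range(X.shape[1]):
            # linear interpolation per frame; bins shifted off the edge -> 0
            shifted[:, j] = np.interp(idx + shift, idx, X[:, j],
                                      left=0.0, right=0.0)
        copies.append(shifted)
    return copies
```

Each copy would then be matched against the input alongside the original, and the best-scoring tuning wins (or contributes to the observation likelihood).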
- | |||
- |