====== remixavier ======

| Affiliation | Columbia University |
| Code | [[https://github.com/craffel/remixavier|Github Link]] |
| Matlab | [[http://labrosa.ee.columbia.edu/~dpwe/resources/matlab/remixavier/|Output of 'Publish']] |
  
We propose a technique for removing certain sources from an audio mixture when a separate recording of those sources is available.

Denote the time-domain digitized signal of a piece of music as $m$. We assume we can obtain a signal $s$ which represents a recording of the same piece of music but only including a subset of the instruments included in the original recording. We then seek the signal $r$ which represents a recording of the instruments in $m$ which are not included in $s$.
  
If $m$ and $s$ are perfectly aligned in time and have no channel differences, we can retrieve $r$ by computing $r = m - s$ (see the sketch following the list below). However, this is rarely the case. Our algorithm therefore carries out the following steps:
  
  * Identifying the temporal alignment of $m$ and $s$, and resampling $s$ to match $m$'s timebase.
  * Estimating the channel differences present in $s$, and creating an equalized version.
  * Generating and enhancing an estimate of $r$.
  
We outline each of these steps in the following sections.
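
Before detailing the steps, the ideal case mentioned above can be illustrated with synthetic signals. This is a minimal Python/numpy sketch, purely illustrative and not part of the remixavier code:

<code python>
import numpy as np

fs = 22050
t = np.arange(2 * fs) / fs                  # two seconds of audio
s = 0.5 * np.sin(2 * np.pi * 440 * t)       # the separately available recording
r_true = 0.3 * np.sin(2 * np.pi * 660 * t)  # the part unique to the mixture
m = s + r_true                              # the full mix

# With perfect time alignment and identical channels,
# plain subtraction recovers r exactly.
r = m - s
assert np.allclose(r, r_true)
</code>

The alignment and equalization steps below are what make this subtraction workable on real recordings.
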
==== - Alignment ====
  
We use cross-correlation of the unequalized signals to find the temporal alignment. First we calculate a cross-correlation between the entire durations of $m$ and $s$, possibly downsampled (e.g. to as low as 1 kHz) to reduce the total computation. The global peak of this correlation is taken as the average time alignment, but small differences in the sampling rates (clock drift) will lead to changes in the effective time difference throughout the track. Although many digitally-mastered signals may have perfect time alignment, in our experience it is not unusual to see clock rate differences of 0.5% or more where analog processing (such as magnetic tape playback) is involved. For a 200 sec track, a 0.5% time skew will cause the relative timing to drift by a full second over the duration of the track.
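
A sketch of this global step in Python with numpy/scipy (the function name and defaults are our own illustration, not the remixavier implementation):

<code python>
import numpy as np
import scipy.signal

def coarse_offset(m, s, fs, fs_lo=1000):
    """Average time offset between m and s, in seconds, from a
    full-length cross-correlation of downsampled copies."""
    # Downsample (here to 1 kHz) so correlating the entire durations is cheap;
    # resample_poly expects integer up/down factors.
    m_lo = scipy.signal.resample_poly(m, fs_lo, fs)
    s_lo = scipy.signal.resample_poly(s, fs_lo, fs)
    xc = scipy.signal.correlate(m_lo, s_lo, mode='full')
    lags = scipy.signal.correlation_lags(len(m_lo), len(s_lo), mode='full')
    # The global peak of the correlation gives the average alignment.
    return lags[np.argmax(xc)] / fs_lo
</code>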
  
To detect such drift, we compute a cross-correlation of short segments of each signal, typically 8 sec segments every 4 sec, with correlation performed out to $\pm 2$ sec. For each segment, we find the peak correlation value and perform a linear fit to the relative timing implied by these peaks' locations across the song. This line represents the relative offset of the two recordings and their "drift", or the extent to which they have been recorded on different timescales. Once we have estimated the offset and drift, we remove samples and resample so that the signals are aligned in time. We find that repeating this operation can further correct residual timing errors, bringing the timebases to within 10 parts per million (or 2 ms drift over a 200 sec track).
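
The segment-wise correlation and linear fit might look like the following sketch, using the 8 sec / 4 sec / $\pm 2$ sec parameters from the text (naming and structure are our own, not the published code):

<code python>
import numpy as np
import scipy.signal

def estimate_offset_and_drift(m, s, fs, seg_dur=8.0, hop_dur=4.0, max_lag_dur=2.0):
    """Fit lag(t) ~ offset + drift * t from short-segment cross-correlations."""
    seg, hop, max_lag = int(seg_dur * fs), int(hop_dur * fs), int(max_lag_dur * fs)
    times, lags = [], []
    for start in range(max_lag, min(len(m), len(s)) - seg - max_lag, hop):
        m_seg = m[start:start + seg]
        # Search s within +/- max_lag samples of the same position.
        s_win = s[start - max_lag:start + seg + max_lag]
        xc = scipy.signal.correlate(s_win, m_seg, mode='valid')
        lags.append((np.argmax(xc) - max_lag) / fs)  # peak -> relative lag (sec)
        times.append((start + seg / 2) / fs)         # segment centre time (sec)
    drift, offset = np.polyfit(times, lags, 1)       # slope, intercept
    return offset, drift
</code>

Given these estimates, $s$ can be shifted by the offset and resampled by a factor close to $1/(1+\mathrm{drift})$ (e.g. via a rational approximation passed to scipy.signal.resample_poly), and, as noted above, repeating the whole procedure refines the result.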
  
==== - Channel Differences ====

To estimate the channel differences -- i.e., a difference in the stationary linear filtering between the two tracks -- we assume that there is some filter $h$ such that
  
$$m = h\ast s + r$$
  
Note that this does not always hold exactly (for example, when nonlinearities have been applied to $m$ and/or $s$). However, we find the approximation works well in practice. To estimate $h$, we first compute the magnitude of the short-time Fourier transform (STFT) of each signal, which gives
  
$$M = H \cdot S + R$$
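
The page stops at this model. One plausible estimator consistent with it -- shown purely as an illustration, not necessarily the authors' procedure -- is a per-frequency least-squares fit of $M$ against $S$ that treats $R$ as noise:

<code python>
import numpy as np
import scipy.signal

def estimate_channel_mag(m, s, fs, n_fft=2048):
    """Illustrative per-frequency estimate of |H| from magnitude STFTs,
    treating the residual R as noise: |H[k]| = sum_t M S / sum_t S^2."""
    _, _, M = scipy.signal.stft(m, fs, nperseg=n_fft)
    _, _, S = scipy.signal.stft(s, fs, nperseg=n_fft)
    n = min(M.shape[1], S.shape[1])            # use the overlapping frames
    Mmag, Smag = np.abs(M[:, :n]), np.abs(S[:, :n])
    return np.sum(Mmag * Smag, axis=1) / (np.sum(Smag ** 2, axis=1) + 1e-12)
</code>

Scaling each frame of $S$ by this estimate gives an equalized version of $s$ whose subtraction from $m$ leaves an estimate of $r$; the Github and Matlab links above contain the authors' actual implementation.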