HAMR 2013 Proceedings

====== Structural Similarity With Variations ======

Diego Silva, Helene Papadopoulos

**Introduction:**
-------------

We are working in the context of managing and organizing large collections of music recordings. Recent work [1, 2, 3] has shown that, given a collection of recordings containing multiple performances of several pieces, it is possible to group performances of the same musical work by using a distance that measures the pairwise similarity between structural representations of the pieces. We consider this specific application, but our work can be extended to other tasks such as cover song detection. For now, we are working on classical music.

A strong limitation of the previous approaches [1, 2, 3] is that they cannot retrieve recordings by structural similarity to an audio query when there are strong structural variations between performances of the same piece. More specifically, in classical music the composer may indicate that a part of the piece is to be repeated; some musicians play the repeat while others do not. This is the main cause of structural variation between two performances of the same classical piece, and it is the problem we focus on.

Note that in other contexts, such as cover song detection, other types of strong structural variation exist between similar pieces. A recent study [4] identifies the following major structural variations between two covers of the same song: skipping a part (e.g. the intro), repeating a section (e.g. the chorus), introducing an instrumental section or shortening one, and radically changing the order of the musical sections.

Two examples of the same piece:

With repetition {{:barenboim1984_mozart_kv279_3.jpg?200|}}

Without repetition {{:gould1967_mozart_kv279_3.jpg?200|}}

**Method:**
-------

This work extends our previous work [3], which considers the following retrieval scenario: given a query recording and a collection that contains other performances of the same piece as the query, along with recordings of different compositions, we aim to retrieve all other performances of the query piece. To this end, we first extract a set of chroma features that provide relevant information about the musical structure [5]. Using these chroma features, the query and each recording in the collection are transformed into a self-similarity matrix (SSM) [6]. The query matrix is compared to that of each recording in the collection using the CK-1 measure [7]. The recordings most relevant to the query are those with the smallest distance to it.

Here, to handle the possible omission or repetition of a segment, we propose to remove all consecutive repetitions of the same segment type identified in the SSM. After this step, none of the reduced versions of the performances contains repeats.

1) For each piece, we look for consecutive repetitions of the same segment type. Such repetitions appear in the SSM as off-diagonal stripes parallel to the main diagonal. We compute a chromagram from the original audio signal, then an SSM, and find consecutive repetitions of a given segment type by analyzing the SSM:

- Transform the SSM into a recurrence plot.
- Use a motion filter (blur) to enhance diagonals.
- Find diagonal candidates. All "black points" in the recurrence plot are candidates.
- Sweep each candidate diagonal by following its black points and measure its length. This length is used to decide whether the candidate is a good diagonal.
- If the diagonal represents a consecutive repetition, annotate its beginning and end.
- Clean the diagonal by using a breadth-first search of the nearest black points.
- Repeat the last 5 steps (including this one) for each candidate.

2) We compute a reduced chromagram (RedChrom) from which the chunks corresponding to consecutive repetitions have been removed.

3) We use this RedChrom as the input feature to the model we proposed in [3].

**Results:**
--------

Before reducing {{:barenboim1984_mozart_kv279_3.jpg?200|}} {{:rp.png?200|}}

After reducing {{:barenboim1984_mozart_kv279_3_new.jpg?200|}}

**Problems to be solved**
---------------------

- The method is not parameter-free: the feature extraction step and the "consecutive repetitions finding" step use parameters whose values affect the results, depending on the analyzed piece.
- We need to extend this approach to more than two consecutive repetitions of a given segment type, e.g. multiple consecutive stanzas in a popular music song.
- Other cases of strong structural variation are not handled.

**Bibliography**
------------

[1] J. P. Bello. Measuring structural similarity in music. IEEE Trans. Sp. Aud. Proc., 19(7):2013–2025, 2011.

[2] P. Grosche and M. Müller. Toward characteristic audio shingles for efficient cross-version music retrieval. In ICASSP, 2012.

[3] D. Silva, H. Papadopoulos, G. Batista, and D. Ellis. A video compression-based approach to measure music structural similarity. Submitted to ISMIR 2013.

[4] J. Serrà, E. Gómez, and P. Herrera. Audio cover song identification and similarity: background, approaches, evaluation, and beyond. In Advances in Music Information Retrieval, Springer-Verlag Berlin / Heidelberg Ed., 2011.

[5] M. Müller and S. Ewert. Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In ISMIR, Miami, USA, 2011.

[6] J. Foote. Visualizing music and audio using self-similarity. In ACM Multimedia, pages 77–80, Orlando, Florida, November 1999.

[7] B. J. L. Campana and E. J. Keogh.
A compression based distance measure for texture. In ICDM, pages 850–861, 2010.

**Dependencies**
-------------

CHROMA TOOLBOX: http://www.mpi-inf.mpg.de/resources/MIR/chromatoolbox/

CK-1: http://www.cs.ucr.edu/~bcampana/texture.php

mp3read: http://labrosa.ee.columbia.edu/matlab/mp3read.html
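
The repetition-removal idea described under Method (steps 1 and 2) can be illustrated with a minimal sketch. This is not the implementation used for the results above: it is written in Python rather than with the MATLAB dependencies, all function names are hypothetical, cosine similarity and the 0.9 threshold are assumptions, and the motion-blur and breadth-first-search cleaning steps are omitted. A toy one-hot "chromagram" stands in for real chroma features.

```python
import math

def self_similarity(chroma):
    """Cosine self-similarity matrix (SSM) of a list of chroma frames."""
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in chroma]
    n = len(chroma)
    return [[sum(a * b for a, b in zip(chroma[i], chroma[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]

def recurrence_plot(ssm, threshold=0.9):
    """Binarize the SSM: True marks a 'black point' (similar frame pair)."""
    return [[s >= threshold for s in row] for row in ssm]

def diagonal_length(rp, i, j):
    """Length of the run of black points starting at (i, j) along the diagonal."""
    n, length = len(rp), 0
    while i + length < n and j + length < n and rp[i + length][j + length]:
        length += 1
    return length

def find_consecutive_repeats(rp, min_len=2):
    """Return (begin, end) frame ranges that immediately repeat the preceding segment.

    A segment [i, i+k) followed directly by its copy [i+k, i+2k) appears as a
    stripe of length >= k starting at (i, i+k), parallel to the main diagonal.
    """
    n, repeats, i = len(rp), [], 0
    while i < n:
        found = None
        for k in range(min_len, (n - i) // 2 + 1):
            if diagonal_length(rp, i, i + k) >= k:
                found = (i + k, i + 2 * k)
                break
        if found:
            repeats.append(found)
            i = found[1]  # continue scanning after the repeated copy
        else:
            i += 1
    return repeats

def reduce_chroma(chroma, repeats):
    """Drop the frames of every repeated chunk (a toy version of RedChrom)."""
    drop = {t for begin, end in repeats for t in range(begin, end)}
    return [f for t, f in enumerate(chroma) if t not in drop]

def one_hot(pc):
    """Toy chroma frame with all energy in one pitch class (assumption)."""
    return [1.0 if p == pc else 0.0 for p in range(12)]

# Toy piece with structure A A B: section A (4 frames) is played twice.
pitch_classes = [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6]
chroma = [one_hot(pc) for pc in pitch_classes]
rp = recurrence_plot(self_similarity(chroma))
repeats = find_consecutive_repeats(rp)   # [(4, 8)]: the second copy of A
reduced = reduce_chroma(chroma, repeats)  # 7 frames, structure A B
```

On real recordings the stripes are noisy and broken, which is why the procedure above enhances and cleans the diagonals before measuring them; the exact-match test used in this sketch would have to be relaxed accordingly.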
