The Listening Machine Project

"The Listening Machine - Sound Source Organization for Multimedia Understanding" is an NSF-funded project at LabROSA concerned with separating and recognizing acoustic sources in complex, real-world mixtures. This web page is a central hub for materials resulting from this project.

Project reports

Project Summary and Description from the original NSF proposal
2003-4 annual report
2004-5 annual report
2005-6 annual report
2006-7 annual report
2007-8 annual report
Final report

Theses

M. J. Reyes-Gomez (2005)
Statistical Graphical Models for Scene Analysis, Source Separation and Other Audio Applications
Ph.D. Thesis, Columbia University Dept. of Electrical Engineering.
G. E. Poliner (2008)
Classification-Based Music Transcription
Ph.D. Thesis, Columbia University Dept. of Electrical Engineering.
X. Halkias (2008)
Detection and Tracking of Dolphin Vocalizations
Ph.D. Thesis, Columbia University Dept. of Electrical Engineering.
Keansub Lee (2009)
Analysis of Environmental Sounds
Ph.D. Thesis, Columbia University Dept. of Electrical Engineering.

Publications

2005

K. Dobson, B. Whitman, D. Ellis (2005). Learning Auditory Models of Machine Voices. Proc. IEEE Workshop on Apps. of Sig. Proc. to Acous. and Audio WASPAA-05, Mohonk NY, October 2005, pp. 339-342. (4pp)
G. Poliner, D. Ellis (2005). A Classification Approach to Melody Transcription. Proc. Int. Conf. on Music Info. Retrieval ISMIR-05, London, September 2005, pp. 161-166. (6pp)
M. Mandel, D. Ellis (2005). Song-Level Features and Support Vector Machines for Music Classification. Proc. Int. Conf. on Music Info. Retrieval ISMIR-05, London, September 2005, pp. 594-599. (6pp)
N. Lesser, D. Ellis (2005). Clap Detection and Discrimination for Rhythm Therapy. Proc. ICASSP-05, Philadelphia, March 2005. (4pp)
M. Reyes-Gomez, N. Jojic, and D. Ellis (2005). Deformable Spectrograms. AI & Statistics 2005, Barbados, Jan 2005. (8pp)

2004

D. Ellis and K.S. Lee (2004). Minimal-Impact Audio-Based Personal Archives. First ACM workshop on Continuous Archiving and Recording of Personal Experiences CARPE-04, New York, Oct 2004. (6pp)
M. Reyes-Gomez, N. Jojic, and D. Ellis (2004). Towards single-channel unsupervised source separation of speech mixtures: The layered harmonics/formants separation-tracking model. Workshop on Statistical and Perceptual Audio Processing SAPA'04, Jeju, Korea, Oct 2004. (6pp)
M.J. Reyes-Gomez, N. Jojic and D. Ellis (2004). Detailed graphical models for source separation and missing data interpolation in audio. Snowbird Learning Workshop, Snowbird, 2004. (2pp)
D. Ellis and K.S. Lee (2004). Features for Segmenting and Classifying Long-Duration Recordings of Personal Audio. Workshop on Statistical and Perceptual Audio Processing SAPA'04, Jeju, Korea, Oct 2004. (6pp)
M.J. Reyes-Gomez, D. Ellis and N. Jojic (2004). Multiband Audio Modeling for Single Channel Acoustic Source Separation. Proc. ICASSP-04, Montreal, May 2004. (4pp)
D. Ellis and J. Arroyo (2004). Eigenrhythms: Drum pattern basis sets for classification and generation. International Symposium on Music Information Retrieval ISMIR-04, Barcelona, Oct 2004, pp. 101-106. (6pp) (longer tech report version with color figures)

2003

M.J. Reyes-Gomez, B. Raj and D. Ellis (2003). Multi-channel Source Separation by Beamforming Trained with Factorial HMMs. Proc. IEEE Workshop on Apps. of Sig. Proc. to Acous. and Audio, Mohonk NY, October 2003. (4pp)
M.J. Reyes-Gomez & D.P.W. Ellis (2003). Selection, Parameter Estimation, and Discriminative Training of Hidden Markov Models for General Audio Modeling. Proc. ICME-03, Baltimore, July 2003. (4pp)
M.J. Reyes-Gomez, B. Raj, and D. Ellis (2003). Multi-channel Source Separation by Factorial HMMs. Proc. ICASSP-03, Hong Kong, April 2003. (4pp)

Web materials

Digital Signal Processing: Class materials (slides, assignments, practicals, demonstrations).
Speech and Audio Processing and Recognition: Class materials (slides, assignments, practicals, demonstrations).
Short course in Music Content Analysis by Machine Learning: Class notes and self-guided practical.
Seminar in Machine Learning for Signal Processing: Reading list and session summaries.
Focused collection of Sound Examples for use in student projects.
Matlab examples of common audio processing algorithms, continually updated.
Ground-truth data for real-world pop music examples, including hand-marked major phrase breaks and MIDI replicas.
Database definition and subjective similarity ground-truth datasets for 400 popular music artists.

Acknowledgment

This material is based in part upon work supported by the National Science Foundation under Grant No. IIS-0238301. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Last updated: 2009/06/05 16:10:53
Dan Ellis <dpwe@ee.columbia.edu>