LabROSA - Publications

Publications

Speech and Source Separation

Music Signal Analysis

Environmental/Marine

2007

M. Mandel and D. Ellis (2007) EM localization and separation using interaural level and phase cues: Proc. IEEE Workshop on Apps. of Sig. Proc. to Acous. and Audio WASPAA-07, pp. 275-278, Mohonk NY, October 2007.
R. Weiss and D. Ellis (2007) Monaural speech separation using source-adapted models: Proc. IEEE Workshop on Apps. of Sig. Proc. to Acous. and Audio WASPAA-07, pp. 114-117, Mohonk NY, October 2007.
M. Athineos and D. Ellis (2007) Autoregressive Modeling of Temporal Envelopes: IEEE Tr. Signal Processing, vol. 55 no. 11, pp. 5237-5245, Nov 2007.
P. Scanlon, D. Ellis, R. Reilly (2007) Using Broad Phonetic Group Experts for Improved Speech Recognition: IEEE Tr. Audio, Speech, Lang. Proc., vol. 15 no. 3, pp. 803-812, March 2007.

C. Smit and D. Ellis (2007) Solo voice detection via optimal cancelation: Proc. IEEE Workshop on Apps. of Sig. Proc. to Acous. and Audio WASPAA-07, pp. 207-210, Mohonk NY, October 2007.
G. Poliner and D. Ellis (2007) Improving generalization for polyphonic piano transcription: Proc. IEEE Workshop on Apps. of Sig. Proc. to Acous. and Audio WASPAA-07, pp. 86-89, Mohonk NY, October 2007.
D. Ellis (2007) Classifying Music Audio with Timbral and Chroma Features: Proc. ISMIR-07, pp. 339-340, Vienna, Austria, October 2007.
(See also the poster I presented at ISMIR-07.)
M. Mandel and D. Ellis (2007) A Web-Based Game for Collecting Music Metadata: Proc. Int. Conf. on Music Info. Retrieval ISMIR-07, pp. 365-366, Vienna, Austria, October 2007.
(See also the 6 page tech. report.)
J. H. Jensen, D. Ellis, M. G. Christensen, S. H. Jensen (2007) Evaluation of Distance Measures Between Gaussian Mixture Models of MFCCs: Proc. Int. Conf. on Music Info. Retrieval ISMIR-07, pp. 107-108, Vienna, Austria, October 2007.
D. Ellis and C. Cotton (2007) The 2007 LabROSA Cover Song Detection System: MIREX 2007 Audio Cover Song Evaluation system description, Sep 2007. (4pp)
(See also the poster I presented at ISMIR-07.)
D. Ellis (2007) Beat Tracking by Dynamic Programming: J. New Music Research, Special Issue on Beat and Tempo Extraction, vol. 36 no. 1, March 2007, pp. 51-60. (10pp)
DOI: 10.1080/09298210701653344
D. Ellis and G. Poliner (2007) Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking: Proc. ICASSP-07 Hawai'i, pp. IV-1429-1432.
G. Poliner, D. Ellis, A. Ehmann, E. Gómez, S. Streich, B. Ong (2007) Melody Transcription from Music Audio: Approaches and Evaluation: IEEE Tr. Audio, Speech, Lang. Proc., vol. 15 no. 4, May 2007, pp. 1247-1256.
G. Poliner and D. Ellis (2007) A Discriminative Model for Polyphonic Piano Transcription: EURASIP Journal on Advances in Signal Processing, special issue on Music Signal Processing, 2007 (2007), Article ID 48317. (9pp)
DOI: 10.1155/2007/48317

S.-F. Chang, D. Ellis, W. Jiang, K. Lee, A. Yanagawa, A. Loui, J. Luo (2007) Large-scale multimodal semantic concept detection for consumer video: Multimedia Information Retrieval workshop, ACM Multimedia Augsburg, Germany, Sep 2007, pp. 255-264.
DOI: 10.1145/1290082.1290118
J. Ogle and D. Ellis (2007) Fingerprinting to Identify Repeated Sound Events in Long-Duration Personal Audio Recordings: Proc. ICASSP-07 Hawai'i, pp. I-233-236. (4pp)
A. Doherty, A. Smeaton, K.-S. Lee, and D. Ellis (2007) Multimodal Segmentation of Lifelog Data: Proc. 8th Int. Conf. on Computer-Assisted Information Retrieval RIAO 2007, Pittsburgh, May 2007. (18pp)

2006

M. Mandel, D. Ellis, and T. Jebara (2006) An EM algorithm for localizing multiple sound sources in reverberant environments: Advances Neural Info. Proc. Sys. 19, Vancouver CA, Dec 2006, pp. 953-960. (8pp)
D. Ellis (2006) Model-Based Scene Analysis: Chapter 4 of Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, D. Wang & G. Brown, eds., Wiley/IEEE Press, pp. 115-146, 2006. (46pp)
M. Mandel and D. Ellis (2006) A probability model for interaural phase difference: Proc. Workshop on Statistical and Perceptual Audition SAPA-06, pp. 1-6, Pittsburgh PA, Oct 2006. (6pp)
R. Weiss and D. Ellis (2006) Estimating single-channel source separation masks: Relevance Vector Machine classifiers vs. pitch-based masking: Proc. Workshop on Statistical and Perceptual Audition SAPA-06, pp. 31-36, Pittsburgh PA, Oct 2006. (6pp)
D. Ellis and R. Weiss (2006) Model-Based Monaural Source Separation Using a Vector-Quantized Phase-Vocoder Representation: Proc. ICASSP-06, Toulouse, May 2006, pp. V-957-960. (4pp)
D. Ellis (2006) Modeling the auditory component of speech: Chapter 24 of Listening to speech: An auditory perspective, S. Greenberg & W. Ainsworth, eds., Lawrence Erlbaum, pp. 393-407, 2006. (13pp)
D. Ellis, B. Raj, J. Brown, M. Slaney, P. Smaragdis (2006) Editorial - Special Section on Statistical and Perceptual Audio Processing: IEEE Tr. Audio, Speech and Lang. Proc., vol. 14 no. 1, pp. 2-4, Jan. 2006. (3pp)

D. Ellis (2006) Extracting Information from Music Audio: Communications of the ACM invited paper, special issue on Music Information Retrieval, vol. 49, no. 8, pp. 32-37, August 2006. (6pp)
D. Ellis and G. Poliner (2006) Classification-Based Melody Transcription: Machine Learning, special issue on Machine Learning In and For Music, vol. 65, no. 2-3, pp. 439-456, Dec 2006. (18pp)
DOI: 10.1007/s10994-006-8373-9
M. Mandel, G. Poliner, D. Ellis (2006) Support Vector Machine Active Learning for Music Retrieval: Multimedia Systems, special issue on Machine Learning Approaches to Multimedia Information Retrieval, vol. 12, no. 1, pp. 3-13, Aug 2006. (10pp)
DOI: 10.1007/s00530-006-0032-2
D. Ellis (2006) Identifying 'Cover Songs' with Beat-Synchronous Chroma Features: MIREX 2006 Audio Cover Song Contest system description, Sep 2006. (4pp)
D. Ellis (2006) Beat Tracking with Dynamic Programming: MIREX 2006 Audio Beat Tracking Contest system description, Sep 2006. (3pp)

K. Lee and D. Ellis (2006) Voice Activity Detection in Personal Audio Recordings Using Autocorrelogram Compensation: Interspeech ICSLP-06, pp. 1970-1973, Pittsburgh, Oct 2006. (4pp)
D. Ellis and K. Lee (2006) Accessing minimal-impact personal audio archives: IEEE MultiMedia, vol. 13 no. 4, Oct-Dec 2006, pp. 30-38. (9pp)
X. Halkias and D. Ellis (2006) Call detection and extraction using Bayesian inference: Applied Acoustics, special issue on Marine Mammal Detection, vol. 67, no. 11-12, Nov-Dec. 2006, pp. 1164-1174. (11pp)
X. Halkias and D. Ellis (2006) Estimating the Number of Marine Mammals using Recordings of Clicks from One Microphone: Proc. ICASSP-06, Toulouse, May 2006, pp. V-769-772. (4pp)

2005

N. Morgan, Q. Zhu, A. Stolcke, K. Sonmez, S. Sivadas, T. Shinozaki, M. Ostendorf, P. Jain, H. Hermansky, D. Ellis, G. Doddington, B. Chen, O. Cetin, H. Bourlard, and M. Athineos (2005) Pushing the Envelope -- Aside: IEEE Signal Processing Magazine 22(5), pp. 81-88, Sep. 2005. (8pp)
C.-P. Chen, J. Bilmes, D. Ellis (2005) Speech Feature Smoothing for Robust ASR: Proc. ICASSP-05, Philadelphia, March 2005, pp. I-525-528. (4pp)
M. Reyes-Gomez, N. Jojic, and D. Ellis (2005) Deformable Spectrograms: AI & Statistics 2005, Barbados, Jan. 2005, pp. 285-292. (8pp)
J. Barker, M. Cooke, D. Ellis (2005) Decoding speech in the presence of other sources: Speech Communication, 45(1), Jan. 2005, pp. 5-25. (26pp)

G. Poliner, D. Ellis (2005) A Classification Approach to Melody Transcription: Proc. Int. Conf. on Music Info. Retrieval ISMIR-05, London, September 2005, pp. 161-166. (6pp)
M. Mandel, D. Ellis (2005) Song-Level Features and Support Vector Machines for Music Classification: Proc. Int. Conf. on Music Info. Retrieval ISMIR-05, London, September 2005, pp. 594-599. (6pp)

K. Dobson, B. Whitman, D. Ellis (2005) Learning Auditory Models of Machine Voices: Proc. IEEE Workshop on Apps. of Sig. Proc. to Acous. and Audio WASPAA-05, Mohonk NY, October 2005, pp. 339-342. (4pp)
N. Lesser, D. Ellis (2005) Clap Detection and Discrimination for Rhythm Therapy: Proc. ICASSP-05, Philadelphia, March 2005, pp. III-37-40. (4pp)
(See also the talk slides which describe an energy ratio feature that does much better than the ones described in the paper.)

2004

M. Athineos, H. Hermansky and D. Ellis (2004) LP-TRAP: Linear predictive temporal patterns: International Conference on Spoken Language Processing ICSLP-04, Jeju, Korea, Oct 2004, pp. 949-952. (4pp)
M. Athineos, H. Hermansky and D. Ellis (2004) PLP²: Autoregressive modeling of auditory-like 2-D spectro-temporal patterns: ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing SAPA-04, Jeju, Korea, Oct 2004, pp. 37-42. (5pp)
M. Reyes-Gomez, N. Jojic, and D. Ellis (2004) Towards single-channel unsupervised source separation of speech mixtures: The layered harmonics/formants separation-tracking model: ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing SAPA-04, Jeju, Korea, Oct 2004, pp. 25-30. (6pp)
D. Ellis and J. Liu (2004) Speaker turn segmentation based on between-channel differences: NIST Meeting Recognition Workshop @ ICASSP, pp. 112-117, Montreal, May 2004. (6pp)
L. Kennedy and D. Ellis (2004) Laughter Detection in Meetings: NIST Meeting Recognition Workshop @ ICASSP, pp. 118-121, Montreal, May 2004. (4pp)
M.J. Reyes-Gomez, D. Ellis, N. Jojic (2004) Multiband Audio Modeling for Single Channel Acoustic Source Separation: Proc. ICASSP-04, pp. V-641-644, Montreal, May 2004. (4pp)
M.J. Reyes-Gomez, N. Jojic, D. Ellis (2004) Detailed graphical models for source separation and missing data interpolation in audio: Snowbird Learning Workshop, Snowbird, 2004. (2pp)
D. Ellis (2004) Evaluating Speech Separation Systems: Chapter 20 in Speech Separation by Humans and Machines, ed. P. Divenyi, Kluwer, pp. 295-304. (12pp)
M. Cooke and D. Ellis (2004) Introduction to the special issue on the recognition and organization of real-world sound: Speech Communication, 43(4), Sep. 2004, pp. 273-274. (2pp)
DOI: 10.1016/j.specom.2004.05.001

D. Ellis and J. Arroyo (2004) Eigenrhythms: Drum pattern basis sets for classification and generation: International Symposium on Music Information Retrieval ISMIR-04, Barcelona, Oct 2004, pp. 554-559. (6pp)
(See also the longer tech report version with color figures.)
B. Whitman and D. Ellis (2004) Automatic Record Reviews: International Symposium on Music Information Retrieval ISMIR-04, Barcelona, Oct 2004, pp. 470-477. (8pp)
A. Berenzweig, B. Logan, D. Ellis, B. Whitman (2004) A large-scale evaluation of acoustic and subjective music-similarity measures: Computer Music Journal, 28(2), pp. 63-76, June 2004. (14pp)

D. Ellis and K.S. Lee (2004) Minimal-Impact Audio-Based Personal Archives: First ACM workshop on Continuous Archiving and Recording of Personal Experiences CARPE-04, New York, Oct 2004, pp. 39-47. (9pp)
D. Ellis and K.S. Lee (2004) Features for Segmenting and Classifying Long-Duration Recordings of Personal Audio: ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing SAPA-04, Jeju, Korea, Oct 2004, pp. 1-6. (6pp)

2003

L. Kennedy and D. Ellis (2003) Pitch-based emphasis detection for characterization of meeting recordings: Automatic Speech Recognition and Understanding Workshop IEEE ASRU 2003, pp. 243-248, St. Thomas, December 2003. (6pp)
M. Athineos and D. Ellis (2003) Frequency-domain linear prediction for temporal features: Automatic Speech Recognition and Understanding Workshop IEEE ASRU 2003, pp. 261-266, St. Thomas, December 2003. (6pp)
M.J. Reyes-Gomez, B. Raj, D. Ellis (2003) Multi-channel Source Separation by Beamforming Trained with Factorial HMMs: Proc. IEEE Workshop on Apps. of Sig. Proc. to Acous. and Audio, pp. 13-16, Mohonk NY, October 2003. (4pp)
P. Scanlon, D. Ellis, R. Reilly (2003) Using Mutual Information to design class-specific phone recognizers: Proc. Eurospeech-03, Geneva, September 2003, pp. 857-860. (4pp)
S. Renals and D. Ellis (2003) Audio Information Access from Meeting Rooms: Proc. ICASSP-03, Hong Kong, April 2003, pp. IV-744-747. (4pp)
M.J. Reyes-Gomez, B. Raj, D. Ellis (2003) Multi-channel Source Separation by Factorial HMMs: Proc. ICASSP-03, Hong Kong, April 2003, pp. I-664-667. (4pp)
A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan, B. Peskin, T. Pfau, E. Shriberg, A. Stolcke, C. Wooters (2003) The ICSI Meeting Corpus: Proc. ICASSP-03, Hong Kong, April 2003, pp. I-364-367. (4pp)

A. Sheh and D. Ellis (2003) Chord Segmentation and Recognition using EM-Trained Hidden Markov Models: 4th International Symposium on Music Information Retrieval ISMIR-03, pp. 185-191, Baltimore, October 2003. (7pp)
R. Turetsky and D. Ellis (2003) Ground-Truth Transcriptions of Real Music from Force-Aligned MIDI Syntheses: 4th International Symposium on Music Information Retrieval ISMIR-03, pp. 135-141, Baltimore, October 2003. (7pp)
A. Berenzweig, B. Logan, D. Ellis, B. Whitman (2003) A large-scale evaluation of acoustic and subjective music similarity measures: 4th International Symposium on Music Information Retrieval ISMIR-03, pp. 103-109, Baltimore, October 2003. (7pp)
B. Logan, D. Ellis, A. Berenzweig (2003) Toward evaluation techniques for music similarity: Keynote address, Workshop on the Evaluation of Music Information Retrieval (MIR) Systems at SIGIR 2003, Toronto, August 2003. (5pp)
A. Berenzweig, D. Ellis, S. Lawrence (2003) Anchor Space for Classification and Similarity Measurement of Music: Proc. ICME-03, Baltimore, July 2003, pp. I-29-32. (4pp)

M.J. Reyes-Gomez and D. Ellis (2003) Selection, Parameter Estimation, and Discriminative Training of Hidden Markov Models for General Audio Modeling: Proc. ICME-03, Baltimore, July 2003, pp. I-73-76. (4pp)
M. Athineos and D. Ellis (2003) Sound Texture Modelling with Linear Prediction in both Time and Frequency Domains: Proc. ICASSP-03, Hong Kong, April 2003, pp. V-648-651. (4pp)

2002

A.J. Robinson, G.D. Cook, D. Ellis, E. Fosler-Lussier, S.J. Renals, D.A.G. Williams (2002) Connectionist speech recognition of Broadcast News: Speech Communication, vol. 37 no. 1-2, May 2002, pp. 27-45. (19pp)
M.J. Reyes-Gomez and D. Ellis (2002) Error visualization for tandem acoustic modeling on the Aurora task: ICASSP-02 (student session), Orlando, May 2002. (4pp)

D. Ellis, B. Whitman, A. Berenzweig, S. Lawrence (2002) The Quest for Ground Truth in Musical Artist Similarity: Proc. ISMIR-02, pp. 170-177, Paris, October 2002. (8pp)
A. Berenzweig, D. Ellis, S. Lawrence (2002) Using Voice Segments to Improve Artist Classification of Music: Proc. AES-22 Intl. Conf. on Virt., Synth., and Ent. Audio, Espoo, Finland, June 2002. (8pp)

2001

T. Pfau, D. Ellis, A. Stolcke (2001) Multispeaker Speech Activity Detection for the ICSI Meeting Recorder: Proc. ASRU-01, Italy, December 2001. (4pp)
J. Barker, M. Cooke, D. Ellis (2001) Integrating bottom-up and top-down constraints to achieve robust ASR: The multisource decoder: Presented at the CRAC workshop, pp. 63-66, Aalborg, Denmark, September 2001. (4pp)
D. Ellis and M.J. Reyes Gomez (2001) Investigations into Tandem Acoustic Modeling for the Aurora Task: Proc. Eurospeech-01, Special Event on Noise Robust Recognition, pp. 189-192, Denmark, September 2001. (4pp)
(See also the poster I presented at the conference.)
M. Cooke and D. Ellis (2001) The auditory organization of speech and other sources in listeners and computational models: Speech Communication, vol. 35 no. 3-4, Oct. 2001, pp. 141-177. (37pp)
D. Ellis, R. Singh, S. Sivadas (2001) Tandem acoustic modeling in large-vocabulary recognition: Proc. ICASSP-2001, pp. I-517-520, Salt Lake City, May 2001. (4pp)
(See also the poster I presented at the conference.)
N. Morgan, D. Baron, J. Edwards, D. Ellis, D. Gelbart, A. Janin, T. Pfau, E. Shriberg, A. Stolcke (2001) The Meeting Project at ICSI: Human Language Technologies Conference, San Diego, March 2001, pp. 246-252. (7pp)

A.L. Berenzweig and D. Ellis (2001) Locating Singing Voice Segments within Music Signals: Proc. IEEE Workshop on Apps. of Sig. Proc. to Acous. and Audio, pp. 119-122, Mohonk NY, October 2001. (4pp)

D. Ellis (2001) Detecting Alarm Sounds: Presented at the CRAC workshop, pp. 59-62, Aalborg, Denmark, September 2001. (4pp)
(See also the poster I presented at the workshop.)
