Dan Ellis : Resources: Matlab:

CHIMEFIND - Tool to realign labels to "chimes" in LDC RATS data

chimefind is a Matlab script that uses a custom-designed filter to identify the "start-of-utterance" chimes within the "clean" rebroadcast files created by LDC for the RATS program. It then writes a label file corresponding to the audio file, either based on the found chimes alone or by taking an existing label file and moving each segment's start time to the matching chime, while preserving each segment's duration and label contents. chimefind can read both Audacity-format 3-column label files (start, end, label) and the 11-column tab-separated omnibus label file distributed as LDC2011E32.
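
For reference, an Audacity label file is plain text with one segment per line: start time, end time, and label, separated by tabs, with times in seconds. The Python sketch below reads and writes that layout. The helper names read_audacity_labels and write_audacity_labels are illustrative only and are not part of chimefind; the six-decimal output precision is also an assumption.

```python
import io


def read_audacity_labels(stream):
    """Parse 3-column lines (start, end, label) into a list of tuples."""
    labels = []
    for line in stream:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t", 2)
        start, end = float(fields[0]), float(fields[1])
        # The label column may be empty or missing altogether.
        text = fields[2] if len(fields) > 2 else ""
        labels.append((start, end, text))
    return labels


def write_audacity_labels(labels, stream):
    """Write (start, end, label) tuples as tab-separated lines."""
    for start, end, text in labels:
        stream.write(f"{start:.6f}\t{end:.6f}\t{text}\n")


# Round-trip a tiny in-memory example (hypothetical data).
demo = "0.500000\t3.250000\tspeech\n10.000000\t12.000000\t\n"
demo_labels = read_audacity_labels(io.StringIO(demo))
demo_out = io.StringIO()
write_audacity_labels(demo_labels, demo_out)
```

The same tuples can be passed to any later processing step and written back out unchanged, which is convenient when only the start times need to move.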

Example usage

In the code below, we scan a waveform file and write a new label file based on each of the detected chimes. Since no input labels are provided, the output labels are empty, and each segment ends 5 s before the start of the next segment (or at the end of the file for the last segment). The -disp flag generates an interpretive plot.

chimefind -clean 20110328_070019_0000_lre07ara.clean.flac ...
    -labelsout 20110328_070019_0000_lre07ara.txt -disp 1
********** chimefind v0.7 of 20110804 **********
No input labels provided; outputting pure times
10 adjusted labels written to 20110328_070019_0000_lre07ara.txt (audacity format)
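
The chime-only rule described above (empty labels, each segment ending 5 s before the next one starts, the last running to the end of the file) can be sketched in Python as follows. This is an illustration under stated assumptions, not chimefind's own code: the function name labels_from_chimes is hypothetical, the chime times are made up, the effect of -chimeoffs is not modeled, and clamping the end time so it never precedes the start is my own addition.

```python
def labels_from_chimes(chime_times, file_duration, gap=5.0):
    """Return (start, end, label) tuples with empty labels.

    Each segment ends `gap` seconds before the next chime; the last
    segment runs to the end of the file.
    """
    starts = sorted(chime_times)
    labels = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            # Assumption: never let a segment end before it starts.
            end = max(start, starts[i + 1] - gap)
        else:
            end = file_duration
        labels.append((start, end, ""))
    return labels


# Hypothetical chime times (seconds) in a 120 s file.
segments = labels_from_chimes([10.0, 40.0, 95.5], file_duration=120.0)
```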

Usage with label file input

As described above, segment durations and labels can be read from an external file, in either the 3-column format or the LDC 11-column format. For the LDC format, we scan for lines tagged with the "root" of the wavfile name, and insert the transcript field as the label:

chimefind -clean 20110328_070019_0000_lre07ara.clean.flac ...
    -labelsin dev-0-part1-annotation.tsv ...
    -labelsout 20110328_070019_0000_lre07ara.txt ...
    -ldclabels 1 -disp 1 -strict 0
********** chimefind v0.7 of 20110804 **********
Reading LDC-format label file dev-0-part1-annotation.tsv ...
... found 96 labels for 20110328_070019_0000_lre07ara
Matched 10 of 10 chimes (within 5 secs)
  median chime delay = -0.12935 s, min = -0.1426 s, max = 0.9948 s
*** Matched only 10 chimes in 20110328_070019_0000_lre07ara.clean.flac to 96 label segments in dev-0-part1-annotation.tsv
86 unused labels:
  #11: 316.000 (5:16) 331.180 
  #12: 337.048 (5:37) 339.168 
  #13: 345.016 (5:45) 350.106 
  #14: 355.954 (5:56) 370.084 
  #15: 375.953 (6:16) 398.024 
  #16: 403.921 (6:44) 415.921 
  #17: 421.779 (7:02) 424.799 
  #18: 430.654 (7:11) 444.584 
  #19: 450.418 (7:30) 461.708 
  #20: 467.558 (7:48) 538.068 
  #21: 543.928 (9:04) 570.688 
  #22: 576.567 (9:37) 612.077 
  #23: 617.924 (10:18) 620.424 
  #24: 626.283 (10:26) 630.223 
  #25: 636.079 (10:36) 652.469 
  #26: 658.343 (10:58) 660.674 
  #27: 666.515 (11:07) 673.355 
  #28: 679.232 (11:19) 683.332 
  #29: 689.185 (11:29) 704.905 
  #30: 710.745 (11:51) 713.535 
  #31: 719.416 (11:59) 733.016 
  #32: 738.883 (12:19) 763.413 
  #33: 769.256 (12:49) 777.356 
  #34: 783.207 (13:03) 841.828 
  #35: 847.655 (14:08) 868.235 
  #36: 874.106 (14:34) 887.046 
  #37: 892.870 (14:53) 905.300 
  #38: 911.149 (15:11) 913.799 
  #39: 919.664 (15:20) 973.294 
  #40: 979.112 (16:19) 1023.962 
  #41: 1029.841 (17:10) 1041.581 
  #42: 1047.449 (17:27) 1051.069 
  #43: 1056.932 (17:37) 1099.822 
  #44: 1105.678 (18:26) 1156.268 
  #45: 1162.126 (19:22) 1166.086 
  #46: 1171.953 (19:32) 1183.783 
  #47: 1189.623 (19:50) 1193.243 
  #48: 1199.106 (19:59) 1210.046 
  #49: 1215.870 (20:16) 1223.380 
  #50: 1229.244 (20:29) 1238.774 
  #51: 1244.664 (20:45) 1249.054 
  #52: 1254.898 (20:55) 1320.848 
  #53: 1326.687 (22:07) 1343.477 
  #54: 1349.310 (22:29) 1362.470 
  #55: 1368.324 (22:48) 1383.224 
  #56: 1389.087 (23:09) 1393.097 
  #57: 1398.946 (23:19) 1414.016 
  #58: 1419.881 (23:40) 1471.911 
  #59: 1477.750 (24:38) 1535.020 
  #60: 1540.900 (25:41) 1557.250 
  #61: 1563.101 (26:03) 1629.691 
  #62: 1635.515 (27:16) 1640.445 
  #63: 1646.342 (27:26) 1675.362 
  #64: 1681.198 (28:01) 1751.188 
  #65: 1757.018 (29:17) 1777.198 
  #66: 1783.062 (29:43) 1857.552 
  #67: 1863.414 (31:03) 1881.164 
  #68: 1887.021 (31:27) 1913.501 
  #69: 1919.392 (31:59) 2002.712 
  #70: 2008.555 (33:29) 2024.155 
  #71: 2030.053 (33:50) 2033.073 
  #72: 2038.927 (33:59) 2111.907 
  #73: 2117.762 (35:18) 2154.683 
  #74: 2160.524 (36:01) 2234.294 
  #75: 2240.156 (37:20) 2252.296 
  #76: 2258.123 (37:38) 2269.663 
  #77: 2275.496 (37:55) 2279.986 
  #78: 2285.823 (38:06) 2320.923 
  #79: 2326.835 (38:47) 2368.635 
  #80: 2374.471 (39:34) 2409.351 
  #81: 2415.201 (40:15) 2481.051 
  #82: 2486.943 (41:27) 2496.804 
  #83: 2502.645 (41:43) 2546.175 
  #84: 2552.015 (42:32) 2578.645 
  #85: 2584.527 (43:05) 2647.397 
  #86: 2653.239 (44:13) 2731.849 
  #87: 2737.683 (45:38) 2741.744 
  #88: 2747.651 (45:48) 2765.951 
  #89: 2771.820 (46:12) 2822.791 
  #90: 2828.627 (47:09) 2846.027 
  #91: 2851.875 (47:32) 2890.435 
  #92: 2896.261 (48:16) 2899.541 
  #93: 2905.463 (48:25) 2908.853 
  #94: 2914.712 (48:35) 2927.572 
  #95: 2933.429 (48:53) 2937.469 
  #96: 2943.319 (49:03) 2999.209 
10 adjusted labels written to 20110328_070019_0000_lre07ara.txt (audacity format)
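
The realignment reported in this log can be pictured as: pair each input label with a detected chime no more than 5 s away, move the label's start to that chime, keep its duration and text, and report labels that find no chime. The Python sketch below does this with a nearest-chime rule. That rule is an assumption: this page does not spell out chimefind's matching procedure, and the changelog notes that strict mode instead pairs labels and chimes in order. The function name realign_labels and the sample data are hypothetical.

```python
def realign_labels(labels, chime_times, max_delay=5.0):
    """Move each label's start to the nearest unused chime within max_delay.

    Returns (adjusted, unused); durations and label text are preserved.
    """
    adjusted, unused = [], []
    free = sorted(chime_times)
    for start, end, text in labels:
        # Nearest remaining chime to this label's original start time.
        best = min(free, key=lambda c: abs(c - start), default=None)
        if best is not None and abs(best - start) <= max_delay:
            free.remove(best)  # each chime is used at most once
            adjusted.append((best, best + (end - start), text))
        else:
            unused.append((start, end, text))
    return adjusted, unused


# Hypothetical labels and chime times (seconds).
in_labels = [(10.0, 14.0, "a"), (20.0, 23.5, "b"), (300.0, 310.0, "c")]
moved, leftover = realign_labels(in_labels, [9.875, 20.25])
```

With these inputs the first two labels shift to the chimes at 9.875 s and 20.25 s, and the third is left over because no chime lies within 5 s of it, mirroring the "unused labels" list in the log.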

Optional arguments

This is the full range of arguments accepted:

 -clean <filename>     Path to clean (ref) wavfile
 -labelsin <filename>  Input label file name
 -labelsout <filename> Where to write 3-column output
 -ldclabels 0/1        Input label file is 3 col <stt end lab> or 11 col
 -segmap 0/1           Flag to both read and write 4-col segmap files
 -chimeoffs <time>     Offset time added to chime (default 0.525 s)
 -strict 0/1           Writes nothing if wrong number of chimes found (1)
 -disp 0/1             Optional graphic display
 -dechimeout <filename>  Replace detected chimes with dither & write out

Compiled target usage

Invoking the compiled target has the same syntax as above, e.g.

./run_chimefind_prj.sh -clean 20110328_070019_0000_lre07ara.clean.flac -labelsout 20110328_070019_0000_lre07ara.txt
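
When processing many files, it can help to assemble the argument list programmatically. The Python sketch below only builds the command as a list of strings, using flags from the table above and the launcher name shown in the example. The function name chimefind_command and its keyword arguments are hypothetical conveniences, and it covers only a subset of the options.

```python
def chimefind_command(clean, labelsout, labelsin=None, ldclabels=False,
                      strict=None, launcher="./run_chimefind_prj.sh"):
    """Build an argument list for the compiled chimefind target."""
    cmd = [launcher, "-clean", clean, "-labelsout", labelsout]
    if labelsin is not None:
        cmd += ["-labelsin", labelsin]
        if ldclabels:
            # 11-column LDC input rather than 3-column labels.
            cmd += ["-ldclabels", "1"]
    if strict is not None:
        cmd += ["-strict", "1" if strict else "0"]
    return cmd


cmd = chimefind_command(
    "20110328_070019_0000_lre07ara.clean.flac",
    "20110328_070019_0000_lre07ara.txt",
)
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the compiled package and the MCR are installed.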

Installation

This package has been compiled for several targets using the Matlab compiler. To run a compiled package, you will also need to download and install the matching Matlab Compiler Runtime (MCR). Please see the table below:

Architecture   Compiled package        MCR Installer
32 bit Linux   chimefind_GLNX86.zip    Linux MCR Installer
64 bit Linux   chimefind_GLNXA64.zip   Linux 64 bit MCR Installer
64 bit MacOS   chimefind_MACI64.zip    MACI64 MCR Installer

The original Matlab code used to build this compiled target is available at

 <http://labrosa.ee.columbia.edu/projects/chimefind/>

All sources are in the package chimefind.zip.

Feel free to contact me with any problems.

Changelog

2011-05-19 v0.1 Initial release

2011-05-31 v0.2 Added support for "segmap" file input and output

2011-06-06 v0.3 Fixed false alarms for speech that happened to hit one of the frequencies by taking the MIN of two different filters, each selecting one of the tone pairs.

2011-06-06 v0.4 Even more caution in chimefindinwav to avoid spurious matches from noise, and also to impose a minimum energy threshold on matches (to avoid finding chimes in chimeless files). Also added -strict flag to suppress writing anything if label file is inconsistent.

2011-06-07 v0.5 STRICT mode now simply assumes that labels correspond to chimes in the order that they are presented, and rewrites them regardless (as long as they are within matchthresh) instead of trying to find the closest one. Useful for corrupted segmap files.

2011-06-10 v0.6 Slightly looser criteria will detect shorter chimes; unmatched labels are now reported; added -dechimeout option.

2011-08-04 v0.7 Added version number to output reports.

Acknowledgment

This work was supported by DARPA under the RATS program via a subcontract from the SRI-led team SCENIC. My work was on behalf of ICSI.

$Header: /Users/dpwe/docs/grants/2010-01-DARPA-RATS/code/chimefind/RCS/demo_chimefind.m,v 1.2 2011/05/03 20:32:37 dpwe Exp dpwe $