AUDFPRINT - Audio fingerprint database creation + query
audfprint is a (compiled) Matlab script that can take a list of soundfiles and create a database of landmarks, and then subsequently take one or more query audio files and match them against the previously-created database. This can be used e.g. to "de-duplicate" a collection of music. The fingerprint is robust to things like time skews, different encoding schemes, and even added noise. It can match small fragments of sound, down to 10 sec or less. It is based on my Robust Landmark-Based Audio Fingerprinting.
This code is being distributed as a compiled Matlab binary, which requires the matching (freely-available) Matlab Runtime to be installed. The program has the same syntax and options whether called from the OS shell, or from the Matlab prompt.
In this usage mode, a list of soundfiles is analyzed and written to a single database file. Various soundfile formats are supported, including wav, mp3 and aac.
In the example below, reflist.txt consists of full paths to a number of soundfiles, which are then written to fpdbase.mat. (Note that in this case, the "filenames" in reflist are actually URLs, which can be loaded thanks to special functionality built into mpg123; this won't work for other file types, and normally reflist would just contain regular file names.) See the Usage section below for additional options.
audfprint -dbase fpdbase -cleardbase 1 -addlist reflist.txt
Target density = 7 hashes/sec
21-May-2013 09:19:16 Adding #1 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/01-Nine_Lives.mp3...10.0 s, 46 hashes
21-May-2013 09:19:16 Adding #2 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/02-Falling_In_Love.mp3...10.0 s, 74 hashes
21-May-2013 09:19:16 Adding #3 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/03-Hole_In_My_Soul.mp3...10.0 s, 71 hashes
21-May-2013 09:19:16 Adding #4 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/04-Taste_Of_India.mp3...10.0 s, 91 hashes
21-May-2013 09:19:16 Adding #5 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/05-Full_Circle.mp3...10.0 s, 50 hashes
21-May-2013 09:19:17 Adding #6 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/06-Something_s_Gotta_Give.mp3...10.0 s, 42 hashes
21-May-2013 09:19:17 Adding #7 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/07-Ain_t_That_A_Bitch.mp3...10.0 s, 78 hashes
21-May-2013 09:19:17 Adding #8 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/08-The_Farm.mp3...10.0 s, 59 hashes
21-May-2013 09:19:17 Adding #9 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/09-Crash.mp3...10.0 s, 71 hashes
21-May-2013 09:19:17 Adding #10 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/10-Kiss_Your_Past_Good-bye.mp3...10.0 s, 39 hashes
21-May-2013 09:19:17 Adding #11 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/11-Pink.mp3...10.0 s, 40 hashes
21-May-2013 09:19:17 Adding #12 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/12-Attitude_Adjustment.mp3...10.0 s, 82 hashes
21-May-2013 09:19:17 Adding #13 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/13-Fallen_Angels.mp3...10.0 s, 26 hashes
added 13 tracks (130 secs, 769 hashes, 5.9154 hashes/sec) in 2.0 sec = 0.015 x RT
Hash table saved to fpdbase (13 tracks, 769 hashes)
done
The command below matches a query soundfile against an existing database and returns the paths of the top 5 hits (paths as provided in the reflist.txt above). Each line of the main output consists of 5 fields: query-file-name hit-number hit-file-name matching-count match-time. matching-count gives the actual number of common, aligned fingerprints between query and hit; as a rough rule of thumb, more than 10 indicates a good match, although for very short queries even 4 or 5 matches are likely reliable. Where more than one hit is reported for a query (i.e. -matchmaxret > 1), they are reported in descending order of relevance, which means descending matching-count. match-time reports the delay, in seconds, between the start of the reference item and the start of the (aligned) query.
audfprint -dbase fpdbase -match query.mp3
Hash table read from fpdbase (13 tracks, 769 hashes)
query.mp3 (5.8 s) analyzed to 73 hashes
query.mp3 1 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/05-Full_Circle.mp3 5 -0.032
matched 1 tracks (5.832 secs, 73 hashes, 12.5171 hashes/sec) in 0.5 sec = 0.077 x RT
done
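The five-field report lines described above are easy to pick out of the program output with a small script. The following is a minimal Python sketch, not part of audfprint: the names `Hit` and `parse_hit` are hypothetical, and it assumes fields are separated by whitespace and that file names contain no spaces.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    query: str         # query-file-name
    rank: int          # hit-number (1 = best)
    reference: str     # hit-file-name, as given in the reference list
    count: int         # matching-count (common, aligned fingerprints)
    offset_sec: float  # match-time, in seconds


def parse_hit(line: str) -> Optional[Hit]:
    """Return a Hit for a 5-field report line, or None for any other line.

    Assumes whitespace-separated fields and no spaces in file names.
    """
    fields = line.split()
    if len(fields) != 5:
        return None
    try:
        return Hit(fields[0], int(fields[1]), fields[2],
                   int(fields[3]), float(fields[4]))
    except ValueError:
        # e.g. "*** NO HITS FOUND ***" also has five tokens
        return None
```

Status lines such as "Hash table read from ..." or "*** NO HITS FOUND ***" come back as None, so the function can be applied to every line of output and the results filtered.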
Single tracks can have their hashes removed from the database with the -remove option:
audfprint -dbase fpdbase -remove http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/05-Full_Circle.mp3
% Now the query is unknown:
audfprint -dbase fpdbase -match query.mp3
Hash table read from fpdbase (13 tracks, 769 hashes)
Hash table saved to fpdbase (12 tracks, 719 hashes)
done
Hash table read from fpdbase (12 tracks, 719 hashes)
query.mp3 (5.8 s) analyzed to 73 hashes
*** NO HITS FOUND ***
matched 1 tracks (5.832 secs, 73 hashes, 12.5171 hashes/sec) in 0.3 sec = 0.045 x RT
done
All parameters to audfprint are specified in the command line via "-optionname value" pairs. The full set of options is:
No dbase specified!
audfprint v0.8 of 20130518
usage: audfprint ...
   -dbase <file>             The reference database file
   -cleardbase 1             Flag to build a new database from scratch
   -add <file ...>           Sound file(s) to add to database
   -addlist <file>           List of audio files to add to database
   -adddir <dir>             Watch this directory and add any files
   -addskip <count>          Skip this many initial files in addlist
   -addcheckpoint <count>    Save database every <count> tracks
   -matchonaddthresh <thr>   Don't add files if match >= thr (0)
   -remove <name ...>        Delete named track(s) from dbase
   -removelist <file>        Delete tracks named in file from dbase
   -density <num>            Target hashes/sec (default: 7.0)
   -nhashbits <num>          log_2 of hash table size (20)
   -maxnentries <num>        maximum number of entries per bin (100)
   -timesize <num>           Maximum value of abs time index (16384)
   -match <file ...>         Audio file(s) to match
   -matchlist <file>         List of audio files to match against database
   -matchdir <dir>           Watch this directory and match any files
   -matchmaxret <num>        Max num matches to report for each query (5)
   -matchmincount <num>      Minimum count of common hashes to report (0)
   -matchminprop <num>       Min proportion of max hash count to report (0.1)
   -oversamp <num>           oversampling factor for queries (0..special)
   -userawcounts 1           count hits without applying synchrony filter
   -skip <time>              drop time from start of each sound
   -maxdur <time>            truncate soundfiles at this duration (0=all)
   -nojenkins 1              suppress use of jenkins hash for new dbs
   -list 1                   list the files in the database
   -quiet 1                  suppress status messages
   -out <file>               File to write matches out to (stdout)
   -outdir <dir>             Write match reports to this directory
The fingerprinting works by finding local maxima in the spectrogram, then recording a "landmark" as the relationship between a pair of maxima. Each pair is encoded as the frequency of the first peak (from a 512-point FFT evaluated on an 11025 Hz sampled signal, so in units of 11025/512 = 21.5 Hz, using 8 bits), the difference in frequency bins to the second peak (6 bits, since large jumps are not recorded), and the count of time frames between the two peaks (6 bits, in units of the 32 ms hop size). That gives a total of 8 + 6 + 6 = 20 bits, leading to a space of 2^20 = 1M distinct hashes.
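To make the 8 + 6 + 6 bit layout concrete, here is a minimal Python sketch of packing one landmark into a 20-bit value. It is an illustration only, not the audfprint code: the field order (first-peak bin in the top bits), the two's-complement storage of a negative frequency difference, and the function names are all assumptions.

```python
F1_BITS, DF_BITS, DT_BITS = 8, 6, 6   # field widths from the text (20 bits total)
BIN_HZ = 11025 / 512                  # width of one FFT bin, about 21.5 Hz


def pack_landmark(f1_bin: int, df_bins: int, dt_frames: int) -> int:
    """Pack (first-peak bin, bin difference, frame gap) into a 20-bit hash.

    Assumed layout (hypothetical): f1 in the top 8 bits, then df, then dt.
    df may be negative and is stored modulo 64 (6-bit two's complement).
    """
    if not 0 <= f1_bin < (1 << F1_BITS):
        raise ValueError("f1_bin out of range")
    if not -(1 << (DF_BITS - 1)) <= df_bins < (1 << (DF_BITS - 1)):
        raise ValueError("df_bins out of range")
    if not 0 <= dt_frames < (1 << DT_BITS):
        raise ValueError("dt_frames out of range")
    df_field = df_bins & ((1 << DF_BITS) - 1)
    return (f1_bin << (DF_BITS + DT_BITS)) | (df_field << DT_BITS) | dt_frames


def unpack_landmark(h: int):
    """Inverse of pack_landmark under the same assumed layout."""
    dt = h & ((1 << DT_BITS) - 1)
    df_field = (h >> DT_BITS) & ((1 << DF_BITS) - 1)
    # Undo the 6-bit two's-complement encoding of the bin difference
    df = df_field - (1 << DF_BITS) if df_field >= (1 << (DF_BITS - 1)) else df_field
    f1 = h >> (DF_BITS + DT_BITS)
    return f1, df, dt
```

Whatever the real bit order is, the point is the same: every landmark maps to one of 2^20 possible integer values, which can be used directly as a table address.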
The hash table works by calculating all the landmarks for a given track, as well as the time at which they occur (the absolute time of the first peak, also in 32 ms units, used to check the consistency of the relative timing of landmarks and queries). Then the absolute time and the track ID (i.e. the sequence number of this track in building the database) are packed into a single 32 bit number and stored in the hash table at the address given by the 20 bit hash. (In fact, the number of buckets in the hash table is determined by -nhashbits; when this is smaller than 20, the 20 bit hashes are "mixed down" to the smaller space, with the effect that multiple hashes in the original space will be recorded in a single bucket).
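The packing of absolute time and track ID into one 32-bit entry can be sketched as follows. This is a hedged Python illustration rather than the actual ht_store logic: it assumes the time index occupies the low bits, that track IDs are 0-based, and that an out-of-range ID raises an error; the function names are hypothetical.

```python
def pack_entry(track_id: int, time_frame: int, timesize: int = 16384) -> int:
    """Combine a track ID and a time index into a single 32-bit value.

    Assumed layout (hypothetical): entry = track_id * timesize + wrapped time.
    The time index wraps around modulo timesize.
    """
    max_tracks = (1 << 32) // timesize
    if not 0 <= track_id < max_tracks:
        raise ValueError("track_id does not fit in the remaining bits")
    return track_id * timesize + (time_frame % timesize)


def unpack_entry(entry: int, timesize: int = 16384):
    """Recover (track_id, wrapped time index) from a packed entry."""
    return divmod(entry, timesize)
```

For example, `unpack_entry(pack_entry(3, 16384 + 5))` gives `(3, 5)`: the track ID survives, while the time index has wrapped once.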
Each hash bucket has space to record up to 100 different tracks (controlled by -maxnentries); once that fills up, entries are dropped at random (which is normally OK since that track will be represented by other hashes too - missing any single hash won't prevent recognition). Since the hash table is stored in RAM, the default values need 2^20 buckets x 100 entries/bucket x 4 bytes/entry = 400 MB of RAM. You can increase the number of entries per bucket with -maxnentries, but make sure you have enough RAM to accommodate the larger table. You can also reduce the RAM footprint with a smaller -maxnentries.
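The RAM estimate above is simple arithmetic, and a few lines of Python make it easy to try other settings. The function name is hypothetical; the parameters mirror the -nhashbits and -maxnentries options, and the 4 bytes per entry comes from the 32-bit packed value.

```python
def table_bytes(nhashbits: int = 20, maxnentries: int = 100,
                bytes_per_entry: int = 4) -> int:
    """Bytes needed for 2^nhashbits buckets of maxnentries packed entries."""
    return (1 << nhashbits) * maxnentries * bytes_per_entry


if __name__ == "__main__":
    for entries in (50, 100, 200):
        mb = table_bytes(maxnentries=entries) / 2**20
        print(f"-maxnentries {entries}: {mb:.0f} MB")
```

With the defaults this gives 419,430,400 bytes, i.e. the 400 MB quoted above; halving or doubling -maxnentries scales the footprint in proportion.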
Because the absolute time and track ID are packed into a single 32 bit value, we have limited resolution for them. By default, the time value is stored up to 16384 (controlled by -timesize), i.e., 14 bits; beyond this, it wraps around, which introduces some additional ambiguity in the checking procedure, but is generally OK. With this default value, we are left with 32-14=18 bits to store the track ID, so the database is limited to 2^18=256k unique tracks; reference tracks beyond this limit will never be returned, but instead will be "aliased" to earlier entries. Reducing -timesize can increase the limit on the number of reference tracks; for instance, a -timesize of 256 (8 bits) would leave 24 bits for track ID, permitting 16M unique tracks to be remembered. Note, however, that at the default density of 7 hashes/sec, and a typical track of 200 s, we expect at least 1000 hashes per track, but the hash table can only record 2^20 x 100 = 100M distinct hashes, even assuming a nicely uniform distribution across the different hash values. Thus, beyond 100k tracks, we would anticipate a significant number of "dropped hashes" due to hash table buckets filling up, with a progressive impact on sensitivity.
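The trade-off between -timesize and the number of addressable tracks, and the point at which buckets start to fill, can be checked with a short Python sketch. The function names are hypothetical, and the saturation figure uses the optimistic assumption from the text of a perfectly uniform spread of hashes over buckets.

```python
def max_tracks(timesize: int = 16384, entry_bits: int = 32) -> int:
    """Number of distinct track IDs left after reserving bits for time."""
    return (1 << entry_bits) // timesize


def table_capacity(nhashbits: int = 20, maxnentries: int = 100) -> int:
    """Total number of entries the hash table can hold."""
    return (1 << nhashbits) * maxnentries


def tracks_before_saturation(hashes_per_track: int = 1000,
                             nhashbits: int = 20,
                             maxnentries: int = 100) -> int:
    """Tracks that fit before buckets fill, assuming uniform hash use."""
    return table_capacity(nhashbits, maxnentries) // hashes_per_track


if __name__ == "__main__":
    print(max_tracks(16384))            # default -timesize
    print(max_tracks(256))              # reduced -timesize
    print(tracks_before_saturation())   # at 1000 hashes per track
```

With the defaults, the ID space (262,144 tracks) is larger than the roughly 100k tracks at which the table itself is expected to saturate, so reducing -timesize only pays off if -nhashbits or -maxnentries is increased as well.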
This package has been compiled for several targets using the Matlab compiler. You will also need to download and install the Matlab Compiler Runtime (MCR) Installer. Please see the table below:
|Architecture|Compiled package|MCR Installer|
|64 bit Linux|audfprint_GLNXA64.zip|Linux 64 bit MCR Installer|
|64 bit MacOS|audfprint_MACI64.zip|MACI64 MCR Installer|
The original Matlab code used to build this compiled target is available at http://www.ee.columbia.edu/~dpwe/resources/matlab/audfprint
All sources are in the package audfprint-v0.80.zip.
Feel free to contact me with any problems.
The included function audioread is able to read a wide range of sound file types, but relies on a number of other packages and/or support functions being installed. Most obscure of these is ReadSound, a MEX wrapper I wrote for the dpwelib sound file interface. See the audioread homepage for more details.
% v0.80 2013-05-21 - find_landmarks was made about 20% faster by
%                    avoiding processing frames entirely below
%                    threshold.  Effort to avoid crashes when
%                    adding empty tracks with -matchonaddthresh.
%
% v0.79 2013-05-18 - implemented -matchonaddthresh to suppress
%                    adding tracks if they match something already
%                    in the dbase.
%
% v0.78 2013-05-15 - fixed bug in ht_store that limited default ID
%                    space to 131,072 tracks.  New limit is
%                    262,144 tracks; reduce -timesize to increase
%                    in proportion.
%                  - ht_store now throws an error when the ID
%                    space fills up.
%
% v0.77 2013-05-06 - small change to behavior on -remove: if the
%                    last few items in -list are all empty, the
%                    list is truncated to the last non-deleted item.
%
% v0.76 2013-04-24 - added -nojenkins flag and options to ht_
%                    calls to support hash table without jenkins
%                    hash (better for interpreting retrieved hashes).
%
% v0.75 2013-04-11 - added -removelist to specify a list of files
%                    to remove contained in a text file.
%
% v0.74 2013-04-10 - audioread is now inside try/catch block so
%                    any error e.g. from malformed soundfile
%                    results in a warning, but does not stop the
%                    program.
%
% v0.73 2013-03-13 - added -list 1 option to list files in database.
%
% v0.72 2013-02-01 - fixed file count so that it doesn't keep
%                    resetting to zero with -addcheckpoint > 0
%                  - added version field to dbase so we can
%                    check for incompatibility in the future
%                    (in ht_save and ht_load, and audfprint.m).
%
% v0.71 2012-12-22 - added "persistent" test for mpg123 in mp3read
%                    to avoid bug where "which" command returns
%                    wrong result
%                  - added ht_repair, which checks consistency of
%                    HashTableCounts and HashTable before saving (!).
%
% v0.7 2012-08-01 Added -adddir option for endless
%                 directory-watching to add files too; probably
%                 only makes sense with -matchdir, although it will
%                 continue to save the database respecting
%                 -addcheckpoint, and at exit-on-interrupt.  Also,
%                 added -add option for explicit sound files on
%                 command line, and -add and -match now accept
%                 multiple files (but you can't use both at the
%                 same time; it's a hack).
%                 Also, fixed problem with -matchdir trying to
%                 match '.' and '..' in Unix.
%                 Also added -outdir to write per-query report files.
%
% v0.6 2012-05-29 Added -matchdir option for endless
%                 directory-watching mode
%
% v0.5 2012-05-14 Added -remove option to remove single tracks.
%
% v0.4 2012-01-21 Now reports time of best match in 5th column of
%                 output.  Used to truncate at 20mins, now respects -maxdur.
%
% v0.3 2012-01-17 Added -matchmincount to exclude hits with few
%                 matches, and -matchminprop to exclude hits much worse than
%                 best.  -nmatch renamed to -matchmaxret.
%                 Added detail on report format above.
%
% v0.2 2012-01-09 Added -addcheckpoint to save partial versions of
%                 database during long add operations.
%
% v0.1 2012-01-06 Some speed optimizations; enabled -overlap 1;
%                 added -quiet and -addskip options
%                 Biggest change was switch to 11025 Hz sampling
%                 rate (from 8000); now runs about 4x faster for mp3s.
%                 *NB* databases made with v0.0 cannot be used with v0.1.
%
% v0.0 2011-12-09 Initial release
% Last updated: $Date: 2011/12/09 20:30:34 $
% Dan Ellis <firstname.lastname@example.org>