Music Cognition
iZotope hiring MediaMined interns!
If you are a budding music or audio engineering undergraduate student, iZotope is hiring paid interns. The work mostly consists of auditioning our systems.
iZotope hiring MediaMined developers!
The MediaMined team at iZotope is expanding! We're hiring software developers with a background in web application development. If you are a LAMP rockstar and care deeply about music, we'd love to hear from you!
iZotope acquires Imagine Research
The company I have been working with for about 18 months, Imagine Research, has been acquired by iZotope Inc., based in Boston. This is a great opportunity for us to continue developing our MediaMined technology, to broaden its reach, and to incorporate it into iZotope's future products. For my local friends, I'll continue to be based in NYC.
Mediamined
Mediamined is a project I've been working on together with the great folks at Imagine Research for about a year now. With some help from the U.S. National Science Foundation, we're now making public some of our technology.
Createasphere conference
A heads up that I will be on a panel at the Createasphere Digital Asset Management Conference entitled "Is Your DAM REALLY Ready for Audio & Video Files?" This will be a great opportunity to discuss some of the work that Imagine Research has been doing over the last year.
Beat Tracking References
Here is a set of references that I cite in my CCRMA MIR workshop presentation at Stanford University this week.
CCRMA MIR Workshop notes
Foot-tapping with Rubato
This is an example of automatic interpretation of an anapest rhythm undergoing extreme asymmetrical rubato (tempo variation). The foot-tapper plays a hi-hat sound along to a test anapestic rhythm (repeated short-short-long) whose tempo is being varied. The tapper has found the underlying repetition rate and selectively chosen to tap on the first beat of the groups of three, respecting (with a slight error) the rubato of the rhythm. This gives a robust means to interpret and synthesize ritards, accelerando, grooves and swing.
Clapping to Auditory Salience Traces
The continuous wavelet transform (CWT) of Morlet and Grossmann can also be applied to decompose a rhythm represented by a continuous trace of event "salience" derived directly from the audio signal. We use a measure of event salience developed by our EmCAP partners Prof. Sue Denham and Dr. Martin Coath at the University of Plymouth. The CWT decomposes the event salience trace into a hierarchy of periodicities (a multi-resolution representation). These periodicities have a limited duration in time (hence the term "wavelets"). Where those periodicities continue to be reinforced by the occurrence of each onset of the performed rhythm, a limited number of periodicities are continued over time, forming "ridges".
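To make the decomposition described above a little more concrete, here is a minimal sketch in Python of a Morlet CWT applied to a rhythm. It is not the code used in this work. It assumes a plain impulse train as a stand-in for a real salience trace, a Morlet centre frequency of 6 (a common choice), and a direct, unoptimised convolution. The names morlet, cwt_magnitude and dominant_period are hypothetical, chosen only for illustration.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed Morlet centre frequency in radians, a common choice


def morlet(t):
    """Complex Morlet wavelet, omitting the tiny zero-mean correction term."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-0.5 * t * t)


def cwt_magnitude(signal, scale):
    """Magnitude of the CWT of `signal` at a single dilation scale (in samples)."""
    n = len(signal)
    half = int(4 * scale)  # the Gaussian envelope is negligible beyond 4 scales
    kernel = [morlet(k / scale).conjugate() / math.sqrt(scale)
              for k in range(-half, half + 1)]
    magnitudes = []
    for b in range(n):
        acc = 0j
        for j, w in enumerate(kernel):
            i = b + j - half
            if 0 <= i < n:  # samples outside the signal are treated as zero
                acc += signal[i] * w
        magnitudes.append(abs(acc))
    return magnitudes


def dominant_period(signal, periods):
    """Return the candidate period (in samples) with the strongest mean response."""
    n = len(signal)
    middle = range(n // 4, 3 * n // 4)  # ignore the edges of the trace
    best_period, best_strength = None, -1.0
    for period in periods:
        # A wavelet at this scale oscillates once every `period` samples.
        scale = period * OMEGA0 / (2 * math.pi)
        mags = cwt_magnitude(signal, scale)
        strength = sum(mags[b] for b in middle) / len(middle)
        if strength > best_strength:
            best_period, best_strength = period, strength
    return best_period


# Stand-in "salience trace": one unit impulse every 20 samples.
trace = [1.0 if i % 20 == 0 else 0.0 for i in range(400)]
print(dominant_period(trace, [5, 10, 20, 40]))  # picks the 20-sample period
```

A sustained strong response at one scale over time is the simplest case of a "ridge". A real analysis would use many more scales and follow the maxima as the tempo changes.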
Beat Critic: Beat Tracking Octave Error Identification By Metrical Profile Analysis
Proceedings of the 11th International Symposium on Music Information Retrieval, Utrecht, The Netherlands, 2010, pages 99–104.
Computational models of beat tracking of musical audio have been well explored; however, such systems often make "octave errors", identifying the beat period at double or half the beat rate actually recorded in the music. A method is described to detect if octave errors have occurred in beat tracking. Following an initial beat tracking estimation, a feature vector of metrical profile separated by spectral subbands is computed. A measure of subbeat quaver (1/8th note) alternation is used to compare half time and double time measures against the initial beat track estimation and indicate a likely octave error. This error estimate can then be used to re-estimate the beat rate. The performance of the approach is evaluated against the RWC database, showing successful identification of octave errors for an existing beat tracker. Using the octave error detector together with the existing beat tracking model improved beat tracking by reducing octave errors to 43% of the previous error rate.
A Multiresolution Time-Frequency Analysis and Interpretation of Musical Rhythm
UWA PhD Thesis, 191 pages, October 2000, Department of Computer Science, University of Western Australia
Computational approaches to music have considerable problems in representing musical time, in particular in representing structure over time spans longer than short motives. The new approach investigated here is to represent rhythm in terms of frequencies of events, explicitly representing the multiple time scales as spectral components of a rhythmic signal.
Approaches to multiresolution analysis are then reviewed. In comparison to Fourier theory, the theory behind wavelet transform analysis is described. Wavelet analysis can be used to decompose a time dependent signal onto basis functions which represent time-frequency components. The use of Morlet and Grossmann's wavelets produces the best simultaneous localisation in both time and frequency domains. These have the property of making explicit all characteristic frequency changes over time inherent in the signal.
An approach of considering and representing a musical rhythm in signal processing terms is then presented. This casts a musician's performance in terms of a conceived rhythmic signal. The actual rhythm performed is then a sampling of that complex signal, which listeners can reconstruct using temporal predictive strategies which are aided by familiarity with the music or musical style by enculturation. The rhythmic signal is seen in terms of amplitude and frequency modulation, which can characterise forms of accents used by a musician.
Once the rhythm is reconsidered in terms of a signal, the application of wavelets in analysing examples of rhythm is then reported. Example rhythms exhibiting duration, agogic and intensity accents, accelerando and rallentando, rubato and grouping are analysed with Morlet wavelets. Wavelet analysis reveals short term periodic components within the rhythms that arise. The use of Morlet wavelets produces a "pure" theoretical decomposition. The degree to which this can be related to a human listener's perception of temporal levels is then considered.
The multiresolution analysis results are then applied to the well-known problem of foot-tapping to a performed rhythm. Using a correlation of frequency modulation ridges extracted using stationary phase, modulus maxima, dilation scale derivatives and local phase congruency, the tactus rate of the performed rhythm is identified, and from that, a new foot-tap rhythm is synthesised. This approach accounts for expressive timing and is demonstrated on rhythms exhibiting asymmetrical rubato and grouping. The accuracy of this approach is presented and assessed.
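As a rough illustration of synthesising taps from a time-frequency component, the following Python sketch places a tap wherever the phase of a Morlet coefficient at the tactus scale crosses zero going upward. This is a simplified stand-in, not the ridge-correlation method of the thesis. It assumes the tactus period is already known and fixed, so it would not follow rubato, which the ridge-based approach is designed to do. The names morlet_coefficient and tap_times, and the centre frequency of 6, are assumptions for illustration.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed Morlet centre frequency in radians


def morlet_coefficient(signal, b, scale):
    """Complex Morlet CWT coefficient of `signal` at sample `b` and dilation `scale`."""
    half = int(4 * scale)  # the Gaussian envelope is negligible beyond 4 scales
    acc = 0j
    for k in range(-half, half + 1):
        i = b + k
        if 0 <= i < len(signal):
            t = k / scale
            psi = (math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t)
                   * math.exp(-0.5 * t * t))
            acc += signal[i] * psi.conjugate()
    return acc / math.sqrt(scale)


def tap_times(signal, tactus_period):
    """Sample indices where the phase at the tactus scale crosses zero upward."""
    scale = tactus_period * OMEGA0 / (2 * math.pi)
    phases = [cmath.phase(morlet_coefficient(signal, b, scale))
              for b in range(len(signal))]
    taps = []
    for b in range(1, len(phases)):
        prev, cur = phases[b - 1], phases[b]
        # An upward zero crossing marks the start of a new cycle. A jump
        # larger than pi is a phase wrap, not a crossing, so it is skipped.
        if prev < 0 <= cur and cur - prev < math.pi:
            # Report whichever neighbouring sample lies closer to phase zero.
            taps.append(b - 1 if abs(prev) < abs(cur) else b)
    return taps


# Isochronous test rhythm: one onset every 20 samples.
rhythm = [1.0 if i % 20 == 0 else 0.0 for i in range(200)]
print(tap_times(rhythm, 20))  # taps land on the onsets at 20, 40, ..., 180
```

For an expressively timed rhythm, the fixed scale would be replaced by a scale that varies over time along the extracted tactus ridge.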
From these investigations, I argue the value of representing rhythm as time-frequency components. This is the explication of the notion of temporal levels (strata) and the ability to use analytical tools such as wavelets to produce formal measures of performed rhythms which match concepts from musicology and music cognition. This approach then forms the basis for further research in cognitive models of rhythm based on interpretation of the time-frequency components.


