
June 2010

Foot-tapping with Rubato

This is an example of automatic interpretation of an anapest rhythm undergoing extreme asymmetrical rubato (tempo variation). The foot-tapper plays a hi-hat sound along to a test anapestic rhythm (repeated short-short-long) whose tempo is being varied. The tapper has found the underlying repetition rate and selectively chosen to tap on the first beat of each group of three, respecting (with a slight error) the rubato of the rhythm. This gives a robust means to interpret and synthesize ritards, accelerandos, grooves and swing.

Clapping to Auditory Salience Traces

The continuous wavelet transform (CWT) of Morlet and Grossmann can also be applied to decompose a rhythm represented by a continuous trace of event "salience" derived directly from the audio signal. We use a measure of event salience developed by our EmCAP partners Prof. Sue Denham and Dr. Martin Coath at the University of Plymouth. The CWT decomposes the event salience trace into a hierarchy of periodicities (a multi-resolution representation). These periodicities have a limited duration in time (hence the term "wavelets"). Where a periodicity continues to be reinforced by each onset of the performed rhythm, it persists over time, so that a limited number of periodicities form "ridges".
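As a minimal sketch of this kind of decomposition, the Python below correlates a trace with complex Morlet wavelets at a few candidate periods. Everything in it is an assumption for illustration: the function name `morlet_cwt_magnitude`, the value `omega0 = 6.0`, the candidate periods, and the impulse train that stands in for a real salience trace (it is not the Denham and Coath salience measure).

```python
import numpy as np

def morlet_cwt_magnitude(trace, periods, omega0=6.0):
    """Magnitude of a Morlet wavelet transform, one row per period (in samples).

    Illustrative sketch only: omega0 and the 4-sigma truncation are assumptions.
    """
    trace = np.asarray(trace, dtype=float)
    n = trace.size
    magnitude = np.empty((len(periods), n))
    for row, period in enumerate(periods):
        # Choose the dilation so the wavelet oscillates once per `period` samples.
        scale = omega0 * period / (2.0 * np.pi)
        half_width = int(np.ceil(4.0 * scale))
        t = np.arange(-half_width, half_width + 1)
        # Complex exponential under a Gaussian envelope, scaled by 1/sqrt(scale).
        wavelet = np.exp(1j * omega0 * t / scale - 0.5 * (t / scale) ** 2)
        wavelet = wavelet / np.sqrt(scale)
        # Correlation with the wavelet = convolution with its conjugate, reversed.
        kernel = np.conj(wavelet[::-1])
        full = np.convolve(trace, kernel, mode="full")
        # Keep the samples aligned with the original trace.
        magnitude[row] = np.abs(full[half_width:half_width + n])
    return magnitude

# Stand-in "salience" trace: one unit impulse every 20 samples.
trace = np.zeros(400)
trace[::20] = 1.0

periods = [5, 10, 20, 40]
magnitude = morlet_cwt_magnitude(trace, periods)

# The row with the largest mean magnitude is the most reinforced periodicity.
strongest_period = periods[int(np.argmax(magnitude.mean(axis=1)))]
print(strongest_period)
```

Following the maximum of each column of `magnitude` over time would give a crude version of the ridges described above; real ridge extraction uses more than the modulus alone.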

Examples of Cognitive Musicology: Modelling Perception of Musical Rhythm & Expectation

Guest lecture for Utrecht University seminar on Technology in Musicology.

Multiresolution Representations of Musical Rhythm & Expectation

Guest lecture for Utrecht University Masters in Artificial Intelligence.

Music and Probability Lecture

Guest lecture for the Universiteit van Amsterdam course “Music Cognition”.



First version in SGML, removing the question mark in the filename which cause...

Commit by leighsmith :: r3665 /trunk/MusicKit/Documentation/Concepts/SpecialTopics/ (HowManyVoices.sgml HowManyVoices?.rtf): First version in SGML, removing the question mark in the filename which cause grief on Windows SVN clients

Anti-bike rhetoric and FUD from NYT

The New York Times piece reporting the planned introduction of rental electric cars in Paris opens with frightening tales of how many Velib rental bikes have been damaged over the history of the system's operation. This is clearly intended to communicate how much the system has failed.

Beat Critic: Beat Tracking Octave Error Identification By Metrical Profile Analysis

Leigh M. Smith

Computational models of beat tracking of musical audio have been well explored. However, such systems often make "octave errors", identifying the beat period at double or half the beat rate actually recorded in the music. A method is described to detect if octave errors have occurred in beat tracking. Following an initial beat tracking estimation, a feature vector of metrical profile separated by spectral subbands is computed. A measure of subbeat quaver (1/8th note) alternation is used to compare half time and double time measures against the initial beat track estimation and indicate a likely octave error. This error estimate can then be used to re-estimate the beat rate. The performance of the approach is evaluated against the RWC database, showing successful identification of octave errors for an existing beat tracker. Using the octave error detector together with the existing beat tracking model improved beat tracking by reducing octave errors to 43% of the previous error rate.
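The idea of comparing half time and double time candidates against the tracked beat can be illustrated with a toy sketch. This is not the measure used in the paper: the functions `quaver_alternation` and `likely_octave_error`, the single-band onset strength signal, and the assumption that beats start at frame 0 are all hypothetical simplifications (the paper works on metrical profiles split by spectral subband).

```python
import numpy as np

def quaver_alternation(onset_strength, beat_period):
    """Hypothetical alternation score: mean strength on the beat minus mean
    strength on the quaver midway between beats (beats assumed to start at 0)."""
    strength = np.asarray(onset_strength, dtype=float)
    on_beat = strength[::beat_period]
    off_beat = strength[beat_period // 2::beat_period]
    return on_beat.mean() - off_beat.mean()

def likely_octave_error(onset_strength, tracked_period):
    """Score the tracked beat period against its double time and half time
    alternatives and return the best-scoring label with all scores."""
    candidates = {
        "double time": tracked_period // 2,  # beats twice as fast
        "as tracked": tracked_period,
        "half time": tracked_period * 2,     # beats twice as slow
    }
    scores = {name: quaver_alternation(onset_strength, period)
              for name, period in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy signal: strong onsets every 20 frames, weaker off-beat quavers between.
strength = np.zeros(320)
strength[::20] = 1.0
strength[10::20] = 0.3

# Suppose a beat tracker reported a period of 10 frames (twice too fast).
best, scores = likely_octave_error(strength, 10)
print(best, scores)
```

In this toy case the half time candidate scores highest, which would flag the tracked period as a likely octave error and suggest re-estimating the beat at the slower rate.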

A Multiresolution Time-Frequency Analysis and Interpretation of Musical Rhythm

Leigh M. Smith

UWA PhD Thesis, 191 pages, October 2000, Department of Computer Science, University of Western Australia


Computational approaches to music have considerable problems in representing musical time, in particular in representing structure over time spans longer than short motives. The new approach investigated here is to represent rhythm in terms of frequencies of events, explicitly representing the multiple time scales as spectral components of a rhythmic signal.

Approaches to multiresolution analysis are then reviewed. In comparison to Fourier theory, the theory behind wavelet transform analysis is described. Wavelet analysis can be used to decompose a time dependent signal onto basis functions which represent time-frequency components. The use of Morlet and Grossmann's wavelets produces the best simultaneous localisation in both time and frequency domains. These have the property of making explicit all characteristic frequency changes over time inherent in the signal.
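For reference, the transform in question can be written in its standard textbook form. The notation here (signal s, dilation a, translation b, centre frequency ω₀) is assumed for illustration and may differ from the notation used in the thesis; the wavelet shown is the usual approximate Morlet form, which omits a small correction term that is negligible when ω₀ is roughly 5 or larger.

```latex
% Morlet wavelet: a complex exponential under a Gaussian envelope
g(t) = e^{i \omega_0 t} \, e^{-t^2 / 2}

% Continuous wavelet transform of the signal s at dilation a and translation b
W_s(b, a) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty}
    s(\tau) \, \overline{g}\!\left( \frac{\tau - b}{a} \right) d\tau,
    \qquad a > 0
```

The Gaussian envelope is what gives the simultaneous localisation in time and frequency: a Gaussian attains the lower bound of the time-frequency uncertainty relation.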

An approach to considering and representing a musical rhythm in signal processing terms is then presented. This casts a musician's performance in terms of a conceived rhythmic signal. The actual rhythm performed is then a sampling of that complex signal, which listeners can reconstruct using temporal predictive strategies which are aided by familiarity with the music or musical style by enculturation. The rhythmic signal is seen in terms of amplitude and frequency modulation, which can characterise forms of accents used by a musician.

Once the rhythm is reconsidered in terms of a signal, the application of wavelets in analysing examples of rhythm is then reported. Example rhythms exhibiting duration, agogic and intensity accents, accelerando and rallentando, rubato and grouping are analysed with Morlet wavelets. Wavelet analysis reveals short term periodic components within the rhythms that arise. The use of Morlet wavelets produces a "pure" theoretical decomposition. The degree to which this can be related to a human listener's perception of temporal levels is then considered.

The multiresolution analysis results are then applied to the well-known problem of foot-tapping to a performed rhythm. Using a correlation of frequency modulation ridges extracted using stationary phase, modulus maxima, dilation scale derivatives and local phase congruency, the tactus rate of the performed rhythm is identified, and from that, a new foot-tap rhythm is synthesised. This approach accounts for expressive timing and is demonstrated on rhythms exhibiting asymmetrical rubato and grouping. The accuracy of this approach is presented and assessed.
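The final synthesis step, turning a time-varying tactus period into tap times, can be sketched with a simple phase accumulator. This is an illustrative assumption, not the method of the thesis: the function name `synthesise_taps` is hypothetical, and the tactus period per sample is taken as already extracted from the ridges.

```python
import numpy as np

def synthesise_taps(period_per_sample):
    """Return sample indices of taps that follow a time-varying beat period.

    The beat phase advances by 1/period at each sample; a tap is placed at
    sample 0 and wherever the phase reaches the next whole number.
    """
    period = np.asarray(period_per_sample, dtype=float)
    # Phase at each sample, starting from zero at sample 0.
    phase = np.concatenate(([0.0], np.cumsum(1.0 / period[:-1])))
    beat_index = np.floor(phase)
    # A tap falls on each sample where the whole-beat count increases.
    crossings = np.flatnonzero(np.diff(beat_index) > 0) + 1
    return np.concatenate(([0], crossings))

# A sudden accelerando: 32 samples at a period of 16, then 32 at a period of 8.
tactus_period = [16] * 32 + [8] * 32
taps = synthesise_taps(tactus_period)
print(taps.tolist())  # [0, 16, 32, 40, 48, 56]
```

Because the taps are driven by accumulated phase rather than a fixed interval, they stretch and compress with the period, which is the behaviour needed to follow rubato. The periods in the example are powers of two so the accumulated phase is exact; with arbitrary periods a small tolerance on the whole-number test would be prudent.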

From these investigations, I argue the value of representing rhythm as time-frequency components. This value lies in the explication of the notion of temporal levels (strata) and in the ability to use analytical tools such as wavelets to produce formal measures of performed rhythms which match concepts from musicology and music cognition. This approach then forms the basis for further research in cognitive models of rhythm based on interpretation of the time-frequency components.