Publications of Leigh M. Smith

This is a list of my recent publications on music cognition and musical rhythm representation.

Leigh Smith

CCRMA MIR Workshop notes



ISMIR 2010
Leigh M. Smith

Computational models of beat tracking of musical audio have been well explored. However, such systems often make "octave errors", identifying the beat period at double or half the beat rate actually recorded in the music. A method is described to detect whether octave errors have occurred in beat tracking. Following an initial beat tracking estimation, a feature vector of metrical profile separated by spectral subbands is computed. A measure of subbeat quaver (1/8th note) alternation is used to compare half time and double time measures against the initial beat track estimation and indicate a likely octave error. This error estimate can then be used to re-estimate the beat rate. The performance of the approach is evaluated against the RWC database, showing successful identification of octave errors for an existing beat tracker. Using the octave error detector together with the existing beat tracking model improved beat tracking by reducing octave errors to 43% of the previous error rate.



ICMC 2010
Leigh M. Smith

A method for computing the similarity of metrical rhythmic patterns is described as applied to the audio signal of recorded music. For each rhythm, a combined feature vector of metrical profile and syncopation, separated by spectral subbands, hypermetrical profile, and tempo is compared. The descriptive capability of this feature vector is evaluated by its use in a machine learning rhythm classification task, identifying ballroom dance styles using a support vector machine algorithm. Results indicate that the full feature vector achieves a result of 67%. This improves on previous results using rhythmic patterns alone, but does not exceed the best reported results. By evaluating individual features, measures of metrical, syncopation and hypermetrical profile are found to play a greater role than tempo in aiding discrimination.



Connection Science
Martin Coath, Susan Denham, Leigh M. Smith, Henkjan Honing, Amaury Hazan, Piotr Holonowicz, Hendrik Purwins

Connection Science, 21(2 & 3), 2009, pages 193-205

We describe a biophysically motivated model of auditory salience based on a model of cortical responses and present results that show that the derived measure of salience can be used to identify the position of perceptual onsets in a musical stimulus successfully. The salience measure is also shown to be useful to track beats and predict rhythmic structure in the stimulus on the basis of its periodicity patterns. We evaluate the method using a corpus of unaccompanied freely sung stimuli and show that the method performs well, in some cases better than state-of-the-art algorithms. These results deserve attention because they are derived from a general model of auditory processing and not an arbitrary model achieving best performance in onset detection or beat-tracking tasks.



RPPW 2009
Leigh M. Smith

A computational multiresolution model of musical rhythm expectation has recently been proposed, based on cumulative evidence of rhythmic time-frequency ridges (Smith & Honing 2008a). This model was shown to demonstrate the emergence of musical meter from a bottom-up data processing model, thus clarifying the role of top-down expectation. Such a multiresolution time-frequency model of rhythm has also been previously demonstrated to track musical rubato well, with both synthesised (Smith & Honing 2008b) and performed audio examples (Coath et al. 2009). The model is evaluated for its capability to generate accurate expectation from human musical performances. The musical performances consist of 63 monophonic rhythms from MIDI keyboard performances, and 50 audio recordings of popular music. The model generates expectations as forward predictions of times of future notes, a confidence weighting of the expectation, and a precision region. Evaluation consisted of generating successive expectations from an expanding fragment of the rhythm. In the case of the monophonic MIDI rhythms, these expectations were then scored by comparison against the onset times of the notes subsequently performed. The evaluation is repeated across each rhythm. In the case of the audio recording data, where beat annotations exist but individual note onsets are not annotated, forward expectation is measured against the beat period. Scores were computed using information retrieval measures of precision, recall and F-score (van Rijsbergen 1979) for each performance. Preliminary results show mean PRF scores of (0.297, 0.370, 0.326) for the MIDI performances, indicating performance well above chance (0.177, 0.219, 0.195), but well below perfection. A model of expectation of musical rhythm has been shown to be computable. This can be used as a measure of rhythmic complexity, by measuring the degree of contradiction to expectation. Such a rhythmic complexity measure is then applicable in models of rhythmic similarity used in music information retrieval applications.
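The precision, recall and F-score evaluation described above can be illustrated with a minimal sketch. It is not the code used in the study: the function name `score_expectations`, the fixed `tolerance` window (standing in for the model's precision region) and the nearest-onset matching rule are all assumptions made for illustration.

```python
def score_expectations(expected, onsets, tolerance=0.05):
    """Score predicted note times against performed onsets (all in seconds).

    Assumption: a prediction counts as a hit if an as-yet-unmatched onset
    lies within +/- tolerance of it; each onset can be matched only once.
    Returns (precision, recall, f_score).
    """
    unmatched = sorted(onsets)
    hits = 0
    for t in sorted(expected):
        # Find the nearest unmatched onset inside the tolerance window.
        best = None
        for i, onset in enumerate(unmatched):
            distance = abs(onset - t)
            if distance <= tolerance and (
                best is None or distance < abs(unmatched[best] - t)
            ):
                best = i
        if best is not None:
            hits += 1
            unmatched.pop(best)
    precision = hits / len(expected) if expected else 0.0
    recall = hits / len(onsets) if onsets else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For example, scoring the predictions [0.5, 1.0, 1.52, 2.3] against the onsets [0.5, 1.0, 1.5, 2.0, 2.5] with a 50 ms tolerance gives three hits, so precision is 0.75, recall is 0.6 and the F-score is about 0.667.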



Journal of Mathematics and Music
Leigh M. Smith and Henkjan Honing

Journal of Mathematics and Music, 2(2), 2008, pages 81-97

A method is described that exhaustively represents the periodicities created by a musical rhythm. The continuous wavelet transform is used to decompose an interval representation of a musical rhythm into a hierarchy of short-term frequencies. This reveals the temporal relationships between events over multiple time-scales, including metrical structure and expressive timing. The analytical method is demonstrated on a number of typical rhythmic examples. It is shown to make explicit periodicities in musical rhythm that correspond to cognitively salient “rhythmic strata” such as the tactus. Rubato, including accelerations and retards, is represented as temporal modulations of single rhythmic figures, instead of timing noise. These time varying frequency components are termed ridges in the time-frequency plane. The continuous wavelet transform is a general invertible transform and does not represent rhythmic signals exclusively. This clarifies the distinction between what perceptual mechanisms a pulse tracker must model, compared to what information any pulse induction process is capable of revealing directly from the signal representation of the rhythm. A pulse tracker is consequently modelled as a selection process, choosing the most salient time-frequency ridges to use as the tactus. This set of selected ridges is then used to compute an accompaniment rhythm by inverting the wavelet transform of a modified magnitude and original phase back to the time domain.



ICMPC 2008
Leigh M. Smith and Henkjan Honing

We describe a computational model of rhythmic cognition that predicts expected onset times. A dynamic representation of musical rhythm, the multiresolution analysis using the continuous wavelet transform, is used. This representation decomposes the temporal structure of a musical rhythm into time varying frequency components in the rhythmic frequency range (sample rate of 200Hz). Both expressive timing and temporal structure (score times) contribute in an integrated fashion to determine the temporal expectancies. Future expected times are computed using peaks in the accumulation of time-frequency ridges. This accumulation at the edge of the analysed time window forms a dynamic expectancy. We evaluate this model using data sets of expressively timed (or performed) and generated musical rhythms, by its ability to produce expectancy profiles which correspond to metrical profiles. The results show that rhythms of two different meters can be distinguished. Such a representation indicates that a bottom-up, data-oriented process (or a non-cognitive model) is able to reveal durations which match metrical structure from realistic musical examples. This then helps to clarify the role of schematic expectancy (top-down) and its contribution to the formation of musical expectation.



ICoMCS 2007
Leigh M. Smith and Henkjan Honing

A dynamic representation of musical rhythm, the multiresolution analysis using the continuous wavelet transform (CWT), is evaluated using a dataset of the interonset intervals of 105 national anthem rhythms. This representation decomposes the temporal structure of a musical rhythm into time varying frequency components in the rhythmic frequency range (sample rate of 200Hz). Evidence is presented that the beat (typically quarter-note or crotchet) and the bar (measure) durations of each rhythm are revealed by this transform. Such evidence suggests that the pattern of time intervals, when analysed with the CWT, functions as features that are used in the process of forming a metrical interpretation. Since the CWT is an invertible transform of the interonset intervals in each rhythm, this result is interpreted as setting a minimum capability of discrimination that any perceptual model of beat or meter can achieve. It indicates that a bottom-up, data-oriented process (or a non-cognitive model) is able to reveal durations which match metrical structure from realistic musical examples. This then characterises the data and behaviour of a top-down cognitive model which must interact with the bottom-up process.
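As a rough illustration of how a wavelet analysis of interonset intervals can reveal a beat duration, the sketch below converts intervals to onset times on a 200 Hz grid and measures the mean CWT magnitude at a few candidate periods. It is not the published implementation: the Morlet wavelet with centre frequency `OMEGA0 = 6`, the 1/sqrt(scale) normalisation, the averaging over time and all function names are assumptions. Because the rhythm is a sparse impulse train, the convolution reduces to a sum of wavelet values over the onsets.

```python
import cmath
import math

SAMPLE_RATE = 200.0  # Hz, the rhythmic sample rate quoted in the abstract
OMEGA0 = 6.0         # Morlet centre frequency (assumed, a common default)

def onset_times(intervals):
    """Cumulative onset times (s) from interonset intervals, snapped to the sample grid."""
    times, t = [0.0], 0.0
    for ioi in intervals:
        t += ioi
        times.append(round(t * SAMPLE_RATE) / SAMPLE_RATE)
    return times

def morlet(u):
    """Complex Morlet wavelet evaluated at dimensionless time u."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * u) * math.exp(-0.5 * u * u)

def cwt_magnitude(onsets, period):
    """Mean CWT magnitude over time at the scale whose Morlet period is `period` seconds."""
    scale = OMEGA0 * period / (2.0 * math.pi)
    n_samples = int(round(onsets[-1] * SAMPLE_RATE)) + 1
    total = 0.0
    for n in range(n_samples):
        b = n / SAMPLE_RATE  # translation (analysis time) in seconds
        # Unit impulses at the onsets: the transform is a sum over onsets.
        coeff = sum(morlet((t - b) / scale).conjugate() for t in onsets)
        total += abs(coeff) / math.sqrt(scale)
    return total / n_samples

def strongest_period(intervals, candidate_periods):
    """Return the candidate period (s) with the largest mean CWT magnitude."""
    onsets = onset_times(intervals)
    return max(candidate_periods, key=lambda p: cwt_magnitude(onsets, p))
```

For an isochronous rhythm of sixteen 0.5 s intervals and candidate periods of 0.25, 0.5 and 1.0 s, `strongest_period` selects 0.5 s. A fuller analysis would use a dense range of scales and follow ridges over time rather than averaging.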