
Music Cognition

Relating to music perception and cognition

BreakTweaker released using MediaMined Discover

iZotope today announced BreakTweaker, a drum machine and synth DAW plugin incorporating MediaMined Discover for content-based searching of its sample libraries. The video demo does not discuss the feature, but you'll see the "Discover" button on the Sample window at 1:40 in the video.

Probing neural mechanisms of music perception, cognition, and performance using multivariate decoding

Rebecca S. Schaefer, Shinichi Furuya, Leigh M. Smith, Blair Bohannan Kaneshiro and Petri Toiviainen

Psychomusicology: Music, Mind and Brain, 22(2):168–174, 2012


Recent neuroscience research has shown increasing use of multivariate decoding methods and machine learning. These methods, by uncovering the source and nature of informative variance in large data sets, invert the classical direction of inference that attempts to explain brain activity from mental state variables or stimulus features. However, these techniques are not yet commonly used among music researchers. In this position article, we introduce some key features of machine learning methods and review their use in the field of cognitive and behavioral neuroscience of music. We argue for the great potential of these methods in decoding multiple data types, specifically audio waveforms, electroencephalography, functional MRI, and motion capture data. By finding the most informative aspects of stimulus and performance data, hypotheses can be generated pertaining to how the brain processes incoming musical information and generates behavioral output, respectively. Importantly, these methods are also applicable to different neural and physiological data types such as magnetoencephalography, near-infrared spectroscopy, positron emission tomography, and electromyography.

Automated classification of music genre, sound objects, and speech by machine learning.

Leigh M. Smith, Stephen T. Pope, Jay Leboeuf and Steve Tjoa

Proceedings of the 12th International Conference on Music Perception and Cognition, page 943, Thessaloniki, Greece, July 2012. ICMPC/ESCOM. (abstract).


A software system, MediaMined, is described for the efficient analysis and classification of auditory signals. This system has been applied to the tasks of musical instrument identification, classifying musical genre, distinguishing between music and speech, and detection of the gender of human speakers. For each of these tasks, the same algorithm is applied, consisting of low-level signal analysis, statistical processing and perceptual modeling for feature extraction, and then supervised learning of sound classes. Given a ground truth dataset of audio examples, textual descriptive classification labels are then produced. Such labels are suitable for use in automating content interpretation (auditioning) and content retrieval, mixing and signal processing. A multidimensional feature vector is calculated from statistical and perceptual processing of low level signal analysis in the spectral and temporal domains. Machine learning techniques such as support vector machines are applied to produce classification labels given a selected taxonomy. The system is evaluated on large annotated ground truth datasets (n > 30000) and demonstrates success rates (F-measures) greater than 70% correct retrieval, depending on the task. Issues arising from labeling and balancing training sets are discussed. The performance of classification of audio using machine learning methods demonstrates the relative contribution of bottom-up signal derived features and data oriented classification processes to human cognition. Such demonstrations then sharpen the question as to the contribution of top-down, expectation based processes in human auditory cognition.
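The abstract reports success rates as F-measures but does not say which variant or averaging scheme was used. As a minimal sketch, assuming the balanced F1 score computed for one class at a time, the measure can be worked out as follows. The function name `f_measure` and the example labels are invented for illustration and are not part of MediaMined.

```python
def f_measure(true_labels, predicted_labels, positive, beta=1.0):
    """F-measure for one class, from paired ground-truth and predicted labels.

    beta=1.0 gives the balanced F1 score (an assumption; the abstract
    does not state which variant was reported).
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct retrievals: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical music/speech labels, purely illustrative.
truth = ["music", "music", "speech", "speech", "music"]
predicted = ["music", "speech", "speech", "music", "music"]
music_f1 = f_measure(truth, predicted, positive="music")
```

Because the F-measure combines precision and recall, it is sensitive to unbalanced training sets, which is one reason the balancing issues mentioned in the abstract matter for interpreting the reported figures.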

iZotope hiring MediaMined interns!

If you are a budding music or audio engineering undergraduate student, iZotope is hiring paid interns. The work mostly consists of auditioning our systems.

iZotope hiring MediaMined developers!

The MediaMined team at iZotope is expanding! We're hiring software developers with a background in web application development. If you are a LAMP rockstar and care deeply about music, we'd love to hear from you!

iZotope acquires Imagine Research

Imagine Research, the company I have been working with for about 18 months now, has been acquired by iZotope Inc., based in Boston. This is a great opportunity for us to continue to develop our MediaMined technology, to broaden its reach and incorporate it into iZotope's future products. For my local friends, I'll continue to be based in NYC.


MediaMined is a project I've been working on together with the great folks at Imagine Research for about a year now. With some help from the U.S. National Science Foundation, we're now making public some of our technology.

Createasphere conference

A heads up that I will be on a panel at the Createasphere Digital Asset Management Conference entitled "Is Your DAM REALLY Ready for Audio & Video Files?" This will be a great opportunity to discuss some of the work that Imagine Research has been doing over the last year.

Beat Tracking References


Here is a set of references that I cite in my CCRMA MIR workshop presentation at Stanford University this week.

Leigh Smith

CCRMA MIR Workshop notes

Foot-tapping with Rubato

This is an example of automatic interpretation of an anapest rhythm undergoing extreme asymmetrical rubato (tempo variation). The foot-tapper plays a hi-hat sound along to a test anapestic rhythm (repeated short-short-long) whose tempo is being varied. The tapper has found the underlying repetition rate and selectively chosen to tap on the first beat of the groups of three, respecting (with a slight error) the rubato of the rhythm. This gives a robust means to interpret and synthesize ritards, accelerandos, grooves and swing.
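The idea of finding a repetition rate and tapping on group boundaries can be illustrated with a toy sketch. This is not the method used in the foot-tapper demo, which the post does not describe. It assumes note onsets are already known, compares log inter-onset intervals (IOIs) at different lags so that tempo scaling becomes a small additive drift, and treats the longest interval in each cycle as the end of a group. All names and test values are hypothetical.

```python
import math


def find_group_period(iois, max_period=6):
    """Return the lag (in events) at which the log-IOI sequence best repeats."""
    logs = [math.log(x) for x in iois]
    best_lag, best_cost = None, float("inf")
    for lag in range(1, max_period + 1):
        diffs = [abs(logs[i + lag] - logs[i]) for i in range(len(logs) - lag)]
        if not diffs:
            break
        cost = sum(diffs) / len(diffs)  # mean mismatch at this lag
        if cost < best_cost:
            best_lag, best_cost = lag, cost
    return best_lag


def tap_times(onsets, max_period=6):
    """Pick one onset per group: the one following the longest interval."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    period = find_group_period(iois, max_period)
    logs = [math.log(x) for x in iois]
    mean_by_phase = []
    for phase in range(period):
        values = logs[phase::period]
        mean_by_phase.append(sum(values) / len(values))
    long_phase = max(range(period), key=mean_by_phase.__getitem__)
    start = (long_phase + 1) % period  # onset after the long interval starts a group
    return onsets[start::period]


# Hypothetical anapest (short-short-long) with a different tempo factor per group.
pattern = [0.25, 0.25, 0.5]
tempo_factors = [1.0, 1.1, 1.3, 1.2, 0.9, 0.8]
onsets = [0.0]
for factor in tempo_factors:
    for interval in pattern:
        onsets.append(onsets[-1] + interval * factor)

example_iois = [b - a for a, b in zip(onsets, onsets[1:])]
period = find_group_period(example_iois)
taps = tap_times(onsets)
```

Working in log intervals keeps the short-short-long proportions recognizable even as the tempo changes from group to group. A sketch like this only follows onsets that were actually played; it does not predict tap times ahead of the music the way a real-time tapper must.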
