Modeling the perception of tempo
2015 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 137, no. 6, p. 3163-3177. Article in journal (Refereed). Published.
A system is proposed in which rhythmic representations are used to model the perception of tempo in music. The system can be understood as a five-layered model, where representations are transformed into higher-level abstractions in each layer. First, source separation is applied (Audio Level), onsets are detected (Onset Level), and interonset relationships are analyzed (Interonset Level). Then, several high-level representations of rhythm are computed (Rhythm Level). The periodicity of the music is modeled by the cepstroid vector, that is, the periodicity of an interonset interval (IOI) histogram. The pulse strength for plausible beat length candidates is defined by computing the magnitudes in different IOI histograms. The speed of the music is modeled as a continuous function on the basis of the idea that such a function corresponds to the underlying perceptual phenomena, and it seems to effectively reduce octave errors. By combining the rhythmic representations in a logistic regression framework, the tempo of the music is finally computed (Tempo Level). The results are the highest reported in a formal benchmarking test (2006-2013), with a P-Score of 0.857. Furthermore, the highest results so far are reported for two widely adopted test sets, with an Acc1 of 77.3% and 93.0% for the Songs and Ballroom datasets, respectively.
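The Interonset and Rhythm Levels described in the abstract can be illustrated with a small sketch. It builds an IOI histogram from a list of onset times and then looks for the dominant periodicity in that histogram. This is not the paper's cepstroid computation: a plain autocorrelation is swapped in as a simple stand-in for "periodicity of the histogram", and the function names, the 50 ms bin width, and the 2 s maximum interval are all assumptions made for illustration.

```python
def ioi_histogram(onsets, max_ioi=2.0, bin_width=0.05):
    """Count all pairwise interonset intervals up to max_ioi seconds.

    Bin k (0-indexed) collects intervals close to (k + 1) * bin_width.
    Both parameters are illustrative assumptions, not values from the paper.
    """
    n_bins = int(round(max_ioi / bin_width))
    hist = [0] * n_bins
    times = sorted(onsets)
    for i, t0 in enumerate(times):
        for t1 in times[i + 1:]:
            ioi = t1 - t0
            if ioi > max_ioi + 1e-9:
                break  # later onsets are even further away
            idx = int(round(ioi / bin_width)) - 1
            if 0 <= idx < n_bins:
                hist[idx] += 1
    return hist


def dominant_lag(hist):
    """Return the lag (in bins) with the largest autocorrelation value.

    Autocorrelation is used here as a stand-in periodicity measure;
    the paper's cepstroid vector is defined differently.
    """
    n = len(hist)
    best_lag, best_value = 0, -1
    for lag in range(1, n):
        value = sum(hist[j] * hist[j + lag] for j in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Eight onsets spaced 0.5 s apart (a steady pulse).
onsets = [i * 0.5 for i in range(8)]
bin_width = 0.05
hist = ioi_histogram(onsets, max_ioi=2.0, bin_width=bin_width)
period = dominant_lag(hist) * bin_width  # seconds between histogram peaks
bpm = 60.0 / period
print(f"estimated period: {period:.2f} s, tempo: {bpm:.0f} BPM")
```

For this evenly spaced pulse the histogram peaks repeat every 0.5 s, so the strongest lag corresponds to 120 BPM. Real music produces far messier histograms, which is why the system described above combines several rhythm representations in a regression step rather than reading the tempo off a single peak.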
Fluid Mechanics and Acoustics
Identifiers
URN: urn:nbn:se:kth:diva-171154
DOI: 10.1121/1.4919306
ISI: 000356622400033
PubMedID: 26093407
ScopusID: 2-s2.0-84934898408
OAI: oai:DiVA.org:kth-171154
DiVA: diva2:842401
QC 20150720
2015-07-20; 2015-07-20; 2015-07-20
Bibliographically approved