Predicting the perception of performed dynamics in music audio with ensemble learning
Elowsson, Anders. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-4957-2128
Friberg, Anders. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-2926-6518
2017 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 141, no. 3, pp. 2224-2242. Article in journal (Refereed). Published.
Abstract [en]

By varying the dynamics in a musical performance, the musician can convey structure and different expressions. The spectral properties of most musical instruments change in a complex way with the performed dynamics, but dedicated audio features for modeling this parameter are lacking. In this study, feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux. Previously, ground truth ratings of performed dynamics had been collected by asking listeners to rate how softly or loudly the musicians played in a set of audio files. The ratings, averaged over subjects, were used to train three different machine learning models, with the audio features developed for the study as input. The best result was obtained with an ensemble of multilayer perceptrons, which achieved an R² of 0.84. This result appears to be close to the upper bound, given the estimated uncertainty of the ground truth data. It is well above the performance of individual human listeners in the previous listening experiment, and on par with the performance achieved from the average rating of six listeners. The features were analyzed with a factorial design, which highlighted the importance of source separation in the feature extraction.
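To make the method description above more concrete, the sketch below illustrates the two main ingredients mentioned in the abstract: a band-wise ("sectional") spectral-flux feature and an ensemble of multilayer perceptrons evaluated with R². It is a minimal Python illustration under assumed settings (feature definition, network size, number of ensemble members, synthetic data) and does not reproduce the authors' actual feature set, source-separation stage, or training setup.

```python
# Illustrative sketch only (not the authors' implementation): the feature
# definition, network sizes, and synthetic data below are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

def sectional_spectral_flux(spec, n_sections=4):
    """Toy band-wise ("sectional") spectral flux: mean positive frame-to-frame
    change of a magnitude spectrogram, computed per frequency band."""
    flux = np.maximum(np.diff(spec, axis=1), 0.0)      # positive spectral change per bin/frame
    bands = np.array_split(flux, n_sections, axis=0)   # split the frequency axis into sections
    return np.array([b.mean() for b in bands])         # one feature value per section

rng = np.random.default_rng(0)
print(sectional_spectral_flux(rng.random((1025, 200))))  # features for one (random) excerpt

# Synthetic stand-in for per-excerpt audio features and averaged listener ratings.
X = rng.normal(size=(200, 12))                                  # 200 excerpts, 12 features
y = X @ rng.normal(size=12) + rng.normal(scale=0.3, size=200)   # fake "performed dynamics" ratings
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Ensemble of multilayer perceptrons: average the predictions of several
# independently initialised networks, in the spirit of the abstract.
ensemble = [MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                         random_state=seed).fit(X_tr, y_tr) for seed in range(10)]
y_pred = np.mean([m.predict(X_te) for m in ensemble], axis=0)
print(f"ensemble R^2: {r2_score(y_te, y_pred):.2f}")
```

Averaging the outputs of several independently initialised networks mainly reduces prediction variance relative to a single MLP, which is the usual motivation for this kind of ensemble.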

Place, publisher, year, edition, pages
Acoustical Society of America (ASA), 2017. Vol. 141, no. 3, pp. 2224-2242.
Keywords [en]
Performed dynamics, dynamics, music, timbre, ensemble learning, perceptual features
National Category
Media and Communication Technology
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-204657
DOI: 10.1121/1.4978245
ISI: 000398962500101
Scopus ID: 2-s2.0-85016561050
OAI: oai:DiVA.org:kth-204657
DiVA: diva2:1085959
Funder
Swedish Research Council
Note

QC 20170406

Available from: 2017-03-30. Created: 2017-03-30. Last updated: 2017-04-28. Bibliographically approved.

Open Access in DiVA

The full text will be freely available from 2017-10-01 13:09

Other links

Publisher's full text
Scopus
JASA

