Using listener-based perceptual features as intermediate representations in music information retrieval
Friberg, Anders; Schoonderwaldt, Erwin; Hedblad, Anton; Fabiani, Marco; Elowsson, Anders

KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics. ORCID iD: 0000-0003-2926-6518
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics; Hanover University, Germany.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
2014 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 136, no. 4, pp. 1951-1963. Article in journal (Refereed). Published.
Abstract [en]

The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt to rethink the concept of features so as to approach the underlying mechanisms of human perception. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated from an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled from both symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features can be estimated reliably; (2) emotion ratings could be predicted by a small combination of perceptual features, with an explained variance of 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could be modeled only to a limited extent using existing audio features. The results clearly indicated that a small number of dedicated features were superior to a "brute force" model using a large number of general audio features.
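The prediction step described above amounts to a multiple linear regression of listener emotion ratings on a small number of perceptual feature ratings. The following is a minimal illustrative sketch, not the authors' actual analysis: the data are synthetic, the nine feature columns are placeholders for the paper's perceptual features, and the cross-validated R^2 stands in for the reported explained variance.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    # Hypothetical per-excerpt mean listener ratings of nine perceptual
    # features on 9-point scales (placeholder data, not the study's).
    n_excerpts, n_features = 100, 9
    X = rng.uniform(1, 9, size=(n_excerpts, n_features))

    # Synthetic activity/valence targets driven by a few of the features
    # plus noise, standing in for the rated emotional dimensions.
    activity = 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n_excerpts)
    valence = 0.7 * X[:, 2] - 0.4 * X[:, 3] + rng.normal(0, 0.5, n_excerpts)

    for name, y in (("activity", activity), ("valence", valence)):
        # Cross-validated R^2 corresponds to the kind of explained-variance
        # figure (75% to 93%) reported in the abstract.
        r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()
        print(f"{name}: mean cross-validated R^2 = {r2:.2f}")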

Place, publisher, year, edition, pages
2014. Vol. 136, no. 4, pp. 1951-1963.
Keywords [en]
Communication, Performance, Emotion, Speech, Loudness, Timbre, Tempo, Model, Pitch, Mood
National Category
Computer Science; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-158173
DOI: 10.1121/1.4892767
ISI: 000345977400059
PubMedID: 25324094
Scopus ID: 2-s2.0-84907863477
OAI: oai:DiVA.org:kth-158173
DiVA: diva2:774969
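These identifiers can be dereferenced programmatically. As one hedged example (using the public doi.org content-negotiation service, not anything DiVA-specific), citation metadata for the article's DOI can be fetched as CSL JSON:

    import requests

    # Resolve the DOI via doi.org content negotiation and request CSL JSON.
    # The Accept header is the documented media type for citation metadata;
    # whether it is honored depends on the DOI's registration agency.
    DOI = "10.1121/1.4892767"
    resp = requests.get(
        f"https://doi.org/{DOI}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )
    resp.raise_for_status()
    meta = resp.json()
    print(meta["title"])
    print(meta.get("container-title"))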
Funder
Swedish Research Council, grants 2009-4285 and 2012-4685
Note

QC 20150108

Available from: 2014-12-30. Created: 2014-12-30. Last updated: 2017-12-05. Bibliographically approved.

Open Access in DiVA

No full text

Other links

Publisher's full text
PubMed
Scopus

