kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Efficient Music Thumbnailing for Genre Classification
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2022 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesisAlternative title
Effektiv urvalsteknik för musikgenreklassificering (Swedish)
Abstract [en]

For music genre classification purposes, the importance of an intelligent and content-based selection of audio samples has been mostly overlooked. One common approach toward representative results is to select samples at predetermined locations. This is done to avoid analysis of the full audio during classification. While methods in music thumbnailing could be used to find representative samples for genre classification, it has not yet been demonstrated.

This thesis showed that efficient and genre representative sampling can be performed with a machine learning model (bidirectional RNN with either LSTM or GRU cells). The model was trained using a sub-optimal genre classifier and computationally inexpensive audio features. The genre classifier was used to compute losses for evenly spaced samples in 14000 tracks. The losses were then used as targets during training. Root mean square energy and zero-crossing rate were used as features, computed over relatively large time steps and wide intervals. The proposed framework can be used to give better predictions with trained genre classifiers and most likely also train, or retrain, them for higher classification accuracy at a low computational cost.

Abstract [sv]

Vid musikgenreklassificering har betydelsen av ett intelligent och innehållsbaserat urval allt som oftast förbisetts. En ansats till ett representativt resultat görs vanligtvis genom att ett antal kortare utdrag tas vid förutbestämda tidpunkter. Detta görs för att under en klassificering undvika att analysera hela musikverket. Fastän det existerar metoder inom music thumbnailing för att hitta representativa urval har de ännu inte tillämpats inom genreklassificering.

I denna uppsats visades att ett effektivt och genrerepresentativt musikurval kan utföras med en maskininlärningsmodell (dubbelriktad RNN med antingen LSTM- eller GRU-celler). Modellen tränades med hjälp av en suboptimal genreklassificerare och beräkningsmässigt enkla ljudattribut. Genreklassificeraren användes för att beräkna förlusten av jämnt fördelade urval i 14000 musikverk. Förlusterna användes sedan som utdata under träningen. Kvadratiskt energimedelvärde och zero-crossing rate beräknades över relativt långa tidssteg och breda intervall och användes som indata. Det föreslagna ramverket kan till beräkningsmässigt låga kostnader användas för att ge bättre förutsägelser med redan tränade genreklassificerare och sannolikt träna, eller omträna, dessa för högre noggrannhet vid klassificering.

Place, publisher, year, edition, pages
2022. , p. 56
Series
TRITA-SCI-GRU ; 2022:319
Keywords [en]
Music thumbnailing, Music genre classification, Machine learning, Deep learning, Bidirectional recurrent neural network, RNN
Keywords [sv]
Musikgenreklassificering, Maskininlärning, Djupinlärning, RNN
National Category
Other Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-322599OAI: oai:DiVA.org:kth-322599DiVA, id: diva2:1721352
External cooperation
Soundtrack Your Brand AB
Subject / course
Mathematics
Educational program
Master of Science - Applied and Computational Mathematics
Supervisors
Examiners
Available from: 2023-02-02 Created: 2022-12-21 Last updated: 2023-02-02Bibliographically approved

Open Access in DiVA

fulltext(926 kB)234 downloads
File information
File name FULLTEXT01.pdfFile size 926 kBChecksum SHA-512
792c60433e20389fe61d024826c713dcca009ae7b3ce7c9dfead3029062b05c198e7b07241252b44b81a61d85e4906aeabe2a412c27babc3182cd91abb1ecf04
Type fulltextMimetype application/pdf

By organisation
Mathematical Statistics
Other Mathematics

Search outside of DiVA

GoogleGoogle Scholar
Total: 234 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 359 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf