Algorithmic Composition of Popular Music
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-4957-2128
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics. ORCID iD: 0000-0003-2926-6518
2012 (English). In: Proceedings of the 12th International Conference on Music Perception and Cognition and the 8th Triennial Conference of the European Society for the Cognitive Sciences of Music / [ed] Emilios Cambouropoulos, Costas Tsourgas, Panayotis Mavromatis, Costas Pastiadis, 2012, p. 276-285. Conference paper, Published paper (Refereed)
Abstract [en]

Human composers have used formal rules for centuries to compose music, and an algorithmic composer, composing without the aid of human intervention, can be seen as an extension of this technique. An algorithmic composer of popular music (a computer program) has been created with the aim of gaining a better understanding of how the composition process can be formalized and, at the same time, a better understanding of popular music in general. With the aid of statistical findings, a theoretical framework for relevant methods is presented. The concept of Global Joint Accent Structure is introduced as a way of understanding how melody and rhythm interact to help the listener form expectations about future events. The methods of the program are presented with references to supporting statistical findings. The algorithmic composer creates a rhythmic foundation (drums), a chord progression, a phrase structure, and finally the melody. The main focus has been the composition of the melody. The melodic generation is based on ten different musical aspects, which are described. The resulting output was evaluated in a formal listening test in which 14 computer compositions were compared with 21 human compositions. Results indicate a slightly lower score for the computer compositions, but the differences were not statistically significant.

Place, publisher, year, edition, pages
2012. p. 276-285
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-109400
OAI: oai:DiVA.org:kth-109400
DiVA, id: diva2:581688
Conference
the 12th International Conference on Music Perception and Cognition and the 8th Triennial Conference of the European Society for the Cognitive Sciences of Music
Note

QC 20130523

Available from: 2013-01-02 Created: 2013-01-02 Last updated: 2018-04-27. Bibliographically approved
In thesis
1. Modeling Music: Studies of Music Transcription, Music Perception and Music Production
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This dissertation presents ten studies focusing on three important subfields of music information retrieval (MIR): music transcription (Part A), music perception (Part B), and music production (Part C).

In Part A, systems capable of transcribing rhythm and polyphonic pitch are described. The first two publications present methods for tempo estimation and beat tracking. A method is developed for computing the most salient periodicity (the “cepstroid”), and the computed cepstroid is used to guide the machine learning processing. The polyphonic pitch tracking system uses novel pitch-invariant and tone-shift-invariant processing techniques. Furthermore, the neural flux, a latent feature for onset and offset detection, is introduced. The transcription systems use a layered learning technique with separate intermediate networks of varying depth. Important music concepts are used as intermediate targets to create a processing chain with high generalization. State-of-the-art performance is reported for all tasks.

Part B is devoted to perceptual features of music, which can be used as intermediate targets or as parameters for exploring fundamental music perception mechanisms. Systems are proposed that can predict the perceived speed and performed dynamics of an audio file with high accuracy, using the average ratings from around 20 listeners as ground truths.

In Part C, aspects related to music production are explored. The first paper analyzes the long-term average spectrum (LTAS) in popular music. A compact equation is derived to describe the mean LTAS of a large dataset, and the variation is visualized. Further analysis shows that the level of the percussion is an important factor for LTAS. The second paper examines songwriting and composition through the development of an algorithmic composer of popular music. Various factors relevant for writing good compositions are encoded, and a listening test is employed that shows the validity of the proposed methods.

The dissertation is concluded by Part D - Looking Back and Ahead, which acts as a discussion and provides a road-map for future work. The first paper discusses the deep layered learning (DLL) technique, outlining concepts and pointing out a direction for future MIR implementations. It is suggested that DLL can help generalization by enforcing the validity of intermediate representations, and by letting the inferred representations establish disentangled structures supporting high-level invariant processing. The second paper proposes an architecture for tempo-invariant processing of rhythm with convolutional neural networks. Log-frequency representations of rhythm-related activations are suggested at the main stage of processing. Methods relying on magnitude, relative phase, and raw phase information are described for a wide variety of rhythm processing tasks.

Abstract [sv, English translation]

This dissertation presents ten studies within three important subfields of the research area “Music Information Retrieval” (MIR), a research area focused on extracting information from music. Part A addresses music transcription, Part B music perception, and Part C music production. A concluding part (Part D) discusses the machine learning methodology and looks ahead.

Part A presents systems that can transcribe music with respect to rhythm and polyphonic pitch. The first two publications describe methods for estimating the tempo and the positions of beats in music audio. A method for computing the most salient periodicity (the “cepstroid”) is described, along with how it can be used to guide the applied machine learning systems. The system for polyphonic pitch estimation can identify both sounding tones and note onsets and offsets. This system is both pitch-invariant and invariant with respect to variations over time within sounding tones. The transcription systems are trained to predict several musical aspects in a hierarchical structure. The transcription results are the best reported in tests on several different datasets.

Part B focuses on perceptual features of music. These can be predicted in order to model fundamental aspects of perception, but they can also be used as representations in models that attempt to classify high-level music parameters. Models are presented that can predict the perceived speed and the perceived dynamics of the performance with high accuracy. Averaged ratings from around 20 listeners serve as target values during training and evaluation.

Part C explores aspects related to music production. The first study analyzes variations in the average spectrum between popular music tracks. The analysis shows that the level of percussive instruments is an important factor for the spectral distribution; the data suggest that this level is a better predictor of the spectrum than genre classifications. The second study in Part C deals with music composition. An algorithmic composition program is presented in which relevant music parameters are joined together in a hierarchical structure. A listening test is conducted to demonstrate the validity of the program and to examine the effect of certain parameters.

The dissertation concludes with Part D, which places the developed machine learning technique in a wider context and proposes new methods for generalizing rhythm prediction. The first study discusses deep learning systems that predict different musical aspects in a hierarchical structure. Relevant concepts are presented together with suggestions for future implementations. The second study proposes a tempo-invariant method for processing the log-frequency domain of rhythm signals with convolutional neural networks. The proposed architecture can use magnitude, relative phase between rhythm channels, and raw phase from the frequency transform to address several important rhythm-related problems.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2018. p. 49
Series
TRITA-EECS-AVL ; 2018-35
Keyword
Music Information Retrieval, MIR, Music, Music Transcription, Music Perception, Music Production, Tempo Estimation, Beat Tracking, Polyphonic Pitch Tracking, Polyphonic Transcription, Music Speed, Music Dynamics, Long-time average spectrum, LTAS, Algorithmic Composition, Deep Layered Learning, Convolutional Neural Networks, Rhythm Tracking, Ensemble Learning, Perceptual Features, Representation Learning
National Category
Other Computer and Information Science; Computer Engineering; Media and Communication Technology
Identifiers
urn:nbn:se:kth:diva-226894 (URN)
978-91-7729-768-0 (ISBN)
Public defence
2018-05-18, D3, Kungliga Tekniska Högskolan, Lindstedtsvägen 5, Stockholm, 13:00 (English)
Note

QC 20180427

Available from: 2018-04-27 Created: 2018-04-26 Last updated: 2018-05-03. Bibliographically approved

Open Access in DiVA

ElowssonFriberg-AlgorithmicCompositionofPopularMusic (1047 kB), 0 downloads
File information
File name: FULLTEXT01.pdf
File size: 1047 kB
Checksum (SHA-512):
c6b30eb6f13df3e51f78340440186464e69ae1352fea104296f492c1d028f5d5418b4fdf05b8ae94e34f3c9027cbac30fd5e2ee7e8dbc22a073ff1b1410ad2ff
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Friberg, Anders

Search in DiVA

By author/editor
Elowsson, Anders; Friberg, Anders
By organisation
Speech, Music and Hearing, TMH; Music Acoustics
Computer Sciences; Language Technology (Computational Linguistics)
