kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0002-3468-6974
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0003-2549-6367
2024 (English)In: ARTIFICIAL INTELLIGENCE IN MUSIC, SOUND, ART AND DESIGN, EVOMUSART 2024 / [ed] Johnson, C Rebelo, SM Santos, I, Springer Nature , 2024, Vol. 14633, p. 84-96Conference paper, Published paper (Refereed)
Abstract [en]

The dominating approach for modeling sequences (e.g. text, music) with deep learning is the causal approach, which consists in learning to predict tokens sequentially given those preceding it. Another paradigm is masked language modeling, which consists of learning to predict the masked tokens of a sequence in no specific order, given all non-masked tokens. Both approaches can be used for generation, but the latter is more flexible for editing, e.g. changing the middle of a sequence. This paper investigates the viability of masked language modeling applied to Irish traditional music represented in the text-based format abc-notation. Our model, called abcMLM, enables a user to edit tunes in arbitrary ways while retaining similar generation capabilities to causal models. We find that generation using masked language modeling is more challenging, but leveraging additional information from a dataset, e.g., imputing musical structure, can generate sequences that are on par with previous models.

Place, publisher, year, edition, pages
Springer Nature , 2024. Vol. 14633, p. 84-96
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 14633
Keywords [en]
Symbolic Music Generation, Masked Language Models, Irish Traditional Music
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-347151DOI: 10.1007/978-3-031-56992-0_6ISI: 001212363900006Scopus ID: 2-s2.0-85190687279OAI: oai:DiVA.org:kth-347151DiVA, id: diva2:1864840
Conference
13th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART) Held as Part of EvoStar Conference, APR 03-05, 2024, Aberystwyth, WALES
Note

QC 20240604

Part of ISBN 978-3-031-56991-3; 978-3-031-56992-0

Available from: 2024-06-04 Created: 2024-06-04 Last updated: 2025-02-07Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Casini, LucaJonason, NicolasSturm, Bob

Search in DiVA

By author/editor
Casini, LucaJonason, NicolasSturm, Bob
By organisation
Speech, Music and Hearing, TMH
Natural Language Processing

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 70 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf