Applying textual inversion to control and personalize text-to-music models
Thomé, Carl. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
Sturm, Bob. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-2549-6367
Pertoft, John. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
Jonason, Nicolas. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0009-0003-8553-3542
2024 (English). In: Proc. 15th Int. Workshop on Machine Learning and Music, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

A text-to-music (TTM) model should synthesize audio that reflects the concepts in a given prompt, as long as it has been trained on those concepts. If a prompt references concepts that the TTM model has not been trained on, then the audio it synthesizes will likely not match. This paper investigates the application of a simple gradient-based approach called textual inversion (TI) to expand the concept vocabulary of a trained TTM model without compromising the fidelity of concepts on which it has already been trained. We apply this technique to MusicGen and measure its reconstruction and editability quality, as well as its subjective quality. We find that TI can expand the concept vocabulary of a pretrained TTM model, making it personalized and more controllable without having to fine-tune the entire model.
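
To make the idea concrete, here is a minimal sketch of textual inversion in PyTorch. It is not the authors' training code: the frozen "text encoder" and "decoder" below are toy stand-ins for MusicGen's pretrained components, and the prompts and targets are placeholders. It only illustrates the mechanism the abstract names: a single new pseudo-token embedding is optimized by gradient descent while every pretrained weight stays frozen.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, audio_dim = 1000, 64, 128   # toy sizes (assumptions)

# Frozen stand-ins for the pretrained TTM pieces (in practice, MusicGen's
# text encoder and audio-token decoder, both large pretrained networks).
text_embedding = nn.Embedding(vocab_size, embed_dim)
decoder = nn.Linear(embed_dim, audio_dim)
for p in list(text_embedding.parameters()) + list(decoder.parameters()):
    p.requires_grad_(False)

# The only trainable parameter: one embedding for the new pseudo-token
# "<my-concept>", initialized near the mean of the existing vocabulary.
concept = nn.Parameter(text_embedding.weight.mean(dim=0, keepdim=True).clone())
optimizer = torch.optim.Adam([concept], lr=1e-3)

# Hypothetical training data: tokenized prompts containing the pseudo-token
# at a known position, and target features extracted from reference audio.
prompt_ids = torch.randint(0, vocab_size, (8, 16))  # placeholder token ids
concept_pos = 3                                     # slot of "<my-concept>"
targets = torch.randn(8, 16, audio_dim)             # placeholder audio targets

for step in range(200):
    emb = text_embedding(prompt_ids)       # frozen lookups, no grad here
    emb[:, concept_pos, :] = concept       # splice in the learnable embedding
    loss = nn.functional.mse_loss(decoder(emb), targets)
    optimizer.zero_grad()
    loss.backward()                        # gradients reach only `concept`
    optimizer.step()
```

Because only the one embedding vector is trained, the model's existing concepts are untouched by construction, which is why TI can add vocabulary without degrading fidelity on concepts the model already knows.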

Place, publisher, year, edition, pages
2024.
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-356224
OAI: oai:DiVA.org:kth-356224
DiVA, id: diva2:1912384
Conference
15th Int. Workshop on Machine Learning and Music
Funder
EU, Horizon 2020, 864189
Note

QC 20241113

Available from: 2024-11-12. Created: 2024-11-12. Last updated: 2024-11-13. Bibliographically approved.

Open Access in DiVA

fulltext (490 kB), 28 downloads
File information
File name: FULLTEXT01.pdf
File size: 490 kB
Checksum: SHA-512
714fb3d57afbc0820277b2a67fadac839477bf2ec033063e58534957d4eeb4ae6f270f7b97e3e57d0011008761fd3eab62fb907da70411ec7b635e55951adb60
Type: fulltext. Mimetype: application/pdf

Authority records

Thomé, Carl; Sturm, Bob; Pertoft, John; Jonason, Nicolas
