Publications (10 of 10)
Casini, L., Cros Vila, L., Dalmazzo, D., Kaila, A.-K. & Sturm, B. L. (2026). Data‑Driven Analysis of Text‑Conditioning in AI‑Generated Music: A Case Study with Suno and Udio. Transactions of the International Society for Music Information Retrieval, 9(1), 194-209.
2026 (English). In: Transactions of the International Society for Music Information Retrieval, ISSN 2514-3298, Vol. 9, no 1, p. 194-209. Article in journal (Refereed). Published.
Abstract [en]

Online commercial artificial intelligence (AI) platforms for generating music from text prompts (AI music) are now being used by many users to create millions of music audio recordings daily. Some AI music is appearing in advertising, music playlists of restaurants and gyms, and even hit music charts, in many countries. How are users engaging with these text‑to‑music AI platforms, where text is a principal mode of interaction to specify prompts (e.g., free terms), lyrics (e.g., sung terms), and tags (e.g., high‑level stylistic terms)? What languages appear? What characterizes prompts, lyrics, and tags? How are mentions of real artists used? What kind of additional instructions (metatags) are used? To address these questions, we assemble and analyze a collection of 101,953 songs generated from May to October 2024 by 60,342 users of Suno and Udio. Using a combination of state‑of‑the‑art text‑embedding models, dimensionality reduction, and clustering methods, we analyze the prompts, tags, and lyrics and automatically annotate and display the processed data in interactive plots. Our results reveal prominent themes in lyrics, language preferences, and prompting strategies, as well as peculiar attempts at steering models through the use of metatags. We share our code and data resources to promote further musicological study of AI music.

Place, publisher, year, edition, pages
Ubiquity Press, Ltd., 2026
Keywords
AI music, generative AI, Suno, Udio, exploratory data analysis, natural language processing
National Category
Musicology; Artificial Intelligence
Research subject
Media Technology; Computer Science
Identifiers
urn:nbn:se:kth:diva-381510 (URN); 10.5334/tismir.273 (DOI)
Note

QC 20260522

Available from: 2026-05-18; Created: 2026-05-18; Last updated: 2026-05-22; Bibliographically approved
Jonason, N., Casini, L. & Sturm, B. (2025). SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward. In: Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025). Paper presented at 6th Conference on AI Music Creativity (AIMC 2025), Brussels, Belgium, September 10-12, 2025.
2025 (English). In: Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025), 2025. Conference paper, Published paper (Refereed).
Abstract [en]

Recent work has proposed training machine learning models to predict aesthetic ratings for music audio. Our work explores whether such models can be used to finetune a symbolic music generation system with reinforcement learning, and what effect this has on the system outputs. To test this, we use group relative policy optimization to finetune a piano MIDI model with Meta Audiobox Aesthetics ratings of audio-rendered outputs as the reward. We find that this optimization affects multiple low-level features of the generated outputs, and improves the average subjective ratings in a preliminary listening study with 14 participants. We also find that over-optimization dramatically reduces diversity of model outputs. Code and listening examples can be found here: https://github.com/erl-j/SMART.

Keywords
Artificial Intelligence, Music, Reinforcement Learning
National Category
Artificial Intelligence
Identifiers
urn:nbn:se:kth:diva-377799 (URN); 10.5281/zenodo.16946387 (DOI)
Conference
6th Conference on AI Music Creativity (AIMC 2025), Brussels, Belgium, September 10-12, 2025
Funder
EU, Horizon 2020, 864189
Note

QC 20260306

Available from: 2026-03-05; Created: 2026-03-05; Last updated: 2026-03-06; Bibliographically approved
Cros Vila, L., Sturm, B., Casini, L. & Dalmazzo, D. (2025). The AI Music Arms Race: On the Detection of AI-Generated Music. Transactions of the International Society for Music Information Retrieval, 8(1), 179-194.
2025 (English). In: Transactions of the International Society for Music Information Retrieval, E-ISSN 2514-3298, Vol. 8, no 1, p. 179-194. Article in journal (Refereed). Published.
Abstract [en]

Several companies now offer platforms for users to create music at unprecedented scales by textual prompting. As the quality of this music rises, concern grows about how to differentiate AI-generated music from human-made music, with implications for content identification, copyright enforcement, and music recommendation systems. This article explores the detection of AI-generated music by assembling and studying a large dataset of music audio recordings (30,000 full tracks totaling 1,770 h, 33 m, and 31 s in duration), of which 10,000 are from the Million Song Dataset (Bertin-Mahieux et al., 2011) and 20,000 are generated and released by users of two popular AI music platforms: Suno and Udio. We build and evaluate several AI music detectors operating on Contrastive Language Audio Pretraining embeddings of the music audio, then compare them to a commercial baseline system as well as an open-source one. We applied various audio transformations to see their impacts on detector performance and found that the commercial baseline system is easily fooled by simply resampling audio to 22.05 kHz. We argue that careful consideration needs to be given to the experimental design underlying work in this area, as well as the very definition of "AI music." We release all our code at https://github.com/lcrosvila/ai-music-detection.

Place, publisher, year, edition, pages
Ubiquity Press, Ltd., 2025
Keywords
AI music detection, AI music, Generative AI, Suno, Udio
National Category
Music; Computer and Information Sciences; Humanities and the Arts; Musicology; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-365668 (URN); 10.5334/tismir.254 (DOI); 001528900900001 (); 2-s2.0-105010342510 (Scopus ID)
Projects
MUSAiC
Note

QC 20250702

Available from: 2025-06-25; Created: 2025-06-25; Last updated: 2025-11-05; Bibliographically approved
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., . . . Ben-Tal, O. (2024). AI Music Studies: Preparing for the Coming Flood. In: Proceedings of AI Music Creativity. Paper presented at AI Music Creativity, AIMC 2024, 9 - 11 September.
2024 (English). In: Proceedings of AI Music Creativity, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

As music generated using artificial intelligence (AI music) becomes more prevalent — originating not only from individuals but also commercial services — the need to study it and its impacts becomes important. How can this material and its sources be meaningfully studied and critically engaged with, especially considering the unprecedented scales possible with generative AI? The paper begins to answer this question by considering AI music along seven aspects: 1) the company providing an AI music service; 2) its founders and employees; 3) the use of the service; 4) the users; 5) the algorithms; 6) the music; and 7) the sustainability. We make our discussion more concrete by considering the contemporary AI music service Boomy. While our investigations are preliminary and focused on a single AI music service, we argue that they open several interesting avenues of exploration for many disciplines and their intersections to help prepare for the coming flood of AI music. This paper asks many more questions than it answers, which is a feature (not a bug) of it advocating for a new domain of study: AI Music Studies.

National Category
Musicology
Identifiers
urn:nbn:se:kth:diva-356200 (URN)
Conference
AI Music Creativity, AIMC 2024, 9 - 11 September
Funder
EU, Horizon 2020, 864189
Note

QC 20241113

Available from: 2024-11-12; Created: 2024-11-12; Last updated: 2024-11-13; Bibliographically approved
Casini, L., Jonason, N. & Sturm, B. (2024). Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation. In: Johnson, C., Rebelo, S. M. & Santos, I. (Eds.), Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. Paper presented at 13th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART), held as part of the EvoStar Conference, April 3-5, 2024, Aberystwyth, Wales (pp. 84-96). Springer Nature, 14633.
2024 (English). In: Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024 / [ed] Johnson, C., Rebelo, S. M. & Santos, I., Springer Nature, 2024, Vol. 14633, p. 84-96. Conference paper, Published paper (Refereed).
Abstract [en]

The dominant approach for modeling sequences (e.g., text, music) with deep learning is the causal approach, which consists of learning to predict tokens sequentially given those preceding them. Another paradigm is masked language modeling, which consists of learning to predict the masked tokens of a sequence in no specific order, given all non-masked tokens. Both approaches can be used for generation, but the latter is more flexible for editing, e.g., changing the middle of a sequence. This paper investigates the viability of masked language modeling applied to Irish traditional music represented in the text-based format abc-notation. Our model, called abcMLM, enables a user to edit tunes in arbitrary ways while retaining similar generation capabilities to causal models. We find that generation using masked language modeling is more challenging, but leveraging additional information from a dataset, e.g., imputing musical structure, can generate sequences that are on par with previous models.

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 14633
Keywords
Symbolic Music Generation, Masked Language Models, Irish Traditional Music
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-347151 (URN); 10.1007/978-3-031-56992-0_6 (DOI); 001212363900006 (); 2-s2.0-85190687279 (Scopus ID)
Conference
13th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART), held as part of the EvoStar Conference, April 3-5, 2024, Aberystwyth, Wales
Note

QC 20240604

Part of ISBN 978-3-031-56991-3; 978-3-031-56992-0

Available from: 2024-06-04; Created: 2024-06-04; Last updated: 2025-02-07; Bibliographically approved
Casini, L., Jonason, N. & Sturm, B. (2024). Sparks of Musical AGI? Challenges and perspectives in music co-creation with LLMs: A qualitative exploration of the music knowledge of LLMs and their use for music creation. Paper presented at International Conference on AI and Musical Creativity (AIMC) 2024, Oxford UK, 9 - 11 September 2024.
2024 (English). Conference paper, Published paper (Refereed).
Abstract [en]

In the paper Sparks of Artificial General Intelligence, the authors show how OpenAI’s GPT-4 is able to do well in a variety of tasks that can be represented with text and claim it to have “a more general intelligence than previous AI models.” One of the tasks they explore is symbolic music generation. In this paper we critically analyze their results and extend the discourse around the capabilities of LLMs for music by exploring additional musical tasks and LLMs. Furthermore, we investigate the viability of smaller models when used in conjunction with Retrieval Augmented Generation, as well as finetuning on programmatically written prompts using Quantized Low Rank Adapters. Finally, we discuss some critical aspects of LLMs as a tool for music generation.

Keywords
Large Language Models, Music Co-Creation, Music Understanding, Finetuning
National Category
Computer and Information Sciences; Music
Identifiers
urn:nbn:se:kth:diva-352705 (URN)
Conference
International Conference on AI and Musical Creativity (AIMC) 2024, Oxford UK, 9 - 11 September 2024
Note

QC 20240906

Available from: 2024-09-05; Created: 2024-09-05; Last updated: 2025-02-21; Bibliographically approved
Jonason, N., Casini, L. & Sturm, B. (2024). Steer-by-prior Editing of Symbolic Music Loops. In: Proceedings 24th Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024. Paper presented at 24th Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, Vilnius, LTU, Sep 09 2024 - Sep 13 2024. Springer Nature.
2024 (English). In: Proceedings 24th Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, Springer Nature, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

With the goal of building a system capable of controllable symbolic music loop generation and editing, this paper explores a generalisation of Masked Language Modelling we call Superposed Language Modelling. Rather than input tokens being known or unknown, a Superposed Language Model takes priors over the sequence as input, enabling us to apply various constraints to the generation at inference time. After detailing our approach, we demonstrate our model across various editing tasks in the domain of multi-instrument MIDI loops. We end by highlighting some limitations of the approach and avenues for future work.

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
machine learning, music, masked language models, superposed language models, constraints, MIDI
National Category
Natural Language Processing
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363494 (URN); 10.1007/978-3-032-25305-7_29 (DOI); 2-s2.0-105040258148 (Scopus ID)
Conference
24th Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, Vilnius, LTU, Sep 09 2024 - Sep 13 2024
Projects
EU, Horizon 2020, 864189
Funder
EU, European Research Council, 864189
Note

Part of ISBN 9783032253040

QC 20250520

Available from: 2025-05-16; Created: 2025-05-16; Last updated: 2026-06-17; Bibliographically approved
Sturm, B., Amerotti, M., Dalmazzo, D., Cros Vila, L., Casini, L. & Kanhov, E. (2024). Stochastic Pirate Radio (KSPR): Generative AI applied to simulate commercial radio. In: Proc. AI Music Creativity. Paper presented at AI Music Creativity, AIMC 2024, 9 - 11 September.
2024 (English). In: Proc. AI Music Creativity, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

This paper (a product of artistic research) engages with the following challenge: combine publicly available generative AI tools to simulate a commercial radio station, complete with dialogue, news and advertisements, and music programming. Our five success criteria for the “station” are: 1) it runs autonomously; 2) it features diverse content; 3) its content is generated and assembled in faster than real-time; 4) it sounds like commercial radio; and 5) it is engaging for longer than its novelty factor. We consider a variety of generative AI systems for text and dialogue, synthesizing expressive speech, and generating music audio. We describe our engineered pipeline and illustrate its components with several audio examples. We compare our results to other “endless” streams of content. Our resulting stream — “Stochastic Pirate Radio (KSPR)” — can be heard here: https://www.youtube.com/@KSPRStochasticPirateRadio.

National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Information and Communication Technology; Media Technology
Identifiers
urn:nbn:se:kth:diva-356225 (URN)
Conference
AI Music Creativity, AIMC 2024, 9 - 11 September
Funder
EU, Horizon 2020, 864189
Note

QC 20241113

Available from: 2024-11-12; Created: 2024-11-12; Last updated: 2025-01-31; Bibliographically approved
Jonason, N., Casini, L., Thomé, C. & Sturm, B. (2023). Retrieval Augmented Generation of Symbolic Music with LLMs. In: Extended Abstracts for the Late-Breaking Demo Session of the 22nd Int. Society for Music Information Retrieval Conf. Paper presented at 22nd International Society for Music Information Retrieval Conference, Online, November 7-12, 2021.
2023 (English). In: Extended Abstracts for the Late-Breaking Demo Session of the 22nd Int. Society for Music Information Retrieval Conf., 2023. Conference paper, Poster (with or without abstract) (Other academic).
Abstract [en]

We explore the use of large language models (LLMs) for music generation using a retrieval system to select relevant few-shot examples. We find promising initial results for music generation in a dialogue with the user, especially considering the ease with which such a system can be implemented. The code is available online.

Keywords
Music, Artificial Intelligence, Symbolic music generation, Abc notation, Large Language Models
National Category
Artificial Intelligence
Identifiers
urn:nbn:se:kth:diva-377800 (URN)
Conference
22nd International Society for Music Information Retrieval Conference, Online, November 7-12, 2021
Funder
EU, Horizon 2020, 864189
Note

QC 20260306

Available from: 2026-03-05; Created: 2026-03-05; Last updated: 2026-03-06; Bibliographically approved
Casini, L. & Sturm, B. (2022). Tradformer: A Transformer Model of Traditional Music Transcriptions. In: Proceedings 31st International Joint Conference on Artificial Intelligence, IJCAI 2022. Paper presented at International Joint Conference on Artificial Intelligence IJCAI 2022, Vienna, Austria, 23-29 July 2022 (pp. 4915-4920), Article ID AR46.
2022 (English). In: Proceedings 31st International Joint Conference on Artificial Intelligence, IJCAI 2022, 2022, p. 4915-4920, article id AR46. Conference paper, Published paper (Refereed).
Abstract [en]

We explore the transformer neural network architecture for modeling music, specifically Irish and Swedish traditional dance music. Given the repetitive structures of these kinds of music, the transformer should be as successful with fewer parameters and complexity as the hitherto most successful model, a vanilla long short-term memory network. We find that achieving good performance with the transformer is not straightforward, and careful consideration is needed for the sampling strategy, evaluating intermediate outputs in relation to engineering choices, and finally analyzing what the model learns. We discuss these points with several illustrations, providing reusable insights for engineering other music generation systems. We also report the high performance of our final transformer model in a competition of music generation systems focused on a type of Swedish dance.

Keywords
artificial intelligence, music
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-312662 (URN); 10.24963/ijcai.2022/681 (DOI); 2-s2.0-85137891439 (Scopus ID)
Conference
International Joint Conference on Artificial Intelligence IJCAI 2022, Vienna, Austria, 23-29 July 2022
Funder
EU, Horizon 2020, 864189
Note

QC 20220530

Available from: 2022-05-20; Created: 2022-05-20; Last updated: 2025-05-27; Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-3468-6974