KTH Publications (kth.se)
Publications (10 of 68)
Grouwels, J., Jonason, N. & Sturm, B. (2025). Exploring the Expressive Space of an Articulatory Vocal Model using Quality-Diversity Optimization with Multimodal Embeddings. In: GECCO 2025 - Proceedings of the 2025 Genetic and Evolutionary Computation Conference. Paper presented at 2025 Genetic and Evolutionary Computation Conference, GECCO 2025, Malaga, Spain, Jul 14 2025 - Jul 18 2025 (pp. 1362-1370). Association for Computing Machinery (ACM)
2025 (English). In: GECCO 2025 - Proceedings of the 2025 Genetic and Evolutionary Computation Conference, Association for Computing Machinery (ACM), 2025, p. 1362-1370. Conference paper, Published paper (Refereed)
Abstract [en]

Knowing which sounds can be produced by a simulated vocal model and how they are connected to its articulatory behavior is not trivial. Being able to map this out can be interesting for applications that make use of the extended capabilities of a voice, e.g., singing or vocal imitations. We present a method that achieves this for a state-of-the-art articulatory vocal model (VocalTractLab) by combining it with a recent Quality-Diversity algorithm (CMA-MAE) and audio embeddings obtained through a multi-modal pretrained model (CLAP). The text-capabilities of CLAP make it possible to steer the exploration through a text prompt. We show that the method explores more efficiently than a random sampling baseline, covering more of the measure space and achieving higher objective scores. We provide several listening examples and the source code for a scalable implementation.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
articulatory vocal model, CLAP, CMA-MAE, diversity optimization, multimodal, quality-diversity, text prompt, VocalTractLab
National Category
Natural Language Processing; Computer Sciences; Comparative Language Studies and Linguistics; Signal Processing
Identifiers
urn:nbn:se:kth:diva-369365 (URN), 10.1145/3712256.3726313 (DOI), 2-s2.0-105013082602 (Scopus ID)
Conference
2025 Genetic and Evolutionary Computation Conference, GECCO 2025, Malaga, Spain, Jul 14 2025 - Jul 18 2025
Note

Part of ISBN 9798400714658

QC 20250903

Available from: 2025-09-03 Created: 2025-09-03 Last updated: 2025-09-03 Bibliographically approved
Kanhov, E., Kaila, A.-K. & Sturm, B. L. T. (2025). Innovation, data colonialism and ethics: critical reflections on the impacts of AI on Irish traditional music. Journal of New Music Research, 1-17
2025 (English). In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, p. 1-17. Article in journal (Refereed), Published
Abstract [en]

By definition, traditional music is in a constant state of friction with innovation, exemplified by resistance to ‘outside’ influences such as different instruments, different ways of learning, and forces of commercialisation. An emerging external influence is artificial intelligence (AI), which is now capable of synthesising music collections at scales dwarfing those crafted by people and communities. In this paper, we examine the impact of research and development of AI on Irish traditional music through case studies of two generative AI systems: folk-rnn and Suno. How can researchers and engineers (academic or industrial) who develop and apply AI to specific practices of music make meaningful and non-harmful contributions to those practices? To answer this question, we critically reflect on the tensions that arise between tradition and innovation, how Irish traditional music becomes subject to data colonialism, and the interdisciplinary challenges of ethically engaging as researchers with a traditional music community. We ask what perspectives are needed to balance the interests of academic research and value systems in traditional music communities, and provide three ways forward for computer science to deepen the considerations of their impacts on communities of practice.

Place, publisher, year, edition, pages
Informa UK Limited, 2025
Keywords
artificial intelligence, innovation, Irish traditional music, ethnography, research methodology, ethics
National Category
Musicology
Identifiers
urn:nbn:se:kth:diva-359387 (URN), 10.1080/09298215.2024.2442359 (DOI), 001408665500001 (), 2-s2.0-85216682924 (Scopus ID)
Funder
EU, European Research Council, 864189; Wallenberg AI, Autonomous Systems and Software Program (WASP), 2020.0102
Note

QC 20250214

Available from: 2025-01-30 Created: 2025-01-30 Last updated: 2025-02-14 Bibliographically approved
Cros Vila, L. & Sturm, B. (2025). (Mis)Communicating with our AI Systems. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Paper presented at CHI 2025: CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 26 April - 1 May, 2025. New York, NY, USA: Association for Computing Machinery (ACM), Article ID 416.
2025 (English). In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, New York, NY, USA: Association for Computing Machinery (ACM), 2025, article id 416. Conference paper, Published paper (Refereed)
Abstract [en]

Explainable Artificial Intelligence (XAI) is a discipline concerned with understanding predictions of AI systems. What is ultimately desired from XAI methods is for an AI system to link its input and output in a way that is interpretable with reference to the environment in which it is applied. A variety of methods have been proposed, but we argue in this paper that what has yet to be considered is miscommunication: the failure to convey and/or interpret an explanation accurately. XAI can be seen as a communication process and thus looking at how humans explain things to each other can provide guidance to its application and evaluation. We motivate a specific model of communication to help identify essential components of the process, and show the critical importance for establishing common ground, i.e., shared mutual knowledge, beliefs, and assumptions of the participants communicating.

Place, publisher, year, edition, pages
New York, NY, USA: Association for Computing Machinery (ACM), 2025
Keywords
Communication, Miscommunication, Dialog, Mutual-Understanding, Conversation, Explanation, Explainability, Explainable AI
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363375 (URN), 10.1145/3706598.3713771 (DOI)
Conference
CHI 2025: CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 26 April - 1 May, 2025
Note

Part of ISBN 9798400713941

QC 20250515

Available from: 2025-05-15 Created: 2025-05-15 Last updated: 2025-11-05 Bibliographically approved
Cros Vila, L., Sturm, B., Casini, L. & Dalmazzo, D. (2025). The AI Music Arms Race: On the Detection of AI-Generated Music. Transactions of the International Society for Music Information Retrieval, 8(1), 179-194
2025 (English). In: Transactions of the International Society for Music Information Retrieval, E-ISSN 2514-3298, Vol. 8, no 1, p. 179-194. Article in journal (Refereed), Published
Abstract [en]

Several companies now offer platforms for users to create music at unprecedented scales by textual prompting. As the quality of this music rises, concern grows about how to differentiate AI-generated music from human-made music, with implications for content identification, copyright enforcement, and music recommendation systems. This article explores the detection of AI-generated music by assembling and studying a large dataset of music audio recordings (30,000 full tracks totaling 1,770 h, 33 m, and 31 s in duration), of which 10,000 are from the Million Song Dataset (Bertin-Mahieux et al., 2011) and 20,000 are generated and released by users of two popular AI music platforms: Suno and Udio. We build and evaluate several AI music detectors operating on Contrastive Language Audio Pretraining embeddings of the music audio, then compare them to a commercial baseline system as well as an open-source one. We applied various audio transformations to see their impacts on detector performance and found that the commercial baseline system is easily fooled by simply resampling audio to 22.05 kHz. We argue that careful consideration needs to be given to the experimental design underlying work in this area, as well as the very definition of "AI music." We release all our code at https://github.com/lcrosvila/ai-music-detection.

Place, publisher, year, edition, pages
Ubiquity Press, Ltd., 2025
Keywords
AI music detection, AI music, Generative AI, Suno, Udio
National Category
Music; Computer and Information Sciences; Humanities and the Arts; Musicology; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-365668 (URN), 10.5334/tismir.254 (DOI), 001528900900001 (), 2-s2.0-105010342510 (Scopus ID)
Projects
MUSAiC
Note

QC 20250702

Available from: 2025-06-25 Created: 2025-06-25 Last updated: 2025-11-05 Bibliographically approved
Amerotti, M., Benford, S., Sturm, B. & Avila, J. M. (2025). The Virtual Session: Synchronizing Multiple Virtual Musicians Simulating an Irish Traditional Music Session. In: Proc. International Computer Music Conference. Paper presented at International Computer Music Conference.
2025 (English). In: Proc. International Computer Music Conference, 2025. Conference paper, Published paper (Refereed)
Abstract [en]

Our previous work on modeling Irish traditional dance music performance focused on modeling a single player interacting with a human musician. We now create a virtual session simulation where multiple virtual musicians play together interactively, mimicking a common practice within Irish traditional music. We devise (1) a tempo synchronization system that allows the virtual musicians to play together and (2) a leadership negotiation system enabling them to take different roles during the session. Following a practice-led approach, we explore emerging behaviors in the resulting sessions and discuss how this work relates to and can impact traditional music practice.

National Category
Computer Systems
Research subject
Art, Technology and Design
Identifiers
urn:nbn:se:kth:diva-372915 (URN), 978-1-951748-00-5 (ISBN)
Conference
International Computer Music Conference
Funder
EU, Horizon 2020, 864189
Available from: 2025-11-15 Created: 2025-11-15 Last updated: 2025-11-15
Green, O., Sturm, B., Born, G. & Wald-Fuhrmann, M. (2024). A Critical Survey of Research in Music Genre Recognition. In: Proc. International Society for Music Information Retrieval Conference. Paper presented at 25th International Society for Music Information Retrieval (ISMIR), San Francisco, CA, USA, Nov 10-14 2024. ISMIR
2024 (English). In: Proc. International Society for Music Information Retrieval Conference, ISMIR, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

This paper surveys 560 publications about music genre recognition (MGR) published between 2013–2022, complementing the comprehensive survey of [474], which covered the time frame 1995–2012 (467 publications). For each publication we determine its main functions: a review of research, a contribution to evaluation methodology, or an experimental work. For each experimental work we note the data, experimental approach, and figure of merit it applies. We also note the extents to which any publication engages with work critical of MGR as a research problem, as well as genre theory. Our bibliographic analysis shows for MGR research: 1) it typically does not meaningfully engage with any critique of itself; and 2) it typically does not meaningfully engage with work in genre theory.

Place, publisher, year, edition, pages
ISMIR, 2024
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-356226 (URN)
Conference
25th International Society for Music Information Retrieval (ISMIR), San Francisco, CA, USA, Nov 10-14 2024
Funder
EU, Horizon 2020, 864189
Note

QC 20241115

Available from: 2024-11-12 Created: 2024-11-12 Last updated: 2024-11-15 Bibliographically approved
Green, O., Sturm, B., Born, G. & Wald-Fuhrmann, M. (2024). A critical survey of research in music genre recognition. In: Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024. Paper presented at 25th International Society for Music Information Retrieval (ISMIR), San Francisco, CA, USA, November 10-14, 2024 (pp. 745-782). International Society for Music Information Retrieval
2024 (English). In: Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024, International Society for Music Information Retrieval, 2024, p. 745-782. Conference paper, Published paper (Other academic)
Abstract [en]

This paper surveys 560 publications about music genre recognition (MGR) published between 2013–2022, complementing the comprehensive survey of [474], which covered the time frame 1995–2012 (467 publications). For each publication we determine its main functions: a review of research, a contribution to evaluation methodology, or an experimental work. For each experimental work we note the data, experimental approach, and figure of merit it applies. We also note the extents to which any publication engages with work critical of MGR as a research problem, as well as genre theory. Our bibliographic analysis shows for MGR research: 1) it typically does not meaningfully engage with any critique of itself; and 2) it typically does not meaningfully engage with work in genre theory.

Place, publisher, year, edition, pages
International Society for Music Information Retrieval, 2024
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-361141 (URN), 10.5281/zenodo.14877445 (DOI), 2-s2.0-85219144298 (Scopus ID)
Conference
25th International Society for Music Information Retrieval (ISMIR), San Francisco, CA, USA, November 10-14, 2024
Note

Part of ISBN 978-1-7327299-4-0

QC 20250312

Available from: 2025-03-12 Created: 2025-03-12 Last updated: 2025-03-13 Bibliographically approved
Sturm, B. (2024). A DIFFICULT CHRISTMAS.
2024 (English). Artistic output (Unrefereed)
National Category
Music
Identifiers
urn:nbn:se:kth:diva-357968 (URN)
Funder
EU, Horizon 2020, 864189
Available from: 2024-12-21 Created: 2024-12-21 Last updated: 2025-02-21 Bibliographically approved
Kaila, A.-K. & Sturm, B. (2024). Agonistic Dialogue on the Value and Impact of AI Music Applications. In: Proceedings of the 2024 International Conference on AI and Musical Creativity. Paper presented at 2024 International Conference on AI and Musical Creativity, 9 - 11 September, The University of Oxford, UK. Oxford, UK
2024 (English). In: Proceedings of the 2024 International Conference on AI and Musical Creativity, Oxford, UK, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we use critical and agonistic modes of inquiry to analyse and critique a specific application of AI to music practice. It records a structured interdisciplinary dialogue between 1) a musicologist and social scientist and 2) an engineer in music and computer science, focusing on folk-rnn and Irish Traditional Music (ITM) as a case study. We debate the role of data ethics in AI music applications, the dynamics of inclusion and exclusion, and the nature of embedded value systems and power asymmetries inherent in applying AI to music. We discuss how identifying the value of AI music applications is critical for ensuring research efforts make musical contributions along with academic and technical ones. Overall, this agonistic dialogue exemplifies how questions of right and wrong — the core of ethics — can be examined as AI is applied more and more to music practice.

Place, publisher, year, edition, pages
Oxford, UK, 2024
Keywords
AI music, Irish Traditional Music, ethics, interdisciplinary, agonistic dialogue
National Category
Music
Research subject
Art, Technology and Design
Identifiers
urn:nbn:se:kth:diva-346695 (URN)10.5281/zenodo.15110169 (DOI)
Conference
2024 International Conference on AI and Musical Creativity, 9 - 11 September, The University of Oxford, UK
Funder
Marianne and Marcus Wallenberg Foundation, 2020.0102; EU, Horizon 2020, 864189
Note

QC 20240523

Available from: 2024-05-22 Created: 2024-05-22 Last updated: 2025-04-24 Bibliographically approved
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., . . . Ben-Tal, O. (2024). AI Music Studies: Preparing for the Coming Flood. In: Proceedings of AI Music Creativity. Paper presented at AI Music Creativity, AIMC 2024, 9 - 11 September.
2024 (English). In: Proceedings of AI Music Creativity, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

As music generated using artificial intelligence (AI music) becomes more prevalent — originating not only from individuals but also commercial services — the need to study it and its impacts becomes important. How can this material and its sources be meaningfully studied and critically engaged with, especially considering the unprecedented scales possible with generative AI? The paper begins to answer this question by considering AI music along seven aspects: 1) the company providing an AI music service; 2) its founders and employees; 3) the use of the service; 4) the users; 5) the algorithms; 6) the music; and 7) the sustainability. We make our discussion more concrete by considering the contemporary AI music service Boomy. While our investigations are preliminary and focused on a single AI music service, we argue that they open several interesting avenues of exploration for many disciplines and their intersections to help prepare for the coming flood of AI music. This paper asks many more questions than it answers, which is a feature (not a bug) of it advocating for a new domain of study: AI Music Studies.

National Category
Musicology
Identifiers
urn:nbn:se:kth:diva-356200 (URN)
Conference
AI Music Creativity, AIMC 2024, 9 - 11 September
Funder
EU, Horizon 2020, 864189
Note

QC 20241113

Available from: 2024-11-12 Created: 2024-11-12 Last updated: 2024-11-13 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-2549-6367