Publications (10 of 65)
Kanhov, E., Kaila, A.-K. & Sturm, B. L. T. (2025). Innovation, data colonialism and ethics: critical reflections on the impacts of AI on Irish traditional music. Journal of New Music Research, 1-17
2025 (English). In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, p. 1-17. Article in journal (Refereed), Published
Abstract [en]

By definition, traditional music is in a constant state of friction with innovation, exemplified by resistance to ‘outside’ influences such as different instruments, different ways of learning, and forces of commercialisation. An emerging external influence is artificial intelligence (AI), which is now capable of synthesising music collections at scales dwarfing those crafted by people and communities. In this paper, we examine the impact of research and development of AI on Irish traditional music through case studies of two generative AI systems: folk-rnn and Suno. How can researchers and engineers (academic or industrial) who develop and apply AI to specific practices of music make meaningful and non-harmful contributions to those practices? To answer this question, we critically reflect on the tensions that arise between tradition and innovation, how Irish traditional music becomes subject to data colonialism, and the interdisciplinary challenges of ethically engaging as researchers with a traditional music community. We ask what perspectives are needed to balance the interests of academic research and value systems in traditional music communities, and provide three ways forward for computer science to deepen the considerations of their impacts on communities of practice.

Place, publisher, year, edition, pages
Informa UK Limited, 2025
Keywords
artificial intelligence, innovation, Irish traditional music, ethnography, research methodology, ethics
National Category
Musicology
Identifiers
urn:nbn:se:kth:diva-359387 (URN); 10.1080/09298215.2024.2442359 (DOI); 001408665500001 (); 2-s2.0-85216682924 (Scopus ID)
Funder
EU, European Research Council, 864189; Wallenberg AI, Autonomous Systems and Software Program (WASP), 2020.0102
Note

QC 20250214

Available from: 2025-01-30; Created: 2025-01-30; Last updated: 2025-02-14; Bibliographically approved
Cros Vila, L. & Sturm, B. (2025). (Mis)Communicating with our AI Systems. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Paper presented at CHI 2025: CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 26 April - 1 May, 2025. New York, NY, USA: Association for Computing Machinery (ACM), Article ID 416.
2025 (English). In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, New York, NY, USA: Association for Computing Machinery (ACM), 2025, article id 416. Conference paper, Published paper (Refereed)
Abstract [en]

Explainable Artificial Intelligence (XAI) is a discipline concerned with understanding predictions of AI systems. What is ultimately desired from XAI methods is for an AI system to link its input and output in a way that is interpretable with reference to the environment in which it is applied. A variety of methods have been proposed, but we argue in this paper that what has yet to be considered is miscommunication: the failure to convey and/or interpret an explanation accurately. XAI can be seen as a communication process and thus looking at how humans explain things to each other can provide guidance to its application and evaluation. We motivate a specific model of communication to help identify essential components of the process, and show the critical importance for establishing common ground, i.e., shared mutual knowledge, beliefs, and assumptions of the participants communicating.

Place, publisher, year, edition, pages
New York, NY, USA: Association for Computing Machinery (ACM), 2025
Keywords
Communication, Miscommunication, Dialog, Mutual-Understanding, Conversation, Explanation, Explainability, Explainable AI
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363375 (URN); 10.1145/3706598.3713771 (DOI)
Conference
CHI 2025: CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 26 April - 1 May, 2025
Note

Part of ISBN 9798400713941

QC 20250515

Available from: 2025-05-15; Created: 2025-05-15; Last updated: 2025-05-15; Bibliographically approved
Cros Vila, L., Sturm, B., Casini, L. & Dalmazzo, D. (2025). The AI Music Arms Race: On the Detection of AI-Generated Music. Transactions of the International Society for Music Information Retrieval, 8(1), 179-194
2025 (English). In: Transactions of the International Society for Music Information Retrieval, E-ISSN 2514-3298, Vol. 8, no 1, p. 179-194. Article in journal (Refereed), Published
Abstract [en]

Several companies now offer platforms for users to create music at unprecedented scales by textual prompting. As the quality of this music rises, concern grows about how to differentiate AI-generated music from human-made music, with implications for content identification, copyright enforcement, and music recommendation systems. This article explores the detection of AI-generated music by assembling and studying a large dataset of music audio recordings (30,000 full tracks totaling 1,770 h, 33 m, and 31 s in duration), of which 10,000 are from the Million Song Dataset (Bertin-Mahieux et al., 2011) and 20,000 are generated and released by users of two popular AI music platforms: Suno and Udio. We build and evaluate several AI music detectors operating on Contrastive Language Audio Pretraining embeddings of the music audio, then compare them to a commercial baseline system as well as an open-source one. We applied various audio transformations to see their impacts on detector performance and found that the commercial baseline system is easily fooled by simply resampling audio to 22.05 kHz. We argue that careful consideration needs to be given to the experimental design underlying work in this area, as well as the very definition of "AI music." We release all our code at https://github.com/lcrosvila/ai-music-detection.

Place, publisher, year, edition, pages
Ubiquity Press, Ltd., 2025
Keywords
AI music detection, AI music, Generative AI, Suno, Udio
National Category
Music; Computer and Information Sciences; Humanities and the Arts; Musicology; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-365668 (URN); 10.5334/tismir.254 (DOI)
Projects
MUSAiC
Note

QC 20250702

Available from: 2025-06-25; Created: 2025-06-25; Last updated: 2025-07-02; Bibliographically approved
Green, O., Sturm, B., Born, G. & Wald-Fuhrmann, M. (2024). A Critical Survey of Research in Music Genre Recognition. In: Proc. International Society for Music Information Retrieval Conference. Paper presented at 25th International Society for Music Information Retrieval (ISMIR), San Francisco, CA, USA, Nov 10-14 2024. ISMIR
2024 (English). In: Proc. International Society for Music Information Retrieval Conference, ISMIR, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

This paper surveys 560 publications about music genre recognition (MGR) published between 2013–2022, complementing the comprehensive survey of [474], which covered the time frame 1995–2012 (467 publications). For each publication we determine its main functions: a review of research, a contribution to evaluation methodology, or an experimental work. For each experimental work we note the data, experimental approach, and figure of merit it applies. We also note the extents to which any publication engages with work critical of MGR as a research problem, as well as genre theory. Our bibliographic analysis shows for MGR research: 1) it typically does not meaningfully engage with any critique of itself; and 2) it typically does not meaningfully engage with work in genre theory.

Place, publisher, year, edition, pages
ISMIR, 2024
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-356226 (URN)
Conference
25th International Society for Music Information Retrieval (ISMIR), San Francisco, CA, USA, Nov 10-14 2024
Funder
EU, Horizon 2020, 864189
Note

QC 20241115

Available from: 2024-11-12; Created: 2024-11-12; Last updated: 2024-11-15; Bibliographically approved
Green, O., Sturm, B., Born, G. & Wald-Fuhrmann, M. (2024). A critical survey of research in music genre recognition. In: Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024. Paper presented at 25th International Society for Music Information Retrieval (ISMIR), San Francisco, CA, USA, November 10-14, 2024 (pp. 745-782). International Society for Music Information Retrieval
2024 (English). In: Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024, International Society for Music Information Retrieval, 2024, p. 745-782. Conference paper, Published paper (Other academic)
Abstract [en]

This paper surveys 560 publications about music genre recognition (MGR) published between 2013–2022, complementing the comprehensive survey of [474], which covered the time frame 1995–2012 (467 publications). For each publication we determine its main functions: a review of research, a contribution to evaluation methodology, or an experimental work. For each experimental work we note the data, experimental approach, and figure of merit it applies. We also note the extents to which any publication engages with work critical of MGR as a research problem, as well as genre theory. Our bibliographic analysis shows for MGR research: 1) it typically does not meaningfully engage with any critique of itself; and 2) it typically does not meaningfully engage with work in genre theory.

Place, publisher, year, edition, pages
International Society for Music Information Retrieval, 2024
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-361141 (URN); 10.5281/zenodo.14877445 (DOI); 2-s2.0-85219144298 (Scopus ID)
Conference
25th International Society for Music Information Retrieval (ISMIR), San Francisco, CA, USA, November 10-14, 2024
Note

Part of ISBN 978-1-7327299-4-0

QC 20250312

Available from: 2025-03-12; Created: 2025-03-12; Last updated: 2025-03-13; Bibliographically approved
Sturm, B. (2024). A DIFFICULT CHRISTMAS.
2024 (English). Artistic output (Unrefereed)
National Category
Music
Identifiers
urn:nbn:se:kth:diva-357968 (URN)
Funder
EU, Horizon 2020, 864189
Available from: 2024-12-21; Created: 2024-12-21; Last updated: 2025-02-21; Bibliographically approved
Kaila, A.-K. & Sturm, B. (2024). Agonistic Dialogue on the Value and Impact of AI Music Applications. In: Proceedings of the 2024 International Conference on AI and Musical Creativity. Paper presented at 2024 International Conference on AI and Musical Creativity, 9 - 11 September, The University of Oxford, UK. Oxford, UK
2024 (English). In: Proceedings of the 2024 International Conference on AI and Musical Creativity, Oxford, UK, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we use critical and agonistic modes of inquiry to analyse and critique a specific application of AI to music practice. It records a structured interdisciplinary dialogue between 1) a musicologist and social scientist and 2) an engineer in music and computer science, focusing on folk-rnn and Irish Traditional Music (ITM) as a case study. We debate the role of data ethics in AI music applications, the dynamics of inclusion and exclusion, and the nature of embedded value systems and power asymmetries inherent in applying AI to music. We discuss how identifying the value of AI music applications is critical for ensuring research efforts make musical contributions along with academic and technical ones. Overall, this agonistic dialogue exemplifies how questions of right and wrong — the core of ethics — can be examined as AI is applied more and more to music practice.

Place, publisher, year, edition, pages
Oxford, UK, 2024
Keywords
AI music, Irish Traditional Music, ethics, interdisciplinary, agonistic dialogue
National Category
Music
Research subject
Art, Technology and Design
Identifiers
urn:nbn:se:kth:diva-346695 (URN); 10.5281/zenodo.15110169 (DOI)
Conference
2024 International Conference on AI and Musical Creativity, 9 - 11 September, The University of Oxford, UK
Funder
Marianne and Marcus Wallenberg Foundation, 2020.0102; EU, Horizon 2020, 864189
Note

QC 20240523

Available from: 2024-05-22; Created: 2024-05-22; Last updated: 2025-04-24; Bibliographically approved
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., . . . Ben-Tal, O. (2024). AI Music Studies: Preparing for the Coming Flood. In: Proceedings of AI Music Creativity. Paper presented at AI Music Creativity, AIMC 2024, 9 - 11 September.
2024 (English). In: Proceedings of AI Music Creativity, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

As music generated using artificial intelligence (AI music) becomes more prevalent — originating not only from individuals but also commercial services — the need to study it and its impacts becomes important. How can this material and its sources be meaningfully studied and critically engaged with, especially considering the unprecedented scales possible with generative AI? The paper begins to answer this question by considering AI music along seven aspects: 1) the company providing an AI music service; 2) its founders and employees; 3) the use of the service; 4) the users; 5) the algorithms; 6) the music; and 7) the sustainability. We make our discussion more concrete by considering the contemporary AI music service Boomy. While our investigations are preliminary and focused on a single AI music service, we argue that they open several interesting avenues of exploration for many disciplines and their intersections to help prepare for the coming flood of AI music. This paper asks many more questions than it answers, which is a feature (not a bug) of it advocating for a new domain of study: AI Music Studies.

National Category
Musicology
Identifiers
urn:nbn:se:kth:diva-356200 (URN)
Conference
AI Music Creativity, AIMC 2024, 9 - 11 September
Funder
EU, Horizon 2020, 864189
Note

QC 20241113

Available from: 2024-11-12; Created: 2024-11-12; Last updated: 2024-11-13; Bibliographically approved
Thomé, C., Sturm, B., Pertoft, J. & Jonason, N. (2024). Applying textual inversion to control and personalize text-to-music models. In: Proc. 15th Int. Workshop on Machine Learning and Music. Paper presented at Int. Workshop on Machine Learning and Music.
2024 (English). In: Proc. 15th Int. Workshop on Machine Learning and Music, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

A text-to-music (TTM) model should synthesize audio that reflects the concepts in a given prompt as long as it has been trained on those concepts. If a prompt references concepts that the TTM model has not been trained on then the audio it synthesizes will likely not match. This paper investigates the application of a simple gradient-based approach called textual inversion (TI) to expand the concept vocabulary of a trained TTM model without compromising the fidelity of concepts on which it has already been trained. We apply this technique to MusicGen and measure its reconstruction and editability quality, as well as its subjective quality. We see TI can expand the concept vocabulary of a pretrained TTM model, thus making it personalized and more controllable without having to finetune the entire model. 

National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-356224 (URN)
Conference
Int. Workshop on Machine Learning and Music
Funder
EU, Horizon 2020, 864189
Note

QC 20241113

Available from: 2024-11-12; Created: 2024-11-12; Last updated: 2024-11-13; Bibliographically approved
Dalmazzo, D., Déguernel, K. & Sturm, B. (2024). ChromaFlow: Modeling And Generating Harmonic Progressions With a Transformer And Voicing Encoding. In: MML 2024: 15th International Workshop on Machine Learning and Music, 2024, Vilnius, Lithuania. Paper presented at 15th International Workshop on Machine Learning and Music. ECML PKDD 2024, September 9, 2024. Vilnius, Lithuania
2024 (English). In: MML 2024: 15th International Workshop on Machine Learning and Music, 2024, Vilnius, Lithuania, Vilnius, Lithuania, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Modeling harmonic progressions in symbolic music is a complex task that requires generating musically coherent and varied chord sequences. In this study, we employ a transformer-based architecture trained on a comprehensive dataset of 48,072 songs, which includes an augmented set of 4,300 original pieces from the iReal Pro application transposed across all chromatic keys. We introduce a novel tokenization and voicing encoding strategy designed to enhance the musicality of the generated chord progressions. Our approach not only generates chord progression suggestions but also provides corresponding voicings tailored for instruments such as piano and guitar. To evaluate the effectiveness of our model, we conducted a listening test comparing the harmonic progressions produced by our approach against those from a baseline model. The results indicate that our model generates progressions with more fluid voicings, coherent harmonic motion, and plausible chord suggestions, effectively utilizing repetition and variation to enhance musicality.

Place, publisher, year, edition, pages
Vilnius, Lithuania, 2024
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-356209 (URN)
Conference
15th International Workshop on Machine Learning and Music. ECML PKDD 2024, September 9, 2024
Note

QC 20241115

Available from: 2024-11-11; Created: 2024-11-11; Last updated: 2025-02-07; Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-2549-6367
