kth.se Publications
1 - 45 of 45
  • 1.
    Amerotti, Marco
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Benford, Steve
    University of Nottingham.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Vear, Craig
    University of Nottingham.
    A Live Performance Rule System Informed by Irish Traditional Dance Music (2023). In: Proc. International Symposium on Computer Music Multidisciplinary Research, 2023. Conference paper (Refereed)
    Abstract [en]

    This paper describes ongoing work in programming a live performance system for interpreting melodies in ways that mimic Irish traditional dance music practice, and that allows plug-and-play human interaction. Existing performance systems are almost exclusively aimed at piano performance and classical music, and none are aimed specifically at traditional music. We develop a rule-based approach using expert knowledge that converts a melody into control parameters to synthesize an expressive MIDI performance, focusing on ornamentation, dynamics and subtle time deviation. Furthermore, we make the system controllable (e.g., via knobs or expression pedals) such that it can be controlled in real time by a musician. Our preliminary evaluations show the system can render expressive performances mimicking traditional practice, and allows for engaging with Irish traditional dance music in new ways. We provide several examples online.
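    As a rough illustration of the kind of rule-based mapping described in this abstract (melody in, expressive control parameters out), the following Python sketch applies invented ornamentation, dynamics, and timing rules to a toy note list. It is not code from the paper; every rule, threshold, and field name is a placeholder.

        # Toy performance-rule sketch: ornamentation, dynamics, and subtle timing
        # deviation applied to a melody. All rules and values are hypothetical.
        import random

        def perform(notes, ornament_prob=0.3, swing=0.02):
            """notes: list of dicts with 'pitch' (MIDI number), 'start' (beats), 'dur' (beats)."""
            events = []
            for n in notes:
                vel = 96 if int(n["start"]) % 3 == 0 else 72          # simple dynamic accent rule
                start = n["start"] + random.uniform(-swing, swing)    # subtle time deviation
                if random.random() < ornament_prob and n["dur"] >= 0.5:
                    # cut-like ornament: a very short upper neighbour before the main note
                    events.append({"pitch": n["pitch"] + 2, "start": start, "dur": 0.05, "vel": vel})
                    start += 0.05
                events.append({"pitch": n["pitch"], "start": start, "dur": n["dur"], "vel": vel})
            return events

        # In a live setting, parameters such as ornament_prob could be mapped to a knob or pedal.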

    Download full text (pdf)
    fulltext
  • 2. Ben-Tal, Oded
    et al.
    Harris, Matthew
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    How music AI is useful: Engagements with composers, performers, and audiences (2021). In: Leonardo Music Journal, ISSN 0961-1215, E-ISSN 1531-4812, Vol. 54, no 5, p. 510-516. Article in journal (Refereed)
    Abstract [en]

    Critical but often overlooked research questions in artificial intelligence (AI) applied to music involve the impact of the results for music. How and to what extent does such research contribute to the domain of music? How are the resulting models useful for music practitioners? In this article, we describe how we are addressing such questions by engaging composers, musicians, and audiences with our research. We first describe two websites we have created that make our AI models accessible to a wide audience. We then describe a professionally recorded album that we released to expert reviewers to gauge the plausibility of AI-generated material. Finally, we describe the use of our AI models as tools for co-creation. Evaluating AI research and music models in these ways illuminates their impact on music making in a range of styles and practices.

    Download full text (pdf)
    fulltext
  • 3.
    Ben-Tal, Oded
    et al.
    Kingston Univ, London, England..
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Quinton, Elio
    Universal Mus Grp, Santa Monica, CA USA..
    Simonnot, Josephine
    CNRS, UMR7186, CREM LESC, Nanterre, France..
    Helmlinger, Aurelie
    CNRS, UMR7186, CREM LESC, Nanterre, France..
    Finding Music in Music Data: A Summary of the DaCaRyH Project (2019). In: Computational Phonogram Archiving / [ed] Bader, R, Springer Nature, 2019, Vol. 5, p. 191-205. Conference paper (Refereed)
    Abstract [en]

    The international research project, "Data science for the study of calypso-rhythm through history" (DaCaRyH), involved a collaboration between ethnomusicologists, computer scientists, and a composer. The primary aim of DaCaRyH was to explore how ethnomusicology could inform data science, and vice versa. Its secondary aim focused on creative applications of the results. This article summarises the results of the project, and more broadly discusses the benefits and challenges in such interdisciplinary research. It concludes with suggestions for reducing the barriers to similar work.

  • 4.
    Casini, Luca
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Jonason, Nicolas
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation (2024). In: Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024 / [ed] Johnson, C Rebelo, SM Santos, I, Springer Nature, 2024, Vol. 14633, p. 84-96. Conference paper (Refereed)
    Abstract [en]

    The dominating approach for modeling sequences (e.g. text, music) with deep learning is the causal approach, which consists in learning to predict tokens sequentially given those preceding it. Another paradigm is masked language modeling, which consists of learning to predict the masked tokens of a sequence in no specific order, given all non-masked tokens. Both approaches can be used for generation, but the latter is more flexible for editing, e.g. changing the middle of a sequence. This paper investigates the viability of masked language modeling applied to Irish traditional music represented in the text-based format abc-notation. Our model, called abcMLM, enables a user to edit tunes in arbitrary ways while retaining similar generation capabilities to causal models. We find that generation using masked language modeling is more challenging, but leveraging additional information from a dataset, e.g., imputing musical structure, can generate sequences that are on par with previous models.
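    As a loose illustration of the masked-modeling idea described in this abstract (not the abcMLM model itself), one can mask random tokens of an abc-notation sequence and ask a model to restore them given the unmasked context. The tokenisation and masking below are deliberately simplified assumptions.

        # Toy masking of abc-notation tokens for masked language modeling.
        # A real system would use a musical tokenizer and a trained network.
        import random

        MASK = "<mask>"

        def mask_tokens(tokens, p=0.15, seed=0):
            rng = random.Random(seed)
            masked, targets = [], []
            for t in tokens:
                if rng.random() < p:
                    masked.append(MASK)   # the model must predict this position...
                    targets.append(t)     # ...from all non-masked tokens, in no fixed order
                else:
                    masked.append(t)
                    targets.append(None)
            return masked, targets

        tokens = "X:1 M:6/8 K:Gmaj |: G A B d B A | G A B A 3 :|".split()
        masked, targets = mask_tokens(tokens)
        print(masked)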

  • 5.
    Casini, Luca
    et al.
    University of Bologna.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Tradformer: A Transformer Model of Traditional Music Transcriptions (2022). In: Proceedings of the 31st International Joint Conference on Artificial Intelligence, IJCAI 2022, 2022, p. 4915-4920, article id AR46. Conference paper (Refereed)
    Abstract [en]

    We explore the transformer neural network architecture for modeling music, specifically Irish and Swedish traditional dance music. Given the repetitive structures of these kinds of music, the transformer should be as successful as the hitherto most successful model, a vanilla long short-term memory network, with fewer parameters and less complexity. We find that achieving good performance with the transformer is not straightforward, and careful consideration is needed for the sampling strategy, evaluating intermediate outputs in relation to engineering choices, and finally analyzing what the model learns. We discuss these points with several illustrations, providing reusable insights for engineering other music generation systems. We also report the high performance of our final transformer model in a competition of music generation systems focused on a type of Swedish dance.

    Download full text (pdf)
    fulltext
  • 6. Chettri, B.
    et al.
    Stoller, D.
    Morfi, V.
    Martínez Ramírez, M. A.
    Benetos, E.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Ensemble models for spoofing detection in automatic speaker verification (2019). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019, International Speech Communication Association, 2019, p. 1018-1022. Conference paper (Refereed)
    Abstract [en]

    Detecting spoofing attempts of automatic speaker verification (ASV) systems is challenging, especially when using only one modelling approach. For robustness, we use both deep neural networks and traditional machine learning models and combine them as ensemble models through logistic regression. They are trained to detect logical access (LA) and physical access (PA) attacks on the dataset released as part of the ASV Spoofing and Countermeasures Challenge 2019. We propose dataset partitions that ensure different attack types are present during training and validation to improve system robustness. Our ensemble model outperforms all our single models and the baselines from the challenge for both attack types. We investigate why some models on the PA dataset strongly outperform others and find that spoofed recordings in the dataset tend to have longer silences at the end than genuine ones. By removing them, the PA task becomes much more challenging, with the tandem detection cost function (t-DCF) of our best single model rising from 0.1672 to 0.5018 and equal error rate (EER) increasing from 5.98% to 19.8% on the development set.
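    The abstract describes fusing scores from several countermeasure models with logistic regression. A minimal sketch of such score-level fusion with scikit-learn might look as follows; the score arrays and labels are placeholders standing in for outputs of already-trained models.

        # Score-level fusion sketch: combine per-model spoofing scores via logistic regression.
        # dev_scores/eval_scores stand in for (n_utterances, n_models) arrays of model outputs.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        dev_scores = rng.normal(size=(200, 3))      # placeholder scores from 3 models
        dev_labels = rng.integers(0, 2, size=200)   # placeholder labels: 0 = bona fide, 1 = spoof
        eval_scores = rng.normal(size=(50, 3))

        fusion = LogisticRegression().fit(dev_scores, dev_labels)
        fused = fusion.predict_proba(eval_scores)[:, 1]   # one ensemble score per utterance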

  • 7.
    Chettri, Bhusan
    et al.
    Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England..
    Benetos, Emmanouil
    Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England..
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Dataset Artefacts in Anti-Spoofing Systems: A Case Study on the ASVspoof 2017 Benchmark (2020). In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, ISSN 2329-9290, Vol. 28, p. 3018-3028. Article in journal (Refereed)
    Abstract [en]

    The Automatic Speaker Verification Spoofing and Countermeasures Challenges motivate research in protecting speech biometric systems against a variety of different access attacks. The 2017 edition focused on replay spoofing attacks, and involved participants building and training systems on a provided dataset (ASVspoof 2017). More than 60 research papers have so far been published with this dataset, but none have sought to answer why countermeasures appear successful in detecting spoofing attacks. This article shows how artefacts inherent to the dataset may be contributing to the apparent success of published systems. We first inspect the ASVspoof 2017 dataset and summarize various artefacts present in the dataset. Second, we demonstrate how countermeasure models can exploit these artefacts to appear successful in this dataset. Third, for reliable and robust performance estimates on this dataset we propose discarding nonspeech segments and silence before and after the speech utterance during training and inference. We create speech start and endpoint annotations in the dataset and demonstrate how using them helps countermeasure models become less vulnerable from being manipulated using artefacts found in the dataset. Finally, we provide several new benchmark results for both frame-level and utterance-level models that can serve as new baselines on this dataset.
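    One remedy proposed in this abstract is to discard non-speech and silence before and after each utterance. The paper uses explicit start/end annotations; as a simpler stand-in, an energy-based trim could be sketched like this (the file name and threshold are arbitrary assumptions).

        # Sketch: drop leading/trailing low-energy audio before training or inference.
        # This is a generic energy-based trim, not the annotation-based method of the paper.
        import librosa

        y, sr = librosa.load("utterance.wav", sr=16000)              # placeholder file name
        y_trimmed, (start, end) = librosa.effects.trim(y, top_db=30)
        print(f"kept samples {start}..{end} of {len(y)}")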

  • 8.
    Chettri, Bhusan
    et al.
    Queen Mary Univ London, Sch EECS, London, England..
    Mishra, Saumitra
    Queen Mary Univ London, Sch EECS, London, England..
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. Queen Mary Univ London, Sch EECS, London, England..
    Analysing the predictions of a CNN-based replay spoofing detection system (2018). In: 2018 IEEE Workshop on Spoken Language Technology (SLT 2018), IEEE, 2018, p. 92-97. Conference paper (Refereed)
    Abstract [en]

    Playing recorded speech samples of an enrolled speaker – “replay attack” – is a simple approach to bypass an automatic speaker verification (ASV) system. The vulnerability of ASV systems to such attacks has been acknowledged and studied, but there has been no research into what spoofing detection systems are actually learning to discriminate. In this paper, we analyse the local behaviour of a replay spoofing detection system based on convolutional neural networks (CNNs) adapted from a state-of-the-art CNN (LCNN-FFT) submitted at the ASVspoof 2017 challenge. We generate temporal and spectral explanations for predictions of the model using the SLIME algorithm. Our findings suggest that in most instances of spoofing the model is using information in the first 400 milliseconds of each audio instance to make the class prediction. Knowledge of the characteristics that spoofing detection systems are exploiting can help build less vulnerable ASV systems, other spoofing detection systems, as well as better evaluation databases.

  • 9.
    Chettri, Bhusan
    et al.
    Queen Mary Univ London, Sch EECS, London, England..
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. Queen Mary Univ London, Sch EECS, London, England..
    Benetos, Emmanouil
    Queen Mary Univ London, Sch EECS, London, England..
    Analysing Replay Spoofing Countermeasure Performance Under Varied Conditions (2018). In: 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP) / [ed] Pustelnik, N Ma, Z Tan, ZH Larsen, J, IEEE, 2018. Conference paper (Refereed)
    Abstract [en]

    In this paper, we aim to understand what makes replay spoofing detection difficult in the context of the ASVspoof 2017 corpus. We use FFT spectra, mel frequency cepstral coefficients (MFCC) and inverted MFCC (IMFCC) frontends and investigate different back-ends based on Convolutional Neural Networks (CNNs), Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs). On this database, we find that IMFCC frontend based systems show smaller equal error rate (EER) for high quality replay attacks but higher EER for low quality replay attacks in comparison to the baseline. However, we find that it is not straightforward to understand the influence of an acoustic environment (AE), a playback device (PD) and a recording device (RD) of a replay spoofing attack. One reason is the unavailability of metadata for genuine recordings. Second, it is difficult to account for the effects of the factors: AE, PD and RD, and their interactions. Finally, our frame-level analysis shows that the presence of cues (recording artefacts) in the first few frames of genuine signals (missing from replayed ones) influence class prediction.
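    As a sketch of the frontend/back-end combinations compared in this abstract, the following pairs an MFCC frontend with a GMM back-end scored by a log-likelihood ratio; an IMFCC frontend would simply invert the mel filterbank. File lists and hyper-parameters are placeholders, not the paper's settings.

        # MFCC frontend + GMM back-end sketch for replay detection (illustrative only).
        import numpy as np
        import librosa
        from sklearn.mixture import GaussianMixture

        def mfcc_frames(path, n_mfcc=20):
            y, sr = librosa.load(path, sr=16000)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, coefficients)

        genuine = np.vstack([mfcc_frames(p) for p in ["genuine1.wav", "genuine2.wav"]])   # placeholders
        spoofed = np.vstack([mfcc_frames(p) for p in ["replay1.wav", "replay2.wav"]])     # placeholders

        gmm_gen = GaussianMixture(n_components=32, covariance_type="diag").fit(genuine)
        gmm_spf = GaussianMixture(n_components=32, covariance_type="diag").fit(spoofed)

        def llr(path):
            x = mfcc_frames(path)
            return gmm_gen.score(x) - gmm_spf.score(x)   # average log-likelihood ratio over frames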

  • 10.
    Cros Vila, Laura
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Statistical evaluation of abc-formatted music at the levels of items and corpora (2023). In: Proc. AI Music Creativity Conference, 2023. Conference paper (Refereed)
    Abstract [en]

    This paper explores three distance measures and three statistical tests for the comparison of music expressed in abc format. We propose a methodology that allows for an analysis at the level of corpora (is the “style” represented in one corpus the same as that in another corpus?) as well as at the level of items (is the “style” of an item that of the “style” represented in a corpus?). We estimate distributions of distances between item pairs within and between corpora, and test hypotheses that the distributions are identical. We empirically test the impact of distance measure and statistical test using a corpus of Irish traditional dance music and a collection of tunes generated by a machine learning model trained on the same. The proposed methodology has a variety of applications, from computational musicology to evaluating machine-generated music.
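    A minimal sketch of the corpus-level comparison described in this abstract: estimate distributions of pairwise distances within one corpus and between corpora, then test whether the distributions differ. The feature vectors and Euclidean distance below are placeholders, not the paper's measures.

        # Corpus comparison sketch: within- vs. between-corpus distance distributions.
        import numpy as np
        from itertools import combinations, product
        from scipy.stats import ks_2samp

        rng = np.random.default_rng(0)
        corpus_a = [rng.random(16) for _ in range(30)]   # e.g. features of real tunes (placeholder)
        corpus_b = [rng.random(16) for _ in range(30)]   # e.g. features of generated tunes (placeholder)

        dist = lambda a, b: float(np.linalg.norm(a - b))
        within_a = [dist(x, y) for x, y in combinations(corpus_a, 2)]
        between = [dist(x, y) for x, y in product(corpus_a, corpus_b)]

        stat, p = ks_2samp(within_a, between)   # H0: the two distance distributions are identical
        print(f"KS statistic {stat:.3f}, p = {p:.3g}")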

    Download full text (pdf)
    fulltext
  • 11.
    Dalmazzo, David
    et al.
    KTH.
    Deguernel, Ken
    Univ Lille, CNRS, UMR 9189, Cent Lille,CRIStAL, F-59000 Lille, France..
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    The Chordinator: Modeling Music Harmony by Implementing Transformer Networks and Token Strategies (2024). In: Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024 / [ed] Johnson, C Rebelo, SM Santos, I, Springer Nature, 2024, Vol. 14633, p. 52-66. Conference paper (Refereed)
    Abstract [en]

    This paper compares two tokenization strategies for modeling chord progressions using the encoder transformer architecture trained with a large dataset of chord progressions in a variety of styles. The first strategy includes a tokenization method treating all different chords as unique elements, which results in a vocabulary of 5202 independent tokens. The second strategy expresses the chords as a dynamic tuple describing root, nature (e.g., major, minor, diminished, etc.), and extensions (e.g., additions or alterations), producing a specific vocabulary of 59 tokens related to chords and 75 tokens for style, bars, form, and format. In the second approach, MIDI embeddings are added into the positional embedding layer of the transformer architecture, with an array of eight values related to the notes forming the chords. We propose a trigram analysis addition to the dataset to compare the generated chord progressions with the training dataset, which reveals common progressions and the extent to which a sequence is duplicated. We analyze progressions generated by the models comparing HITS@k metrics and human evaluation of 10 participants, rating the plausibility of the progressions as potential music compositions from a musical perspective. The second model reported lower validation loss, better metrics, and more musical consistency in the suggested progressions.
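    The two tokenisation strategies compared in this abstract can be contrasted with a small sketch: one token per distinct chord symbol versus a tuple of root, nature, and extensions. The parsing below is a simplified assumption, not the paper's tokenizer.

        # Chord tokenisation sketch: whole-symbol tokens vs. (root, nature, extensions) tuples.
        import re

        def single_token(chord):
            return [chord]   # strategy 1: every distinct chord symbol is its own token (large vocabulary)

        def tuple_tokens(chord):
            # strategy 2: factor the symbol into a few small vocabularies
            m = re.match(r"([A-G][#b]?)(maj|min|dim|aug|m)?(.*)", chord)
            root, nature, ext = m.group(1), m.group(2) or "maj", m.group(3)
            tokens = [f"root:{root}", f"nat:{nature}"]
            if ext:
                tokens.append(f"ext:{ext}")
            return tokens

        print(single_token("Cmaj7#11"))   # ['Cmaj7#11']
        print(tuple_tokens("Cmaj7#11"))   # ['root:C', 'nat:maj', 'ext:7#11']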

  • 12.
    Déguernel, Ken
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. University of Lille.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Bias in Favour or Against Computational Creativity: A Survey and Reflection on the Importance of Socio-cultural Context in its Evaluation (2023). In: Proc. International Conference on Computational Creativity, 2023. Conference paper (Refereed)
    Download full text (pdf)
    fulltext
  • 13.
    Déguernel, Ken
    et al.
    University of Lille.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Maruri-Aguilar, Hugo
    Queen Mary University of London.
    Investigating the relationship between liking and belief in AI authorship in the context of Irish traditional music (2022). Conference paper (Refereed)
    Abstract [en]

    Past work has investigated the degree to which human listeners may be prejudiced against music knowing that it was created by artificial intelligence (AI). While these studies did not find a statistically significant relationship, the listening experiments were performed with music genres such as contemporary classical music or free jazz which are fairly welcoming of technology. In this work, we explore this prejudice in a context where strong opinions on authenticity and technology are typical: Irish traditional music (ITM). We conduct a listening experiment with practitioners of ITM asking each subject to first listen to a human performance of music generated by a computer in the style of ITM (this provenance is unknown to the listener), and then rate how much they like the piece. After rating all six pieces, each subject listens to each again but rates how likely they believe it is composed by a computer. The results of our pilot study suggest ITM practitioners tend to rate belief in AI authorship lower the more they rate liking a tune. 

    Download full text (pdf)
    fulltext
  • 14.
    Falk, Simon
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Ahlbäck, Sven
    DoReMIR Music Research AB.
    Automatic legato transcription based on onset detection (2023). In: SMC 2023: Proceedings of the Sound and Music Computing Conference 2023, Sound and Music Computing Network, 2023, p. 214-221. Conference paper (Refereed)
    Abstract [en]

    This paper focuses on the transcription of performance expression and, in particular, legato slurs for solo violin performance. This can be used to improve automatic music transcription and enrich the resulting notations with expression markings. We review past work in expression detection, and find that while legato detection has been explored, its transcription has not. We propose a method for demarcating the beginning and ending of slurs in a performance by combining pitch and onset information produced by ScoreCloud (a music notation software with transcription capabilities) with articulated onsets detected by a convolutional neural network. To train this system, we build a dataset of solo bowed violin performance featuring three different musicians playing several exercises and tunes. We test the resulting method on a small collection of recordings of the same excerpt of music performed by five different musicians. We find that this signal-based method works well in cases where the acoustic conditions do not interfere greatly with the onset strengths. Further work will explore data augmentation for making the articulation detection more robust, as well as an end-to-end solution.

    Download full text (pdf)
    fulltext
  • 15.
    Hallström, Eric
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Mossmyr, Simon
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Vegeborn, Victor
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Wedin, Jonas
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    From Jigs and Reels to Schottisar och Polskor: Generating Scandinavian-like Folk Music with Deep Recurrent Networks (2019). In: Proceedings of the Sound and Music Computing Conferences, 2019. Conference paper (Refereed)
    Abstract [en]

    The use of recurrent neural networks for modeling and generating music has been shown to be quite effective for compact, textual transcriptions of traditional music from Ireland and the UK. We explore how well these models perform for textual transcriptions of traditional music from Scandinavia. This type of music has characteristics that are similar to and different from that of Irish music, e.g., mode, rhythm, and structure. We investigate the effects of different architectures and training regimens, and evaluate the resulting models using three methods: a comparison of statistics between real and generated transcriptions, an appraisal of generated transcriptions via a semi-structured interview with an expert in Swedish folk music, and an exercise conducted with students of Scandinavian folk music. We find that some of our models can generate new transcriptions sharing characteristics with Scandinavian folk music, but which often lack the simplicity of real transcriptions. One of our models has been implemented online at http://www.folkrnn.org for anyone to try.

    Download full text (pdf)
    fulltext
  • 16.
    Holzapfel, André
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Coeckelbergh, Mark
    Department of Philosophy, University of Vienna, Vienna, Austria.
    Ethical Dimensions of Music Information Retrieval Technology (2018). In: Transactions of the International Society for Music Information Retrieval, E-ISSN 2514-3298, Vol. 1, no 1, p. 44-55. Article in journal (Refereed)
    Abstract [en]

    This article examines ethical dimensions of Music Information Retrieval (MIR) technology.  It uses practical ethics (especially computer ethics and engineering ethics) and socio-technical approaches to provide a theoretical basis that can inform discussions of ethics in MIR. To help ground the discussion, the article engages with concrete examples and discourse drawn from the MIR field. This article argues that MIR technology is not value-neutral but is influenced by design choices, and so has unintended and ethically relevant implications. These can be invisible unless one considers how the technology relates to wider society. The article points to the blurring of boundaries between music and technology, and frames music as “informationally enriched” and as a “total social fact.” The article calls attention to biases that are introduced by algorithms and data used for MIR technology, cultural issues related to copyright, and ethical problems in MIR as a scientific practice. The article concludes with tentative ethical guidelines for MIR developers, and calls for addressing key ethical problems with MIR technology and practice, especially those related to forms of bias and the remoteness of the technology development from end users.

    Download full text (pdf)
    fulltext
  • 17.
    Huang, Rujing
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Holzapfel, Andre
    KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Kaila, Anna-Kaisa
    KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID.
    Beyond Diverse Datasets: Responsible MIR, Interdisciplinarity, and the Fractured Worlds of Music (2023). In: Transactions of the International Society for Music Information Retrieval, E-ISSN 2514-3298, Vol. 6, no 1, p. 43-59. Article in journal (Refereed)
    Abstract [en]

    Musical worlds, not unlike our lived realities, are fundamentally fragmented and diverse, a fact often seen as a challenge or even a threat to the validity of research in Music Information Research (MIR). In this article, we propose to treat this characteristic of our musical universe(s) as an opportunity to fundamentally enrich and re-orient MIR. We propose that the time has arrived for MIR to reflect on its ethical and cultural turns (if they have been initiated at all) and take them a step further, with the goal of profoundly diversifying the discipline beyond the diversification of datasets. Such diversification, we argue, is likely to remain superficial if it is not accompanied by a simultaneous auto-critique of the discipline’s raison d’être. Indeed, this move to diversify touches on the philosophical underpinnings of what MIR is and should become as a field of research: What is music (ontology)? What are the nature and limits of knowledge concerning music (epistemology)? How do we obtain such knowledge (methodology)? And what about music and our own research endeavor do we consider “good” and “valuable” (axiology)? This path involves sincere inter- and intra-disciplinary struggles that underlie MIR, and we point to “agonistic interdisciplinarity” — that we have practiced ourselves via the writing of this article — as a future worth reaching for. The two featured case studies, about possible philosophical re-orientations in approaching ethics of music AI and about responsible engineering when AI meets traditional music, indicate a glimpse of what is possible.

    Download full text (pdf)
    fulltext
  • 18.
    Huang, Rujing Stacy
    et al.
    University of Hong Kong.
    Holzapfel, Andre
    KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH, Music Acoustics.
    Global Ethics: From Philosophy to Practice, A Culturally Informed Ethics of Music AI in Asia (2022). In: Artificial Intelligence and Music Ecosystem / [ed] Martin Clancy, Routledge, 2022, p. 126-141. Chapter in book (Refereed)
    Download full text (pdf)
    fulltext
  • 19.
    Huang, Rujing
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Reframing “Aura”: Authenticity in the application of AI to Irish traditional music (2021). In: 2nd Conference on AI Music Creativity 2021, 2021. Conference paper (Refereed)
    Abstract [en]

    Through a case study on the interaction between artificial intelligence (Ai) and Irish traditional music, we investigate contested issues of artistic agency and the meaning of the “authentic” in a world of Ai-generated music. We consider musical authenticity from three perspectives: 1) the source/cause of art; 2) the art itself; and 3) the recipient. Throughout, we adopt a posthumanist framework that ascribes agency to both human and non-human actors. We interpret authenticity as a relative, malleable concept and argue that the partnership between Ai and folk music enriches each of these perspectives. This paper adds to the intensifying debate around the application, evaluation, ethics, and future of Ai-generated music.

    Download full text (pdf)
    fulltext
  • 20.
    Huang, Rujing
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Holzapfel, Andre
    KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID.
    De-centering the west: East Asian philosophies and the ethics of applying artificial intelligence to music (2021). Conference paper (Refereed)
  • 21.
    Jonason, Nicolas
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Neural music instrument cloning from few samples (2022). In: Proceedings of the 25th International Conference on Digital Audio Effects (DAFx20in22), 2022, Vol. 3, p. 296-303. Conference paper (Refereed)
    Abstract [en]

    Neural music instrument cloning is an application of deep neural networks for imitating the timbre of a particular music instrument recording with a trained neural network. One can create such clones using an approach such as DDSP, which has been shown to achieve good synthesis quality for several instrument types. However, this approach needs about ten minutes of audio data from the instrument of interest (target recording audio). In this work, we modify the DDSP architecture and apply transfer learning techniques used in speech voice cloning to significantly reduce the amount of target recording audio required. We compare various cloning approaches and architectures across durations of target recording audio, ranging from four to 256 seconds. We demonstrate editing of loudness and pitch as well as timbre transfer from only 16 seconds of target recording audio. Our code is available online, as well as many audio examples.

    Download full text (pdf)
    fulltext
  • 22.
    Jonason, Nicolas
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Thomé, Carl
    Epidemic Sound.
    The control-synthesis approach for making expressive and controllable neural music synthesizers (2020). In: Proceedings of the 2020 AI Music Creativity Conference, 2020. Conference paper (Refereed)
    Download full text (pdf)
    fulltext
  • 23.
    Kaila, Anna-Kaisa
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Agonistic Dialogue on the Value and Impact of AI Music Applications (2024). In: Proceedings of the 2024 International Conference on AI and Musical Creativity, Oxford, UK, 2024. Conference paper (Refereed)
    Abstract [en]

    In this paper, we use critical and agonistic modes of inquiry to analyse and critique a specific application of AI to music practice. It records a structured interdisciplinary dialogue between 1) a musicologist and social scientist and 2) an engineer in music and computer science, focusing on folk-rnn and Irish Traditional Music (ITM) as a case study. We debate the role of data ethics in AI music applications, the dynamics of inclusion and exclusion, and the nature of embedded value systems and power asymmetries inherent in applying AI to music. We discuss how identifying the value of AI music applications is critical for ensuring research efforts make musical contributions along with academic and technical ones. Overall, this agonistic dialogue exemplifies how questions of right and wrong — the core of ethics — can be examined as AI is applied more and more to music practice.

  • 24.
    Lousseief, Elias
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    MahlerNet: Unbounded Orchestral Music with Neural Networks (2019). In: Combined proceedings of the Nordic Sound and Music Computing Conference 2019 and the Interactive Sonification Workshop 2019, 2019, p. 57-63. Conference paper (Refereed)
    Download full text (pdf)
    fulltext
  • 25. Mishra, S.
    et al.
    Benetos, E.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Dixon, S.
    Reliable Local Explanations for Machine Listening (2020). In: 2020 International Joint Conference on Neural Networks (IJCNN), Institute of Electrical and Electronics Engineers Inc., 2020, article id 9207444. Conference paper (Refereed)
    Abstract [en]

    One way to analyse the behaviour of machine learning models is through local explanations that highlight input features that maximally influence model predictions. Sensitivity analysis, which involves analysing the effect of input perturbations on model predictions, is one of the methods to generate local explanations. Meaningful input perturbations are essential for generating reliable explanations, but there exists limited work on what such perturbations are and how to perform them. This work investigates these questions in the context of machine listening models that analyse audio. Specifically, we use a state-of-the-art deep singing voice detection (SVD) model to analyse whether explanations from SoundLIME (a local explanation method) are sensitive to how the method perturbs model inputs. The results demonstrate that SoundLIME explanations are sensitive to the content in the occluded input regions. We further propose and demonstrate a novel method for quantitatively identifying suitable content type(s) for reliably occluding inputs of machine listening models. The results for the SVD model suggest that the average magnitude of input mel-spectrogram bins is the most suitable content type for temporal explanations.
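    The perturbation question studied in this abstract can be sketched as occluding a temporal region of a mel spectrogram with different content types and observing how the model's prediction changes. The model_predict function below is a placeholder for any machine listening model.

        # Occlusion sketch: perturb a region of a mel spectrogram with different content types.
        import numpy as np

        def occlude(mel, t0, t1, content="mean"):
            out = mel.copy()
            if content == "zeros":
                out[:, t0:t1] = 0.0
            elif content == "mean":
                out[:, t0:t1] = mel.mean(axis=1, keepdims=True)   # per-bin average magnitude
            elif content == "noise":
                out[:, t0:t1] = np.random.normal(mel.mean(), mel.std(), out[:, t0:t1].shape)
            return out

        def sensitivity(mel, model_predict, t0, t1):
            # how much the prediction changes for each way of filling the occluded region
            base = model_predict(mel)
            return {c: base - model_predict(occlude(mel, t0, t1, c)) for c in ("zeros", "mean", "noise")}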

  • 26.
    Mishra, Saumitra
    et al.
    Queen Mary University of London.
    Stoller, Daniel
    Queen Mary University of London.
    Benetos, Emmanouil
    Queen Mary University of London.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Dixon, Simon
    Queen Mary University of London.
    GAN-Based Generation and Automatic Selection of Explanations for Neural Networks (2019). Conference paper (Refereed)
    Abstract [en]

    One way to interpret trained deep neural networks (DNNs) is by inspecting characteristics that neurons in the model respond to, such as by iteratively optimising the model input (e.g., an image) to maximally activate specific neurons. However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual, qualitative evaluation of each setting, which is prohibitively slow. We introduce a new metric that uses Fréchet Inception Distance (FID) to encourage similarity between model activations for real and generated data. This provides an efficient way to evaluate a set of generated examples for each setting of hyper-parameters. We also propose a novel GAN-based method for generating explanations that enables an efficient search through the input space and imposes a strong prior favouring realistic outputs. We apply our approach to a classification model trained to predict whether a music audio recording contains singing voice. Our results suggest that this proposed metric successfully selects hyper-parameters leading to interpretable examples, avoiding the need for manual evaluation. Moreover, we see that examples synthesised to maximise or minimise the predicted probability of singing voice presence exhibit vocal or non-vocal characteristics, respectively, suggesting that our approach is able to generate suitable explanations for understanding concepts learned by a neural network.
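    The proposed metric builds on the Fréchet Inception Distance between model activations for real and generated inputs. A bare-bones sketch of that distance (with placeholder activation arrays) is below; it is not the paper's full selection procedure.

        # Fréchet distance between two sets of layer activations (illustrative).
        import numpy as np
        from scipy.linalg import sqrtm

        def frechet_distance(act_real, act_gen):
            mu1, mu2 = act_real.mean(axis=0), act_gen.mean(axis=0)
            s1 = np.cov(act_real, rowvar=False)
            s2 = np.cov(act_gen, rowvar=False)
            covmean = sqrtm(s1 @ s2)
            if np.iscomplexobj(covmean):
                covmean = covmean.real   # discard tiny imaginary parts from numerical error
            return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2.0 * covmean))

        act_real = np.random.rand(128, 64)   # placeholder activations for real data
        act_gen = np.random.rand(128, 64)    # placeholder activations for generated examples
        print(frechet_distance(act_real, act_gen))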

    Download full text (pdf)
    fulltext
  • 27.
    Müller, Meinard
    et al.
    International Audio Laboratories Erlangen, Germany.
    Dixon, Simon
    Queen Mary University of London, UK.
    Volk, Anja
    Utrecht University, Netherlands.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Rao, Preeti
    Indian Institute of Technology Bombay, India.
    Gotham, Mark
    Durham University, UK.
    Introducing the TISMIR Education Track: What, Why, How? (2024). In: Transactions of the International Society for Music Information Retrieval, E-ISSN 2514-3298, Vol. 7, no 1, p. 85-98. Article in journal (Refereed)
    Abstract [en]

    This editorial introduces the new education track for the Transactions of the International Society for Music Information Retrieval (TISMIR) and aims to provide guidance to both prospective authors and users of this track’s material regarding its context, goals, and scope. To begin, we offer TISMIR-specific context, including the journal’s history, its unchanged scope and remit, and the motivations behind introducing the new track. This context is supplemented by broader insights into developments in the field of Music Information Retrieval (MIR), the personal pedagogical experiences of the authors, and the rapid, extensive development of Open Educational Resources across various domains. We highlight the key characteristics of educational articles in general and explore why the music domain may provide an intuitive and motivating setting for education across various levels and disciplines. The education track aligns with existing tracks in terms of TISMIR’s dedication to scientific research in MIR, broadly defined as the processing, analyzing, organizing, and creating of music and music-related information using computational methods. Educational articles within this track maintain the high standards expected in terms of scientific rigor, clarity of language, and compelling presentation. However, they differ in their focus on a tutorial-style delivery and their emphasis on existing MIR research methods, techniques, principles, and practical matters relevant to the diverse interests of the MIR community. Through this editorial, our objective is to offer guidance, clarify review criteria, and stimulate discussion on crafting effective educational articles, thereby laying the foundation for a broader discourse on education within MIR and beyond.

  • 28.
    Purwins, Hendrik
    et al.
    Aalborg Univ, Fac IT & Design, Dept Architecture Design & Media Technol, DK-2450 Copenhagen, Denmark..
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Li, Bo
    Google Inc, Mountain View, CA 94043 USA..
    Nam, Juhan
    Korea Adv Inst Sci & Technol, Grad Sch Culture Technol, Daejeon 34141, South Korea..
    Alwan, Abeer
    Univ Calif Los Angeles, Speech Proc & Auditory Percept Lab, Elect & Comp Engn, Los Angeles, CA 90095 USA..
    Introduction to the Issue on Data Science: Machine Learning for Audio Signal Processing (2019). In: IEEE Journal on Selected Topics in Signal Processing, ISSN 1932-4553, E-ISSN 1941-0484, Vol. 13, no 2, p. 203-205. Article in journal (Refereed)
  • 29.
    Rodríguez-Algarra, Francisco
    et al.
    Queen Mary University of London.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Dixon, Simon
    Queen Mary University of London.
    Characterising Confounding Effects in Music Classification Experiments through Interventions (2019). In: Transactions of the International Society for Music Information Retrieval, ISSN 2514-3298, Vol. 2, no 1, p. 52-66. Article in journal (Refereed)
    Abstract [en]

    We address the problem of confounding in the design of music classification experiments, that is, the inability to distinguish the effects of multiple potential influencing variables in the measurements. Confounding affects the validity of conclusions at many levels, and so must be properly accounted for. We propose a procedure for characterising effects of confounding in the results of music classification experiments by creating regulated test conditions through interventions in the experimental pipeline, including a novel resampling strategy. We demonstrate this procedure on the GTZAN genre collection, which is known to give rise to confounding effects.
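    One generic kind of intervention on an experimental pipeline is to repartition the data so that a suspected confounder never spans the training and test sets. The sketch below shows such a filtered split keyed on a hypothetical "artist" field; it is not the specific resampling strategy proposed in the paper.

        # Confounder-filtered split sketch: no artist appears in both train and test.
        import random
        from collections import defaultdict

        def artist_filtered_split(items, test_fraction=0.3, seed=0):
            """items: list of (features, label, artist) tuples; artist is the suspected confounder."""
            by_artist = defaultdict(list)
            for it in items:
                by_artist[it[2]].append(it)
            artists = list(by_artist)
            random.Random(seed).shuffle(artists)
            n_test = int(len(artists) * test_fraction)
            test = [it for a in artists[:n_test] for it in by_artist[a]]
            train = [it for a in artists[n_test:] for it in by_artist[a]]
            return train, test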

    Download full text (pdf)
    fulltext
  • 30.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    An artificial critic of Irish double jigs (2021). In: Proceedings of the 2nd Joint Conference on AI Music Creativity, AIMC, 2021, p. 10. Conference paper (Refereed)
    Abstract [en]

    This paper describes a component of the music generation system that produced an award-winning tune at The Ai Music Generation Challenge 2020. This challenge involved four Irish traditional music experts judging 35 tunes generated by seven systems in reference to a recognised collection of a specific kind of dance music. The winning system uses an “artificial critic” that accepts or rejects a generated tune based on a variety of criteria related to metric structure and intervallic content. Such an artificial critic can help one explore massive generated music collections, as well as synthesise new training music collections.

    Download full text (pdf)
    fulltext
  • 31.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Generative AI helps one express things for which they may not have expressions (yet) (2022). In: Proc. Generative AI and HCI Workshop at CHI, 2022. Conference paper (Refereed)
    Download full text (pdf)
    fulltext
  • 32.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    How Stuff Works: LSTM Model of Folk Music Transcriptions (2018). Conference paper (Refereed)
    Download full text (pdf)
    fulltext
  • 33.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Proceedings of The 2020 Joint Conference on AI Music Creativity (2020). Conference proceedings (editor) (Refereed)
    Download full text (pdf)
    fulltext
  • 34.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    The Ai Music Generation Challenge 2021: Summary and Results (2022). In: Proceedings of the 3rd Conference on AI Music Creativity, AIMC, 2022. Conference paper (Refereed)
    Abstract [en]

    We discuss the design and results of The Ai Music Generation Challenge 2021 and compare it to the challenge of the previous year. While the 2020 challenge was focused on the Irish double jig, the 2021 challenge was focused on a particular kind of Swedish traditional dance music, called slängpolska. Six systems participated in the 2021 challenge, each generating a number of tunes evaluated by five judges, all professional musicians and experts in the music style. In the first phase, the judges reject all tunes that are plagiarised, or that have incorrect meter or rhythm. In the second phase, they score the remaining tunes along four qualities: dancability, structure coherence, formal coherence, and playability. The judges know all the tunes are computer generated, but do not know what tunes come from what systems, or what kinds of machine learning and data are involved. In the third stage, the judges award prizes to the top tunes. This resulted in five tunes garnering first and second prizes, four of which come from one particular system. We perform a statistical analysis of the scores from all judges, which allows a quantitative comparison of all factors in the challenge. Finally, we look to the 2022 challenge. 

    Download full text (pdf)
    fulltext
  • 35.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    The Ai Music Generation Challenge 2022: Summary and Results (2023). In: Proc. AI Music Creativity Conference, 2023. Conference paper (Refereed)
    Abstract [en]

    We discuss the design and results of The Ai Music Generation Challenge 2022 and compare it to the previous two challenges. While the 2020 challenge focused on generating Irish double jigs, and the 2021 challenge focused on generating Swedish slängpolskor, the 2022 challenge posed three sub-challenges in the context of Irish traditional music: generation of reels, judging tune submissions, and titling tunes. In total seven systems participated in the sub-challenges, along with benchmark systems. One tune was awarded first prize by the judges, and two tunes shared second prize. A submitted system for judging tunes clearly performed better than two benchmarks. Finally, human tune-titling outperformed the benchmark and submitted system, but gave rise to some interesting issues about tune titling.

    Download full text (pdf)
    fulltext
  • 36.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    What do these 5,599,881 parameters mean?: An analysis of a specific LSTM music transcription model, starting with the 70,281 parameters of its softmax layer (2018). In: Proceedings of the 6th International Workshop on Musical Metacreation (MUME 2018), 2018. Conference paper (Refereed)
    Abstract [en]

    A folk-rnn model is a long short-term memory network (LSTM) that generates music transcriptions. We have evaluated these models in a variety of ways – from statistical analyses of generated transcriptions, to their use in music practice – but have yet to understand how their behaviours precipitate from their parameters. This knowledge is essential for improving such models, calibrating them, and broadening their applicability. In this paper, we analyse the parameters of the softmax output layer of a specific model realisation. We discover some key aspects of the model’s local and global behaviours, for instance, that its ability to construct a melody is highly reliant on a few symbols. We also derive a way to adjust the output of the last hidden layer of the model to attenuate its probability of producing specific outputs.
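    The last point of this abstract, adjusting the model so that specific outputs become less probable, can be illustrated very roughly by biasing the pre-softmax activations of the chosen symbols. The paper derives its adjustment on the last hidden layer; the sketch below works directly on logits and uses a made-up vocabulary.

        # Sketch: attenuate the probability of specific output symbols by lowering their logits.
        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        def attenuate(logits, vocab, symbols, amount=5.0):
            z = logits.astype(float).copy()
            for s in symbols:
                z[vocab.index(s)] -= amount   # lower logit -> lower output probability
            return softmax(z)

        vocab = ["|:", ":|", "G", "A", "B", "d"]           # made-up token vocabulary
        logits = np.array([1.0, 0.5, 2.0, 1.5, 0.2, 0.7])  # made-up pre-softmax activations
        print(attenuate(logits, vocab, ["|:"]))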

    Download full text (pdf)
    fulltext
  • 37.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Ben-Tal, Oded
    Kingston University.
    Folk the Algorithms: (Mis)Applying Artificial Intelligence to Folk Music (2021). In: Handbook of Artificial Intelligence for Music, Switzerland: Springer Berlin/Heidelberg, 2021. Chapter in book (Refereed)
    Download full text (pdf)
    fulltext
  • 38.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Ben-Tal, Oded
    Kingston University, UK.
    Let’s Have Another Gan Ainm: An experimental album of Irish traditional music and computer-generated tunes (2018). Report (Other academic)
    Abstract [en]

    This technical report details the creation and public release of an album of folk music, most of which comes from material generated by computer models trained on transcriptions of traditional music of Ireland and the UK. For each computer-generated tune appearing on the album, we provide below the original version and the alterations made.

    Download full text (pdf)
    fulltext
  • 39.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Ben-Tal, Oded
    Kingston University, UK.
    Monaghan, Úna
    Cambridge University, UK.
    Collins, Nick
    Durham University, UK.
    Herremans, Dorien
    University of Technology and Design, Singapore.
    Chew, Elaine
    Queen Mary University of London, UK.
    Hadjeres, Gaëtan
    Sony CSL, Paris.
    Deruty, Emmanuel
    Sony CSL, Paris.
    Pachet, François
    Spotify, Paris.
    Machine Learning Research that Matters for Music Creation: A Case Study (2019). In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, Vol. 48, no 1, p. 36-55. Article in journal (Refereed)
    Abstract [en]

    Research applying machine learning to music modeling and generation typically proposes model architectures, training methods and datasets, and gauges system performance using quantitative measures like sequence likelihoods and/or qualitative listening tests. Rarely does such work explicitly question and analyse its usefulness for and impact on real-world practitioners, and then build on those outcomes to inform the development and application of machine learning. This article attempts to do these things for machine learning applied to music creation. Together with practitioners, we develop and use several applications of machine learning for music creation, and present a public concert of the results. We reflect on the entire experience to arrive at several ways of advancing these and similar applications of machine learning to music creation.

    Download full text (pdf)
    fulltext
  • 40.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Flexer, Arthur
    JKU Linz.
    A Review of Validity and its Relationship to Music Information Research (2023). In: Proc. Int. Symp. Music Information Retrieval, 2023. Conference paper (Refereed)
    Abstract [en]

    Validity is the truth of an inference made from evidence and is a central concern in scientific work. Given the maturity of the domain of music information research (MIR), validity in our opinion should be discussed and considered much more than it has been so far. Puzzling MIR phenomena like adversarial attacks, horses, and performance glass ceilings become less mysterious through the lens of validity. In this paper, we review the subject of validity as presented in a key reference of causal inference: Shadish et al., "Experimental and Quasi-experimental Designs for Generalised Causal Inference". We discuss the four types of validity and threats to each one. We consider them in relationship to MIR experiments grounded with a practical demonstration using a typical MIR experiment. 

    Download full text (pdf)
    fulltext
  • 41.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Iglesias, Maria
    Joint Research Centre, European Commission.
    Ben-Tal, Oded
    Kingston University.
    Miron, Marius
    Joint Research Centre, European Commission.
    Gómez, Emilia
    Joint Research Centre, European Commission.
    Artificial Intelligence and Music: Open Questions of Copyright Law and Engineering Praxis (2019). In: MDPI Arts, ISSN 2076-0752, Vol. 8, no 3, article id 115. Article in journal (Refereed)
    Abstract [en]

    The application of artificial intelligence (AI) to music stretches back many decades, and presents numerous unique opportunities for a variety of uses, such as the recommendation of recorded music from massive commercial archives, or the (semi-)automated creation of music. Due to unparalleled access to music data and effective learning algorithms running on high-powered computational hardware, AI is now producing surprising outcomes in a domain fully entrenched in human creativity—not to mention a revenue source around the globe. These developments call for a close inspection of what is occurring, and consideration of how it is changing and can change our relationship with music for better and for worse. This article looks at AI applied to music from two perspectives: copyright law and engineering praxis. It grounds its discussion in the development and use of a specific application of AI in music creation, which raises further and unanticipated questions. Most of the questions collected in this article are open as their answers are not yet clear at this time, but they are nonetheless important to consider as AI technologies develop and are applied more widely to music, not to mention other domains centred on human creativity.

    Download full text (pdf)
    fulltext
  • 42.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Maruri-Aguilar, Hugo
    Queen Mary University of London.
    The Ai Music Generation Challenge 2020: Double Jigs in the Style of O'Neill's "1001" (2021). In: Journal of Creative Music Systems, E-ISSN 2399-7656, Vol. 5, no 1. Article in journal (Refereed)
    Abstract [en]

    This article describes and analyses the Ai Music Generation Challenge 2020, where seven participants competed to build artificial systems that generate the most plausible double jigs, as judged against the 365 published in The Dance Music of Ireland: O’Neill’s 1001 (1907). The outcomes of this challenge demonstrate how music generation systems can be meaningfully evaluated, and furthermore that the generation of plausible double jigs has yet to be "solved". The article ends by reflecting on the challenge and considers the coming 2021 challenge, focused on a form of Swedish traditional dance music.

    Download full text (pdf)
    fulltext
  • 43.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Uitdenbogerd, A. L.
    RMIT University, AU.
    Koops, H. V.
    RMIT University, AU.
    Huang, A.
    University of Montreal, CA.
    Editorial for TISMIR Special Collection: AI and Musical Creativity (2022). In: Transactions of the International Society for Music Information Retrieval, ISSN 2514-3298, Vol. 5, no 1, p. 67-70. Article in journal (Other academic)
    Abstract [en]

    This special issue focuses on research developments and critical thought in the domain of artificial intelligence (AI) applied to modeling and creating music. It is motivated by the AI Song Contests of 2020 and 2021, in which the four guest editors adjudicated or participated among many teams from around the world. The 2020 edition had 13 submissions and the 2021 edition had 38. The 2022 edition is now being planned. These unique events provide exciting opportunities for AI music researchers to test the state of the art and push the boundaries of what is possible, within the context of music creation. They portend a future when humans and machines work together as partners in music creation. Maybe "portend" is not the right term, but we must not think that the future of AI and music is only warm and fuzzy. It is important and timely to consider how we, in local and global contexts, can effectively and ethically develop and apply AI in contexts of music creation.

  • 44.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    58,105 Irish-style double jigs (2021). Artistic output (Unrefereed)
    Abstract [en]

    These 58,105 tunes come from a collection of 100,001 generated by the folk-rnn (v2) model trained on tens of thousands of transcriptions of Irish traditional dance music. This subset consists of tunes approved by an artificial critic according to how well they exemplify characteristics of the 365 double jigs in O’Neill’s “1001” (1907) – in reference to a few rough measures. Each tune is numbered in the order of its generation by folk-rnn (v2). Two values accompanying each tune denote its “distance” from the double jigs in O’Neill’s “1001”, and the greatest number of quavers of 24 matching the double jigs in O’Neill’s “1001”. More information about this critic and these tunes can be found in: B.L.T. Sturm, “An Artificial Critic of Irish Double Jigs”, in Proc. AI Music Creativity Conference, Graz, 2021.

    Download full text (pdf)
    fulltext
  • 45.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    The folk-rnn (v1) Session Book: Volumes 1-20 (2017). Artistic output (Unrefereed)
    Download full text (pdf): Vol 1 to Vol 20 (one PDF per volume)