Publications (10 of 12)
Fernandez-Martín, C., Colomer, A., Panariello, C. & Naranjo, V. (2024). Choosing only the best voice imitators: Top-K many-to-many voice conversion with StarGAN. Speech Communication, 156, Article ID 103022.
2024 (English). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 156, article id 103022. Article in journal (Refereed), Published
Abstract [en]

Voice conversion systems have become increasingly important as the use of voice technology grows. Deep learning techniques, specifically generative adversarial networks (GANs), have enabled significant progress in the creation of synthetic media, including the field of speech synthesis. One of the most recent examples, StarGAN-VC, uses a single pair of generator and discriminator to convert voices between multiple speakers. However, the training stability of GANs can be an issue. The Top-K methodology, which trains the generator using only the best K generated samples that “fool” the discriminator, has been applied to image tasks and simple GAN architectures. In this work, we demonstrate that the Top-K methodology can improve the quality and stability of converted voices in a state-of-the-art voice conversion system like StarGAN-VC. We also explore the optimal time to implement the Top-K methodology and how to reduce the value of K during training. Through both quantitative and qualitative studies, it was found that the Top-K methodology leads to quicker convergence and better conversion quality compared to regular or vanilla training. In addition, human listeners perceived the samples generated using Top-K as more natural and were more likely to believe that they were produced by a human speaker. The results of this study demonstrate that the Top-K methodology can effectively improve the performance of deep learning-based voice conversion systems.
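The core Top-K idea described above, updating the generator only on the K generated samples that the discriminator scores as most realistic, can be sketched as follows. This is a minimal NumPy illustration, not the authors' StarGAN-VC code; the function names and the annealing schedule are hypothetical:

```python
# Illustrative sketch of Top-K generator training (hypothetical, not the
# authors' implementation): the generator loss is computed only on the K
# generated samples that best "fool" the discriminator.

import numpy as np

def topk_generator_loss(disc_scores, k):
    """Keep only the K samples the discriminator rates as most real.

    disc_scores: discriminator outputs in (0, 1) for a batch of generated
    samples; higher means "more real" to the discriminator.
    Returns the non-saturating generator loss averaged over the top-K
    samples, plus the indices that were kept.
    """
    scores = np.asarray(disc_scores, dtype=float)
    top_idx = np.argsort(scores)[-k:]         # indices of the K best "imitators"
    kept = scores[top_idx]
    loss = -np.mean(np.log(kept + 1e-12))     # non-saturating GAN loss on top-K only
    return loss, top_idx

def anneal_k(k, k_min, gamma=0.99):
    """Decay K toward a floor during training (one possible schedule)."""
    return max(k_min, int(k * gamma))
```

In a training loop, K would typically start at the full batch size and be decayed toward a floor; the paper also studies *when* to switch Top-K on, which this sketch leaves to the surrounding loop.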

Place, publisher, year, edition, pages
Elsevier BV, 2024
Keywords
Generative adversarial networks, Non-parallel, Speech synthesis, Top-k, Voice conversion
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-341442 (URN); 10.1016/j.specom.2023.103022 (DOI); 001133269900001 (); 2-s2.0-85178611630 (Scopus ID)
Note

QC 20240110

Available from: 2024-01-10. Created: 2024-01-10. Last updated: 2024-01-16. Bibliographically approved
Panariello, C. (2023). Converging Creativity: Intertwining Music and Code. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
2023 (English). Doctoral thesis, comprehensive summary (Other academic) [Artistic work]
Abstract [en]

This compilation thesis is a collection of case studies that presents examples of creative coding in various contexts, focusing on how such practice led to the creation and exploration of musical expressions, and how I interact with the design of the code itself. My own experience as a music composer influences this thesis work. By this I mean that although the thesis places itself in the Sound and Music Computing academic tradition, it is also profoundly founded upon a personal artistic perspective. This perspective has been the overarching view that has informed the studies included in the thesis, despite all being quite different. The first part of the thesis describes the practice of creative coding, creativity models, and the interaction between code and coder. Then I propose a perspective on creative coding based on the idea of asymptotic convergence of creativity. This is followed by a presentation of five papers and three music works, all inspected through my stance on this creative practice. Finally, I examine and discuss these works in detail, concluding by suggesting that the asymptotic convergence of creativity framework might serve as a useful tool that adds to the literature on creative coding practice, especially for situations in which such work is carried out in an academic research setting.

Abstract [sv]

I denna sammanläggningsavhandling presenteras ett antal fallstudier med fokus på kreativ programmering (engelska: creative coding) i en rad olika sammanhang. Fokus ligger på hur kreativ programmering stimulerat musikskapande och utforskning av olika musikaliska uttryck, samt hur jag själv interagerat med kod i sådana kontexter. Detta avhandlingsarbete är till stor del influerat av mina personliga erfarenheter och bakgrund som kompositör. Även om avhandlingen befinner sig i en akademisk kontext, närmare bestämt inom den vetenskapliga traditionen för ljud- och musikbehandling, så är avhandlingsarbetet också djupt rotat i ett konstnärligt perspektiv. Detta perspektiv har influerat och präglat de studier som beskrivs i denna avhandling, trots att studierna sinsemellan är av ganska skild karaktär. Den första delen av denna avhandling beskriver kreativ programmering och dess praktik, olika kreativitetsmodeller samt samspelet mellan utvecklare och kod. Sedan föreslår jag ett perspektiv på kreativ kodning som bygger på idén om kreativitetens asymptotiska konvergens. Detta efterföljs av en genomgång av fem artiklar och tre musikverk, vilka analyseras med hjälp av min ansats till denna praktik. Slutligen granskar och diskuterar jag dessa verk i detalj och avslutar med att föreslå att ramverket för kreativitetens asymptotiska konvergens kan fungera som ett användbart verktyg som bidrar till litteraturen om kreativ kodningspraxis, särskilt för situationer där sådant arbete utförs i en akademisk forskningsmiljö.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. p. 71
Series
TRITA-EECS-AVL; 2023:40
Keywords
Sound and Music Computing, Music Composition
National Category
Computer and Information Sciences; Human Computer Interaction; Music; Arts
Research subject
Media Technology
Identifiers
urn:nbn:se:kth:diva-327072 (URN); 978-91-8040-585-0 (ISBN)
Public defence
2023-06-09, https://kth-se.zoom.us/j/62182993824, Kollegiesalen, Brinellvägen 6, Stockholm, 09:00 (English)
Note

QC 20230522

Available from: 2023-05-22. Created: 2023-05-17. Last updated: 2025-02-21. Bibliographically approved
Latupeirissa, A. B., Panariello, C. & Bresin, R. (2023). Probing Aesthetics Strategies for Robot Sound: Complexity and Materiality in Movement Sonification. ACM Transactions on Human-Robot Interaction
2023 (English). In: ACM Transactions on Human-Robot Interaction, E-ISSN 2573-9522. Article in journal (Refereed), Published
Abstract [en]

This paper presents three studies where we probe aesthetics strategies of sound produced by movement sonification of a Pepper robot by mapping its movements to sound models.

We developed two sets of sound models. The first set consisted of two sound models, one sawtooth-based and another based on feedback chains, for investigating how the perception of synthesized robot sounds would depend on their design complexity. We implemented the second set of sound models for probing the “materiality” of sound made by a robot in motion. This set consisted of an engine-like sound synthesis highlighting the robot’s internal mechanisms, a metallic sound synthesis highlighting the robot’s typical appearance, and a whoosh sound synthesis highlighting the movement.

We conducted three studies. The first study explores how the first set of sound models can influence the perception of expressive gestures of a Pepper robot through an online survey. In the second study, we carried out an experiment in a museum installation with a Pepper robot presented in two scenarios: (1) while welcoming patrons into a restaurant and (2) while providing information to visitors in a shopping center. Finally, in the third study, we conducted an online survey with stimuli similar to those used in the second study.

Our findings suggest that participants preferred more complex sound models for the sonification of robot movements. Concerning materiality, participants preferred subtle sounds that blend well with the ambient sound (i.e., are less distracting) and soundscapes in which sound sources can be identified. Sound preferences also varied depending on the context in which participants experienced the robot-generated sounds (e.g., as a live museum installation vs. an online display).

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
SONAO
National Category
Human Computer Interaction; Robotics and automation
Identifiers
urn:nbn:se:kth:diva-324962 (URN); 10.1145/3585277 (DOI); 001153514400008 (); 2-s2.0-85170233153 (Scopus ID)
Note

QC 20230328

Available from: 2023-03-21. Created: 2023-03-21. Last updated: 2025-02-05. Bibliographically approved
Panariello, C. & Frid, E. (2023). SuperOM: a SuperCollider class to generate music scores in OpenMusic. In: Anthony Paul De Ritis, Victor Zappi, Jeremy Van Buskirk and John Mallia (Ed.), Proceedings of the 8th International Conference on Technologies for Music Notation and Representation (TENOR). Paper presented at TENOR - International Conference on Technologies for Music Notation and Representation, Boston, MA, USA, May 15-17, 2023 (pp. 68-75). Boston, MA, USA: Northeastern University Library
2023 (English). In: Proceedings of the 8th International Conference on Technologies for Music Notation and Representation (TENOR) / [ed] Anthony Paul De Ritis, Victor Zappi, Jeremy Van Buskirk and John Mallia, Boston, MA, USA: Northeastern University Library, 2023, p. 68-75. Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces SuperOM, a class built for the software SuperCollider in order to create a bridge to OpenMusic and thus facilitate the creation of musical scores from SuperCollider patches. SuperOM is primarily intended as a tool for SuperCollider users who make use of assisted composition techniques and want the output of such processes to be captured through automatic notation transcription. This paper first presents an overview of existing transcription tools for SuperCollider, followed by a detailed description of SuperOM and its implementation, as well as examples of how it can be used in practice. Finally, a case study in which the transcription tool was used as an assistive composition tool to generate the score of a sonification – which was later turned into a piano piece – is discussed.

Place, publisher, year, edition, pages
Boston, MA, USA: Northeastern University Library, 2023
Keywords
SuperCollider, OpenMusic, class, computer assisted composition, automatic music notation
National Category
Music
Identifiers
urn:nbn:se:kth:diva-327068 (URN); 10.17760/D20511476 (DOI)
Conference
TENOR - International Conference on Technologies for Music Notation and Representation, Boston, MA, USA, May 15-17, 2023
Note

QC 20230630

Available from: 2023-05-17. Created: 2023-05-17. Last updated: 2025-02-21. Bibliographically approved
Panariello, C. & Percivati, C. (2023). “WYPYM”: A Study for Feedback-Augmented Bass Clarinet. Paper presented at NIME 2023 - New Interfaces for Musical Expression, 31 May — 3 June, 2023, Mexico City, Mexico.
2023 (English). Conference paper, Poster (with or without abstract) (Refereed)
Keywords
Augmented instruments, Feedback systems, Co-creativity, Music Human-Computer Interaction, Performance
National Category
Music; Human Computer Interaction; Arts
Identifiers
urn:nbn:se:kth:diva-327070 (URN)
Conference
NIME 2023 - New Interfaces for Musical Expression, 31 May — 3 June, 2023, Mexico City, Mexico
Note

QC 20230707

Available from: 2023-05-17. Created: 2023-05-17. Last updated: 2025-02-21. Bibliographically approved
Frid, E., Panariello, C. & Núñez-Pacheco, C. (2022). Customizing and Evaluating Accessible Multisensory Music Experiences with Pre-Verbal Children: A Case Study on the Perception of Musical Haptics Using Participatory Design with Proxies. Multimodal Technologies and Interaction, 6(7), Article ID 55.
2022 (English). In: Multimodal Technologies and Interaction, ISSN 2414-4088, Vol. 6, no 7, article id 55. Article in journal (Refereed), Published
Abstract [en]

Research on Accessible Digital Musical Instruments (ADMIs) has highlighted the need for participatory design methods, i.e., to actively include users as co-designers and informants in the design process. However, very little work has explored how pre-verbal children with Profound and Multiple Learning Disabilities (PMLD) can be involved in such processes. In this paper, we apply in-depth qualitative and mixed methodologies in a case study with four students with PMLD. Using Participatory Design with Proxies (PDwP), we assess how these students can be involved in the customization and evaluation of the design of a multisensory music experience intended for a large-scale ADMI. Results from an experiment focused on communication of musical haptics highlighted the diversity of interaction strategies employed by the children, accessibility limitations of the current multisensory experience design, and the importance of using a multifaceted variety of qualitative and quantitative methods to arrive at more informed conclusions when applying a design-with-proxies methodology.

Place, publisher, year, edition, pages
MDPI AG, 2022
Keywords
accessible digital musical instruments, multimodal feedback, haptics, multisensory rooms, participatory design, disability studies
National Category
Music; Other Engineering and Technologies; Computer and Information Sciences
Research subject
Media Technology; Human-computer Interaction; Art, Technology and Design
Identifiers
urn:nbn:se:kth:diva-316293 (URN); 10.3390/mti6070055 (DOI); 000832049500001 (); 2-s2.0-85136133459 (Scopus ID)
Projects
Ljudskogen
Funder
Swedish Research Council, 2020-00343
Note

QC 20220812

Available from: 2022-08-12. Created: 2022-08-12. Last updated: 2025-02-21. Bibliographically approved
Frid, E. & Panariello, C. (2022). Haptic Music Players for Children with Profound and Multiple Learning Disabilities (PMLD): Exploring Different Modes of Interaction for Felt Sound. In: Jeremy Marozeau, Sebastian Merchel (Ed.), Proceedings of the 24th International Congress on Acoustics (ICA2022): A10-05 Physiological Acoustics - Multi-modal solutions to enhance hearing. Paper presented at International Congress on Acoustics, October 24 to 28, 2022 in Gyeongju, Korea. Gyeongju, South Korea: Acoustic Society of Korea, Article ID ABS-0021.
2022 (English). In: Proceedings of the 24th International Congress on Acoustics (ICA2022): A10-05 Physiological Acoustics - Multi-modal solutions to enhance hearing / [ed] Jeremy Marozeau, Sebastian Merchel, Gyeongju, South Korea: Acoustic Society of Korea, 2022, article id ABS-0021. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a six-month exploratory case study on the evaluation of three Haptic Music Players (HMPs) with four pre-verbal children with Profound and Multiple Learning Disabilities (PMLD). The evaluated HMPs were 1) a commercially available haptic pillow, 2) a haptic device embedded in a modified plush-toy backpack, and 3) a custom-built plush toy with a built-in speaker and tactile shaker. We evaluated the HMPs through qualitative interviews with a teacher who served as a proxy for the pre-verbal children participating in the study; the teacher augmented the students’ communication by reporting observations from each test session. The interviews explored functionality, accessibility, and user experience aspects of each HMP and revealed significant differences between devices. Our findings highlighted the influence of physical affordances provided by the HMP designs and the importance of a playful design in this context. Results suggested that sufficient time should be allocated to HMP familiarization prior to any evaluation procedure, since experiencing musical haptics through objects is a novel experience that might require some time to get used to. We discuss design considerations for Haptic Music Players and provide suggestions for future developments of multimodal systems dedicated to enhancing music listening in special education settings.

Place, publisher, year, edition, pages
Gyeongju, South Korea: Acoustic Society of Korea, 2022
Keywords
tactile sound, haptics, accessibility
National Category
Music; Other Engineering and Technologies; Computer and Information Sciences
Research subject
Media Technology; Human-computer Interaction; Art, Technology and Design
Identifiers
urn:nbn:se:kth:diva-331144 (URN); 2-s2.0-85162296434 (Scopus ID)
Conference
International Congress on Acoustics, October 24 to 28, 2022 in Gyeongju, Korea
Funder
Swedish Research Council, 2020-00343
Note

QC 20230706

Available from: 2023-07-05. Created: 2023-07-05. Last updated: 2025-02-21. Bibliographically approved
Panariello, C. & Bresin, R. (2022). Sonification of Computer Processes: The Cases of Computer Shutdown and Idle Mode. Frontiers in Neuroscience, 16, Article ID 862663.
2022 (English). In: Frontiers in Neuroscience, ISSN 1662-4548, E-ISSN 1662-453X, Vol. 16, article id 862663. Article in journal (Refereed), Published
Abstract [en]

Software is intangible, invisible, and at the same time pervasive in everyday devices, activities, and services accompanying our life. Therefore, citizens hardly realize its complexity, power, and impact in many aspects of their daily life. In this study, we report on one experiment that aims at letting citizens make sense of software presence and activity in their everyday lives, through sound: the invisible complexity of the processes involved in the shutdown of a personal computer. We used sonification to map information embedded in software events into the sound domain. The software events involved in a shutdown have names related to the physical world and its actions: write events (information is saved into digital memories), kill events (running processes are terminated), and exit events (running programs are exited). The research study presented in this article has a "double character": it is an artistic realization that develops specific aesthetic choices, and it also has pedagogical purposes, informing the casual listener about the complexity of software behavior. Two different sound design strategies have been applied: one strategy influenced by the sonic characteristics of the Glitch music scene, which makes deliberate use of glitch-based sound materials, distortions, aliasing, quantization noise, and all the "failures" of digital technologies; and a second strategy based on sound samples of a subcontrabass Paetzold recorder, an unusual acoustic instrument whose unique sound has been investigated in the contemporary art music scene. Analysis of quantitative ratings and qualitative comments from 37 participants revealed that the sound design strategies succeeded in communicating the nature of the computer processes. Participants also showed in general an appreciation of the aesthetics of the peculiar sound models used in this study.

Place, publisher, year, edition, pages
Frontiers Media SA, 2022
Keywords
sonification, software processes, aesthetic, glitch, Paetzold recorder
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-313328 (URN); 10.3389/fnins.2022.862663 (DOI); 000797869400001 (); 35600615 (PubMedID); 2-s2.0-85134158938 (Scopus ID)
Projects
FutureSound; SONAO
Funder
Swedish Research Council, 2017-03979; NordForsk, 86892
Note

QC 20230404

Available from: 2022-06-02. Created: 2022-06-02. Last updated: 2025-02-18. Bibliographically approved
Bresin, R., Frid, E., Latupeirissa, A. B. & Panariello, C. (2021). Robust Non-Verbal Expression in Humanoid Robots: New Methods for Augmenting Expressive Movements with Sound. Paper presented at Workshop on Sound in Human-Robot Interaction at HRI 2021.
2021 (English). Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

The aim of the SONAO project is to establish new methods based on sonification of expressive movements for achieving a robust interaction between users and humanoid robots. We want to achieve this by combining competences of the research team members in the fields of social robotics, sound and music computing, affective computing, and body motion analysis. We want to engineer sound models for implementing effective mappings between stylized body movements and sound parameters that will enable an agent to express high-level body motion qualities through sound. These mappings are paramount for supporting feedback to and understanding of robot body motion. The project will result in the development of new theories, guidelines, models, and tools for the sonic representation of high-level body motion qualities in interactive applications. This work is part of the growing research field known as data sonification, in which we combine methods and knowledge from the fields of interactive sonification, embodied cognition, multisensory perception, and non-verbal and gestural communication in robots.

National Category
Human Computer Interaction; Computer and Information Sciences
Research subject
Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-293349 (URN)
Conference
Workshop on Sound in Human-Robot Interaction at HRI 2021
Projects
SONAO
Note

QC 20211116

Available from: 2021-04-22. Created: 2021-04-22. Last updated: 2025-02-18. Bibliographically approved
Latupeirissa, A. B., Panariello, C. & Bresin, R. (2020). Exploring emotion perception in sonic HRI. In: 17th Sound and Music Computing Conference. Paper presented at Sound and Music Computing Conference, Torino, 24-26 June 2020 (pp. 434-441). Torino: Zenodo
2020 (English). In: 17th Sound and Music Computing Conference, Torino: Zenodo, 2020, p. 434-441. Conference paper, Published paper (Refereed)
Abstract [en]

Despite the fact that sounds produced by robots can affect the interaction with humans, sound design is often an overlooked aspect in Human-Robot Interaction (HRI). This paper explores how different sets of sounds designed for expressive robot gestures of a humanoid Pepper robot can influence the perception of emotional intentions. In the pilot study presented in this paper, participants were asked to rate different stimuli in terms of perceived affective states. The stimuli were audio-only, audio-video, and video-only, and contained either Pepper’s original servomotor noises, sawtooth sounds, or more complex designed sounds. The preliminary results show a preference for the use of more complex sounds, thus confirming the necessity of further exploration in sonic HRI.

Place, publisher, year, edition, pages
Torino: Zenodo, 2020
National Category
Computer and Information Sciences; Human Computer Interaction; Computer graphics and computer vision; Other Computer and Information Science
Research subject
Media Technology; Art, Technology and Design; Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-277947 (URN); 10.5281/ZENODO.3898928 (DOI); 2-s2.0-85101259342 (Scopus ID)
Conference
Sound and Music Computing Conference, Torino, 24-26 June 2020
Projects
SONAO
Funder
Swedish Research Council, 2017-03979
Note

QC 20200722

Available from: 2020-07-02. Created: 2020-07-02. Last updated: 2025-02-18. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-1244-881x
