251 - 300 of 1064
  • 251. Crosnier, S
    et al.
    Blomberg, Mats
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Elenius, K
    Speech recogniser sensitivity to the variation of different control parameters in synthetic speech (1989). Conference paper (Refereed)
    Abstract [en]

    Knowledge of a speech recognizer's sensitivity to different speech production parameters can be used to improve the system or to predict its behaviour in a given application. In this report, a speech recognition system has been tested using manipulated synthetic speech. A text-to-speech system was used for producing words with the 9 Swedish long vowels in CVC context. A "normal" production of each word served as reference template for the recognition system. The test set consisted of the same words where the value of one control parameter at a time was changed from its original position. The mel cepstrum distance between the reference and the manipulated word was measured. Modifying the pitch, voice source spectral slope and the first four formant frequencies had a large influence on the distance, while varying formant bandwidths resulted in small effects. The relation between individual formants differs from results of experiments using human listeners. The results indicate that the sensitivity to pitch and voice source spectrum variation will degrade the recognizer's performance in speaker-independent applications and during stress, and that some form of normalisation is needed.
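
    A minimal sketch of the distance measure described above, assuming two equal-length, time-aligned recordings of the same word and using MFCCs as a stand-in for the mel cepstrum representation (file names, sampling rate and coefficient count are illustrative, not taken from the study):

        # Frame-wise mel-cepstral distance between a reference word and a
        # manipulated version of the same word (hypothetical file names).
        import numpy as np
        import librosa

        def mel_cepstrum_distance(ref_path, test_path, sr=16000, n_mfcc=13):
            ref, _ = librosa.load(ref_path, sr=sr)
            test, _ = librosa.load(test_path, sr=sr)
            n = min(len(ref), len(test))   # crude alignment: truncate to common length
            ref_c = librosa.feature.mfcc(y=ref[:n], sr=sr, n_mfcc=n_mfcc)
            test_c = librosa.feature.mfcc(y=test[:n], sr=sr, n_mfcc=n_mfcc)
            # Euclidean distance per frame, averaged over the whole word
            return float(np.mean(np.linalg.norm(ref_c - test_c, axis=0)))

        # e.g. d = mel_cepstrum_distance("vid_normal.wav", "vid_pitch_shifted.wav")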

  • 252. Csapo, A.
    et al.
    Gilmartin, E.
    Grizou, J.
    Han, J.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Anastasiou, D.
    Jokinen, K.
    Wilcock, G.
    Multimodal conversational interaction with a humanoid robot (2012). In: 3rd IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2012 - Proceedings, IEEE, 2012, p. 667-672. Conference paper (Refereed)
    Abstract [en]

    The paper presents a multimodal conversational interaction system for the Nao humanoid robot. The system was developed at the 8th International Summer Workshop on Multimodal Interfaces, Metz, 2012. We implemented WikiTalk, an existing spoken dialogue system for open-domain conversations, on Nao. This greatly extended the robot's interaction capabilities by enabling Nao to talk about an unlimited range of topics. In addition to speech interaction, we developed a wide range of multimodal interactive behaviours by the robot, including face-tracking, nodding, communicative gesturing, proximity detection and tactile interrupts. We made video recordings of user interactions and used questionnaires to evaluate the system. We further extended the robot's capabilities by linking Nao with Kinect.

  • 253. Csapo, A.
    et al.
    Gilmartin, E.
    Grizou, J.
    Han, J.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Anastasiou, D.
    Jokinen, K.
    Wilcock, G.
    Open-Domain Conversation with a NAO Robot (2012). In: 3rd International Conference on Cognitive Infocommunications (CogInfoCom 2012), Kosice, 2012. Conference paper (Refereed)
    Abstract [en]

    In this demo, we present a multimodal conversation system, implemented using a Nao robot and Wikipedia. The system was developed at the 8th International Workshop on Multimodal Interfaces in Metz, France, 2012. The system is based on an interactive, open-domain spoken dialogue system called WikiTalk, which guides the user through conversations based on the link structure of Wikipedia. In addition to speech interaction, the robot interacts with users by tracking their faces and nodding/gesturing at key points of interest within the Wikipedia text. The proximity detection capabilities of the Nao, as well as its tactile sensors, were used to implement context-based interrupts in the dialogue system.

  • 254. Cuayahuitl, Heriberto
    et al.
    Komatani, Kazunori
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Introduction for Speech and language for interactive robots (2015). In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 34, no 1, p. 83-86. Article in journal (Refereed)
    Abstract [en]

    This special issue includes research articles which apply spoken language processing to robots that interact with human users through speech, possibly combined with other modalities. Robots that can listen to human speech, understand it, interact according to the conveyed meaning, and respond represent major research and technological challenges. Their common aim is to equip robots with natural interaction abilities. However, robotics and spoken language processing are areas that are typically studied within their respective communities with limited communication across disciplinary boundaries. The articles in this special issue represent examples that address the need for an increased multidisciplinary exchange of ideas.

  • 255.
    Dabbaghchian, Saeed
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Arnela, Marc
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Simplification of vocal tract shapes with different levels of detail (2015). In: Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow, UK: University of Glasgow, 2015, p. 1-5. Conference paper (Refereed)
    Abstract [en]

    We propose a semi-automatic method to regenerate simplified vocal tract geometries from very detailed input (e.g. MRI-based geometry) with the possibility to control the level of detail, while maintaining the overall properties. The simplification procedure controls the number and organization of the vertices in the vocal tract surface mesh and can be set to replace complex cross-sections with regular shapes. Six different geometry regenerations are suggested: bent or straight vocal tract centreline, combined with three different types of cross-sections, namely realistic, elliptical or circular. The key feature in the simplification is that the cross-sectional areas and the length of the vocal tract are maintained. This method may, for example, be used to facilitate 3D finite element method simulations of vowels and diphthongs and to examine the basic acoustic characteristics of the vocal tract in printed physical replicas. Furthermore, it allows for multimodal solutions of the wave equation.
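
    The key constraint, keeping the cross-sectional area while simplifying the shape, can be illustrated with a small sketch (the contour input and the ellipse aspect ratio are assumptions, not the paper's actual procedure):

        # Replace a detailed cross-section contour by an area-equivalent
        # circle or ellipse (shoelace formula for the contour area).
        import numpy as np

        def polygon_area(x, y):
            return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

        def equivalent_circle_radius(x, y):
            return np.sqrt(polygon_area(x, y) / np.pi)

        def equivalent_ellipse_axes(x, y, aspect_ratio=2.0):
            # semi-axes (a, b) with a/b = aspect_ratio and the same area as the contour
            area = polygon_area(x, y)
            b = np.sqrt(area / (np.pi * aspect_ratio))
            return aspect_ratio * b, b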

  • 256.
    Dabbaghchian, Saeed
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Arnela, Marc
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Guasch, Oriol
    Synthesis of VV utterances from muscle activation to sound with a 3D model (2017). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, The International Speech Communication Association (ISCA), 2017, p. 3497-3501. Conference paper (Refereed)
    Abstract [en]

    We propose a method to automatically generate deformable 3D vocal tract geometries from the surrounding structures in a biomechanical model. This allows us to couple 3D biomechanics and acoustics simulations. The basis of the simulations is muscle activation trajectories in the biomechanical model, which move the articulators to the desired articulatory positions. The muscle activation trajectories for a vowel-vowel utterance are here defined through interpolation between the determined activations of the start and end vowel. The resulting articulatory trajectories of flesh points on the tongue surface and jaw are similar to corresponding trajectories measured using Electromagnetic Articulography, hence corroborating the validity of interpolating muscle activation. At each time step in the articulatory transition, a 3D vocal tract tube is created through a cavity extraction method based on first slicing the geometry of the articulators with a semi-polar grid to extract the vocal tract contour in each plane and then reconstructing the vocal tract through a smoothed 3D mesh-generation using the extracted contours. A finite element method applied to these changing 3D geometries simulates the acoustic wave propagation. We present the resulting acoustic pressure changes on the vocal tract boundary and the formant transitions for the utterance [Ai].

  • 257.
    Dabbaghchian, Saeed
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Arnela, Marc
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Guasch, Oriol
    Stavness, Ian
    Badin, Pierre
    Using a Biomechanical Model and Articulatory Data for the Numerical Production of Vowels (2016). In: Interspeech 2016, 2016, p. 3569-3573. Conference paper (Refereed)
    Abstract [en]

    We introduce a framework to study speech production using a biomechanical model of the human vocal tract, ArtiSynth. Electromagnetic articulography data was used as input to an inverse tracking simulation that estimates muscle activations to generate 3D jaw and tongue postures corresponding to the target articulator positions. For acoustic simulations, the vocal tract geometry is needed, but since the vocal tract is a cavity rather than a physical object, its geometry does not explicitly exist in a biomechanical model. A fully-automatic method to extract the 3D geometry (surface mesh) of the vocal tract by blending geometries of the relevant articulators has therefore been developed. This automatic extraction procedure is essential, since a method with manual intervention is not feasible for large numbers of simulations or for generation of dynamic sounds, such as diphthongs. We then simulated the vocal tract acoustics by using the Finite Element Method (FEM). This requires a high quality vocal tract mesh without irregular geometry or self-intersections. We demonstrate that the framework is applicable to acoustic FEM simulations of a wide range of vocal tract deformations. In particular we present results for cardinal vowel production, with muscle activations, vocal tract geometry, and acoustic simulations.

  • 258.
    Dahl, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Movements and analysis of drumming (2012). In: Music, Motor Control and the Brain, Oxford University Press, 2012. Chapter in book (Refereed)
    Abstract [en]

    This chapter analyses the movement strategies used in drumming. These movement strategies can be described as whiplash-like and aim at achieving high stick velocities on impact. Skilled playing of percussion instruments involves adjusting to and utilising the kinesthetic feedback from the instrument in question. The overall patterns of the movement strategies are maintained consistently for different tempi, surfaces and dynamic levels. The height to which the stick is lifted in preparation for a stroke and the vertical velocity of the stick marker at impact are both strongly linked to the dynamic level.

  • 259.
    Dahl, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    On the beat: human movement and timing in the production and perception of music (2005). Doctoral thesis, comprehensive summary (Other scientific)
    Abstract [en]

    This thesis addresses three aspects of movement, performance and perception in music performance. First, the playing of an accent, a simple but much used and practiced element in drumming, is studied; second, the perception of gradually changing tempo; and third, the perception and communication of specific emotional intentions through movements during music performance.

    Papers I and II investigated the execution and interpretation of an accent in drumming, performed under different playing conditions. Players' movements, striking velocities and timing patterns were studied for different tempi, dynamic levels and striking surfaces. It was found that the players used differing movement strategies and interpreted the accent differently, which was reflected in their movement trajectories. Strokes at higher dynamic levels were played from a greater average height and with higher striking velocities. All players initiated the accented strokes from a greater height, and delivered the accent with increased striking velocity compared to the unaccented strokes. The interval beginning with the accented stroke was also prolonged, generally by delaying the following stroke. Recurrent cyclic patterns were found in the players' timing performances. In a listening test, listeners perceived grouping of the strokes according to the cyclic patterns.

    Paper III concerned the perception of gradual tempo changes in auditory sequences. Using an adaptive test procedure, subjects judged stimuli consisting of click sequences with either increasing or decreasing tempo. Each experiment included three test sessions at different nominal tempi (80, 120, and 180 beats per minute). Ten of the eleven subjects showed an inherent bias in their perception of tempo drift. The direction and magnitude of the bias was consistent between test sessions but varied between individuals. The just noticeable differences for tempo drift agreed well with the estimated tempo drifts in production data, but were much smaller than earlier reported thresholds for tempo drift.

    Paper IV studied how emotional intent in music performances is conveyed to observers through the movements of the musicians. Three players of marimba, bassoon, and saxophone, respectively, were filmed when playing with the expressive intentions Happiness, Sadness, Anger and Fear. Observers rated the emotional content and movement cues in the video clips shown without sound. The results showed that the observers were able to identify the intentions Sadness, Anger, and Happiness, but not Fear. The rated movement cues showed that Angry performances were characterized by jerky movements, Happy performances by large and somewhat fast and jerky movements, and Sad performances by slow and smooth movements.

  • 260.
    Dahl, Sofia
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bevilacqua, Frédéric
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Clayton, Martin
    Leante, Laura
    Poggi, Isabella
    Rasamimanana, Nicolas
    Gestures in performance (2009). In: Musical Gestures: Sound, Movement, and Meaning / [ed] Godøy, Rolf Inge; Leman, Marc, New York: Routledge, 2009, p. 36-68. Chapter in book (Refereed)
    Abstract [en]

    We experience and understand the world, including music, through body movement–when we hear something, we are able to make sense of it by relating it to our body movements, or form an image in our minds of body movements. Musical Gestures is a collection of essays that explore the relationship between sound and movement. It takes an interdisciplinary approach to the fundamental issues of this subject, drawing on ideas, theories and methods from disciplines such as musicology, music perception, human movement science, cognitive psychology, and computer science.

  • 261.
    Dahl, Sofia
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Friberg, Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Expressiveness of musician's body movements in performances on marimba (2004). In: Gesture-Based Communication in Human-Computer Interaction / [ed] Camurri, A.; Volpe, G., Genoa: Springer Verlag, 2004, p. 479-486. Conference paper (Refereed)
    Abstract [en]

    To explore to what extent emotional intentions can be conveyed through musicians' movements, video recordings were made of a marimba player performing the same piece with the intentions Happy, Sad, Angry and Fearful. 20 subjects were presented video clips, without sound, and asked to rate both the perceived emotional content and the movement qualities. The video clips were presented in different conditions, showing the player to different extent. The observers' ratings for the intended emotions confirmed that the intentions Happiness, Sadness and Anger were well communicated, while Fear was not. Identification of the intended emotion was only slightly influenced by the viewing condition. The movement ratings indicated that there were cues that the observers used to distinguish between intentions, similar to cues found for audio signals in music performance.

  • 262.
    Dahl, Sofia
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Friberg, Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Visual perception of expressiveness in musicians' body movements (2007). In: Music perception, ISSN 0730-7829, E-ISSN 1533-8312, Vol. 24, no 5, p. 433-454. Article in journal (Refereed)
    Abstract [en]

    Musicians often make gestures and move their bodies expressing a musical intention. In order to explore to what extent emotional intentions can be conveyed through musicians' movements, participants watched and rated silent video clips of musicians performing the emotional intentions Happy, Sad, Angry, and Fearful. In the first experiment participants rated emotional expression and movement character of marimba performances. The results showed that the intentions Happiness, Sadness, and Anger were well communicated, whereas Fear was not. Showing selected parts of the player only slightly influenced the identification of the intended emotion. In the second experiment participants rated the same emotional intentions and movement character for performances on bassoon and soprano saxophone. The ratings from the second experiment confirmed that Fear was not communicated whereas Happiness, Sadness, and Anger were recognized. The rated movement cues were similar in the two experiments and were analogous to their audio counterpart in music performance.

  • 263.
    Dahl, Sofia
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Granqvist, Svante
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Ability to determine continuous drift in auditory sequences: Evidence for bias in listeners' perception of tempo (2005). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524. Article in journal (Other academic)
  • 264.
    De Witt, Anna
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Sound design for affective interaction (2007). In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) / [ed] Paiva, A; Prada, R; Picard, RW, 2007, Vol. 4738, p. 523-533. Conference paper (Refereed)
    Abstract [en]

    Different design approaches contributed to what we see today as the prevalent design paradigm for Human Computer Interaction, though they have been mostly applied to the visual aspect of interaction. In this paper we present a proposal for sound design strategies that can be used in applications involving affective interaction. For testing our approach we propose the sonification of the Affective Diary, a digital diary with focus on emotions, affects, and bodily experience of the user. We applied results from studies in music and emotion to sonic interaction design. This is one of the first attempts at introducing different physics-based models for the complete real-time sonification of an interactive user interface in portable devices.

  • 265.
    Degirmenci, Niyazi Cem
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Arnela, Marc
    Sánchez-Martín, Patricia
    Guasch, Oriol
    Ternström, Sten
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    A Unified Numerical Simulation of Vowel Production That Comprises Phonation and the Emitted Sound (2017). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, The International Speech Communication Association (ISCA), 2017, p. 3492-3496. Conference paper (Refereed)
    Abstract [en]

    A unified approach for the numerical simulation of vowels is presented, which accounts for the self-oscillations of the vocal folds including contact, the generation of acoustic waves and their propagation through the vocal tract, and the sound emission outward from the mouth. A monolithic incompressible fluid-structure interaction model is used to simulate the interaction between the glottal jet and the vocal folds, whereas the contact model is addressed by means of a level set application of the Eikonal equation. The coupling with acoustics is done through an acoustic analogy stemming from a simplification of the acoustic perturbation equations. This coupling is one-way in the sense that there is no feedback from the acoustics to the flow and mechanical fields. All the involved equations are solved together at each time step and in a single computational run, using the finite element method (FEM). As an application, the production of vowel [i] has been addressed. Despite the complexity of all physical phenomena to be simulated simultaneously, which requires resorting to massively parallel computing, the formant locations of vowel [i] have been well recovered.

  • 266.
    Demoucron, Matthias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    On the control of virtual violins: Physical modelling and control of bowed string instruments (2008). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This thesis treats the control of sound synthesis of bowed string instruments based on physical modelling. The work followed two approaches: (a) a systematic exploration of the influence of control parameters (bow force, bow velocity, and bow-bridge distance) on the output of a physical model of the violin, and (b) measurements and analyses of the bowing parameters in real violin playing in order to model and parameterize basic classes of bowing patterns for synthesis control. First, a bowed-string model based on modal solutions of the string equation is described and implemented for synthesis of violin sounds. The behaviour of the model is examined through simulations focusing on playability, i.e. the control parameter space in which a periodic Helmholtz motion is obtained, and the variations of the properties of the simulated sound (sound level and spectral centroid) within this parameter space. The response of the model corresponded well with theoretical predictions and empirical expectations based on observations of real performances. The exploration of the model made it possible to define optimal parameter regions for the synthesis, and to map sound properties onto the control parameters. A second part covers the development of a sensor for measuring the bow force in real violin performance. The force sensor was later combined with an optical motion capture system for measurement of complete sets of bowing parameters in violin performance. In the last part, measurements of the control parameters for basic classes of bowing patterns (sautillé, spiccato, martelé, tremolo) are analyzed in order to propose a realistic control of the sound synthesis. The time evolution of the bowing parameters was modelled by analytical functions, which made it possible to describe and control simulated bowing patterns by a limited set of control parameters. For sustained bowing patterns such as détaché, control strategies for basic elements in playing (variations in dynamic level, bow changes) were extracted from exemplary measurements, and simple rules deduced, which allowed extrapolation of parameters to modified bow strokes with other durations and at different dynamic levels.

  • 267.
    Demoucron, Matthias
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Askenfelt, Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Caussé, Rene
    IRCAM CNRS STMS.
    Measuring Bow Force in Bowed String Performance: Theory and Implementation of a Bow Force Sensor (2009). In: Acta Acustica united with Acustica, ISSN 1610-1928, E-ISSN 1861-9959, Vol. 95, no 4, p. 718-732. Article in journal (Refereed)
    Abstract [en]

    A sensor has been developed which allows measurement of the force exerted by the bow on the string (bow force) during violin performance. The bow force is deduced from measurement of the transversal force at the termination of the bow hair at the frog. The principle is illustrated with an experiment that demonstrates how the bending of the stick and variations in bow hair tension influence the measurements. The design of the sensor is described and performance characteristics are discussed. A thorough calibration procedure is described and tested. Finally, the use of the sensor is demonstrated through measurements in real playing situations.

  • 268. Dong, Li
    et al.
    Kong, Jiangping
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Long-term-average spectrum characteristics of Kunqu Opera singers' speaking, singing and stage speech (2014). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 39, no 2, p. 72-80. Article in journal (Refereed)
    Abstract [en]

    Long-term-average spectrum (LTAS) characteristics were analyzed for ten Kunqu Opera singers, two in each of five roles. Each singer performed singing, stage speech, and conversational speech. Differences between the roles and between their performances of these three conditions are examined. After compensating for the Leq difference, LTAS characteristics still differ between the roles but are similar for the three conditions, especially for the Colorful face (CF) and Old man roles, and especially between reading and singing. The curves show no evidence of a singer's formant cluster peak, but the CF role demonstrates a speaker's formant peak near 3 kHz. The LTAS characteristics deviate markedly from non-singers' standard conversational speech as well as from those of Western opera singing.
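
    A minimal sketch of the two analysis quantities mentioned above, LTAS and Leq, computed from a mono recording with SciPy (window length and calibration are assumptions; absolute dB values depend on how the signal was calibrated):

        # Long-term-average spectrum (Welch estimate) and equivalent level Leq.
        import numpy as np
        from scipy.signal import welch

        def ltas_and_leq(x, fs, nperseg=4096):
            f, pxx = welch(x, fs=fs, nperseg=nperseg)   # power spectral density
            ltas_db = 10.0 * np.log10(pxx + 1e-20)      # spectrum level in dB
            leq_db = 10.0 * np.log10(np.mean(np.asarray(x, dtype=float) ** 2) + 1e-20)
            return f, ltas_db, leq_db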

  • 269. Dong, Li
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Kong, Jiangping
    Loudness and Pitch of Kunqu Opera (2014). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 28, no 1, p. 14-19. Article in journal (Refereed)
    Abstract [en]

    Equivalent sound level (Leq), sound pressure level (SPL), and fundamental frequency (F0) are analyzed in each of five Kunqu Opera roles: Young girl and Young woman, Young man, Old man, and Colorful face. Their pitch ranges are similar to those of some western opera singers (alto, alto, tenor, baritone, and baritone, respectively). Differences among tasks, conditions (stage speech, singing, and reading lyrics), singers, and roles are examined. For all singers, Leq of stage speech and singing were considerably higher than that of conversational speech. Interrole differences of Leq among tasks and singers were larger than the intrarole differences. For most roles, time domain variation of SPL differed between roles both in singing and stage speech. In singing, as compared with stage speech, SPL distribution was more concentrated and variation of SPL with time was smaller. With regard to gender and age, male roles had higher mean Leq and lower average F0 (MF0), as compared with female roles. Female singers showed a wider F0 distribution for singing than for stage speech, whereas the opposite was true for male singers. The Leq of stage speech was higher than in singing for young personages. Younger female personages showed higher Leq, whereas older male personages had higher Leq. The roles performed with higher Leq tended to be sung at a lower MF0.

  • 270.
    Dravins, Christina
    et al.
    The National Agency for Special Needs Education and Schools.
    van Besouw, Rachel
    ISVR, University of Southampton.
    Hansen, Kjetil Falkenberg
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Kuske, Sandra
    Latvian Children's Hearing Centre.
    Exploring and enjoying non-speech sounds through a cochlear implant: the therapy of music (2010). In: 11th International Conference on Cochlear Implants and other Implantable Technologies, Karolinska University Hospital, 2010, p. 356. Conference paper (Refereed)
    Abstract [en]

    Cochlear implant technology was initially designed to promote reception of speech sounds; however, music enjoyment remains a challenge. Music is an influential ingredient in our well-being, playing an important role in our cognitive, physical and social development. For many cochlear implant recipients it is not feasible to communicate how sounds are perceived, and consequently the benefits of music listening may be reduced. Non-speech sounds may also be important to persons with multiple functional deficits who rely on information additional to the verbal content for participating in communication. Deaf-born children with multiple functional deficits constitute an especially vulnerable group, as lack of reaction to sound is often discouraging to caregivers. Individually adapted tools and methods for sound awareness may promote exploration and appreciation of the information mediated by the implant. Two current works involving habilitation through sound production and music will be discussed. First, the results from a pilot study aiming at finding musical toys that can be adapted to help children explore their hearing with engaging sounds and expressive interfaces will be presented. The findings indicate that children with multiple functional deficits can be more inclined to use the auditory channel for communication and play than the caregivers would anticipate. Second, the results of a recent questionnaire study, which compared the music exposure and appreciation of preschool cochlear implant recipients with their normally hearing peers, will be presented. The data from this study indicate that preschool children with cochlear implants spend roughly the same amount of time interacting with musical instruments at home and watching television programmes and DVDs which include music. However, the data indicate that these children receive less exposure to recorded music without visual stimuli and show less sophisticated responses to music. The provision and supported use of habilitation materials which encourage interaction with music might therefore be beneficial.

  • 271.
    Drioli, Carlo
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    A flow waveform-matched low-dimensional glottal model based on physical knowledge (2005). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 117, no 5, p. 3184-3195. Article in journal (Refereed)
    Abstract [en]

    The purpose of this study is to explore the possibility for physically based mathematical models of the voice source to accurately reproduce inverse filtered glottal volume-velocity waveforms. A low-dimensional, self-oscillating model of the glottal source with waveform-matching properties is proposed. The model relies on a lumped mechano-aerodynamic scheme loosely inspired by the one- and multi-mass lumped models. The vocal folds are represented by a single mechanical resonator and a propagation line which takes into account the vertical phase differences. The vocal-fold displacement is coupled to the glottal flow by means of an aerodynamic driving block which includes a general parametric nonlinear component. The principal characteristics of the flow-induced oscillations are retained, and the overall model is able to match inverse-filtered glottal flow signals. The method offers in principle the possibility of performing transformations of the glottal flow by acting on the physiologically based parameters of the model. This is a desirable property, e.g., for speech synthesis applications. The model was tested on a data set which included inverse-filtered glottal flow waveforms of different characteristics. The results demonstrate the possibility of reproducing natural speech waveforms with high accuracy, and of controlling important characteristics of the synthesis such as pitch.

  • 272.
    Dubus, Gaël
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Evaluation of four models for the sonification of elite rowing (2012). In: Journal on Multimodal User Interfaces, ISSN 1783-7677, E-ISSN 1783-8738, Vol. 5, no 3-4, p. 143-156. Article in journal (Refereed)
    Abstract [en]

    Many aspects of sonification represent potential benefits for the practice of sports. Taking advantage of the characteristics of auditory perception, interactive sonification offers promising opportunities for enhancing the training of athletes. The efficient learning and memorizing abilities pertaining to the sense of hearing, together with the strong coupling between auditory and sensorimotor systems, make the use of sound a natural field of investigation in quest of efficiency optimization in individual sports at a high level. This study presents an application of sonification to elite rowing, introducing and evaluating four sonification models. The rapid development of mobile technology capable of efficiently handling numerical information offers new possibilities for interactive auditory display. Thus, these models have been developed under the specific constraints of a mobile platform, from data acquisition to the generation of a meaningful sound feedback. In order to evaluate the models, two listening experiments have then been carried out with elite rowers. Results show a good ability of the participants to efficiently extract basic characteristics of the sonified data, even in a non-interactive context. Qualitative assessment of the models highlights the need for a balance between function and aesthetics in interactive sonification design. Consequently, particular attention on usability is required for future displays to become widespread.

  • 273.
    Dubus, Gaël
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Interactive sonification of motion: Design, implementation and control of expressive auditory feedback with mobile devices (2013). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Sound and motion are intrinsically related, by their physical nature and through the link between auditory perception and motor control. If sound provides information about the characteristics of a movement, a movement can also be influenced or triggered by a sound pattern. This thesis investigates how this link can be reinforced by means of interactive sonification. Sonification, the use of sound to communicate, perceptualize and interpret data, can be used in many different contexts. It is particularly well suited for time-related tasks such as monitoring and synchronization, and is therefore an ideal candidate to support the design of applications related to physical training. Our objectives are to develop and investigate computational models for the sonification of motion data with a particular focus on expressive movement and gesture, and for the sonification of elite athletes' movements. We chose to develop our applications on a mobile platform in order to make use of advanced interaction modes using an easily accessible technology. In addition, networking capabilities of modern smartphones potentially allow for adding a social dimension to our sonification applications by extending them to several collaborating users. The sport of rowing was chosen to illustrate the assistance that an interactive sonification system can provide to elite athletes. Bringing into play complex interactions between various kinematic and kinetic quantities, studies on rowing kinematics provide guidelines to optimize rowing efficiency, e.g. by minimizing velocity fluctuations around average velocity. However, rowers can only rely on sparse cues to get information relative to boat velocity, such as the sound made by the water splashing on the hull. We believe that an interactive augmented feedback communicating the dynamic evolution of some kinematic quantities could represent a promising way of enhancing the training of elite rowers. Since only limited space is available on a rowing boat, the use of mobile phones appears appropriate for handling streams of incoming data from various sensors and generating an auditory feedback simultaneously. The development of sonification models for rowing and their design evaluation in offline conditions are presented in Paper I. In Paper II, three different models for sonifying the synchronization of the movements of two users holding a mobile phone are explored. Sonification of expressive gestures by means of expressive music performance is tackled in Paper III. In Paper IV, we introduce a database of mobile applications related to sound and music computing. An overview of the field of sonification is presented in Paper V, along with a systematic review of mapping strategies for sonifying physical quantities. Physical and auditory dimensions were both classified into generic conceptual dimensions, and proportion of use was analyzed in order to identify the most popular mappings. Finally, Paper VI summarizes experiments conducted with the Swedish national rowing team in order to assess sonification models in an interactive context.

  • 274.
    Dubus, Gaël
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    A Systematic Review of Mapping Strategies for the Sonification of Physical Quantities (2013). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 8, no 12, p. e82491. Article in journal (Refereed)
    Abstract [en]

    The field of sonification has progressed greatly over the past twenty years and currently constitutes an established area of research. This article aims at exploiting and organizing the knowledge accumulated in previous experimental studies to build a foundation for future sonification works. A systematic review of these studies may reveal trends in sonification design, and therefore support the development of design guidelines. To this end, we have reviewed and analyzed 179 scientific publications related to sonification of physical quantities. Using a bottom-up approach, we set up a list of conceptual dimensions belonging to both physical and auditory domains. Mappings used in the reviewed works were identified, forming a database of 495 entries. Frequency of use was analyzed among these conceptual dimensions as well as higher-level categories. Results confirm two hypotheses formulated in a preliminary study: pitch is by far the most used auditory dimension in sonification applications, and spatial auditory dimensions are almost exclusively used to sonify kinematic quantities. To detect successful as well as unsuccessful sonification strategies, assessment of mapping efficiency conducted in the reviewed works was considered. Results show that a proper evaluation of sonification mappings is performed only in a marginal proportion of publications. Additional aspects of the publication database were investigated: historical distribution of sonification works is presented, projects are classified according to their primary function, and the sonic material used in the auditory display is discussed. Finally, a mapping-based approach for characterizing sonification is proposed.
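
    As a concrete illustration of the most frequent mapping found in the review, a physical quantity mapped to pitch, here is a small sketch that renders a data series as a sequence of sine tones (the value-to-frequency range and tone duration are arbitrary choices, not recommendations from the article):

        # Parameter-mapping sonification: data values -> tone frequencies.
        import numpy as np

        def sonify_to_pitch(values, fs=44100, tone_dur=0.1, f_lo=220.0, f_hi=880.0):
            v = np.asarray(values, dtype=float)
            v = (v - v.min()) / (np.ptp(v) + 1e-12)   # normalise to 0..1
            freqs = f_lo + v * (f_hi - f_lo)          # linear mapping to pitch
            t = np.arange(int(fs * tone_dur)) / fs
            return np.concatenate([np.sin(2 * np.pi * f * t) for f in freqs])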

  • 275.
    Dubus, Gaël
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Evaluation of a system for the sonification of elite rowing in an interactive context. Manuscript (preprint) (Other academic)
  • 276.
    Dubus, Gaël
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics. KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Exploration and evaluation of a system for interactive sonification of elite rowing (2015). In: Sports Engineering, ISSN 1369-7072, E-ISSN 1460-2687, Vol. 18, no 1, p. 29-41. Article in journal (Refereed)
    Abstract [en]

    In recent years, many solutions based on interactive sonification have been introduced for enhancing sport training. Few of them have been assessed in terms of efficiency or design. In a previous study, we performed a quantitative evaluation of four models for the sonification of elite rowing in a non-interactive context. For the present article, we conducted on-water experiments to investigate the effects of some of these models on two kinematic quantities: stroke rate value and fluctuations in boat velocity. To this end, elite rowers interacted with discrete and continuous auditory displays in two experiments. A method for computing an average rowing cycle is introduced, together with a measure of velocity fluctuations. Participants answered questionnaires and took part in interviews to assess the degree of acceptance of the different models and to reveal common trends and individual preferences. No significant effect of sonification could be determined in either of the two experiments. The measure of velocity fluctuations was found to depend linearly on stroke rate. Participants provided feedback about their aesthetic preferences and functional needs during interviews, allowing us to improve the models for future experiments to be conducted over longer periods.
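
    The article's average rowing cycle and velocity-fluctuation measure are not specified in the abstract; the sketch below shows one plausible reading of the idea, resampling each stroke to a common length before averaging and using the coefficient of variation as a fluctuation measure (both are assumptions, not the published method):

        # Average rowing cycle and a simple velocity fluctuation measure.
        import numpy as np

        def average_cycle(velocity, stroke_starts, n_points=100):
            cycles = []
            for a, b in zip(stroke_starts[:-1], stroke_starts[1:]):
                seg = np.asarray(velocity[a:b], dtype=float)
                xp = np.linspace(0.0, 1.0, len(seg))
                cycles.append(np.interp(np.linspace(0.0, 1.0, n_points), xp, seg))
            return np.mean(cycles, axis=0)

        def velocity_fluctuation(velocity):
            v = np.asarray(velocity, dtype=float)
            return np.std(v) / np.mean(v)   # fluctuation relative to mean velocity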

  • 277.
    Dubus, Gaël
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Sonification of physical quantities throughout history: a meta-study of previous mapping strategies (2011). In: Proceedings of the 17th International Conference on Auditory Display (ICAD 2011), Budapest, Hungary: OPAKFI Egyesület, 2011. Conference paper (Refereed)
    Abstract [en]

    We introduce a meta-study of previous sonification designs taking physical quantities as input data. The aim is to build a solid foundation for future sonification works so that auditory display researchers can benefit from former studies and avoid starting from scratch when beginning new sonification projects. This work is at an early stage and the objective of this paper is to introduce the methodology rather than to come to definitive conclusions. After a historical introduction, we explain how to collect a large number of articles and extract useful information about mapping strategies. Then, we present the physical quantities grouped according to conceptual dimensions, as well as the sound parameters used in sonification designs, and we summarize the current state of the study by listing the couplings extracted from the article database. A total of 54 articles have been examined for the present article. Finally, a preliminary analysis of the results is performed.

  • 278.
    Dubus, Gaël
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Sonification of sculler movements, development of preliminary methods (2010). In: Proceedings of ISon 2010, 3rd Interactive Sonification Workshop / [ed] Bresin, Roberto; Hermann, Thomas; Hunt, Andy, Stockholm, Sweden: KTH Royal Institute of Technology, 2010, p. 39-43. Conference paper (Refereed)
    Abstract [en]

    Sonification is a widening field of research with many possibilities for practical applications in various scientific domains. The rapid development of mobile technology capable of efficiently handling numerical information offers new opportunities for interactive auditory display. In this scope, the SONEA project (SONification of Elite Athletes) aims at improving performances of Olympic-level athletes by enhancing their training techniques, taking advantage of both the strong coupling between auditory and sensorimotor systems, and the efficient learning and memorizing abilities pertaining to the sense of hearing. An application to rowing is presented in this article. Rough estimates of the position and mean velocity of the craft are given by a GPS receiver embedded in a smartphone taken onboard. An external accelerometer provides boat acceleration data with higher temporal resolution. The development of preliminary methods for sonifying the collected data has been carried out under the specific constraints of a mobile device platform. The sonification is either performed by the phone as real-time feedback or by a computer using data files as input for an a posteriori analysis of the training. In addition, environmental sounds recorded during training can be synchronized with the sonification to perceive the coherence of the sequence of sounds throughout the rowing cycle. First results show that sonification using a parameter-mapping method over

  • 279.
    Dubus, Gaël
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Hansen, Kjetil Falkenberg
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    An overview of sound and music applications for Android available on the market (2012). In: Proceedings of the 9th Sound and Music Computing Conference, SMC 2012 / [ed] Serafin, Stefania, Sound and Music Computing Network, 2012, p. 541-546. Conference paper (Refereed)
    Abstract [en]

    This paper introduces a database of sound-based applications running on the Android mobile platform. The long-term objective is to provide a state of the art of mobile applications dealing with sound and music interaction. After describing the method used to build up and maintain the database using a non-hierarchical structure based on tags, we present a classification according to various categories of applications, and we conduct a preliminary analysis of the distribution of these categories, reflecting the current state of the database.

  • 280. Durup, E
    et al.
    Jansson, Erik Valter
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    The quest of the violin Bridge-Hill (2005). In: Acta Acustica united with Acustica, ISSN 1610-1928, E-ISSN 1861-9959, Vol. 91, no 2, p. 206-213. Article in journal (Refereed)
    Abstract [en]

    Good violins have a Bridge-Hill, i.e. a hump between 2 and 3 kHz in their frequency responses, both in radiated sound and in bridge mobility. Experiments have proved that the Hill is not confined to the bridge only. Present experiments show that the Hill can be modelled by a plate-bridge and a rectangular spruce plate with f-holes shaped as upside-down Us. The flaps, i.e. the wings cut free by the upside-down Us, and especially their areas, determine the peak frequency of the Hill. By tuning the bridge and the wings of a complete violin the Hill is obtained and can be tuned within a wide frequency range.

  • 281. Echternach, Matthias
    et al.
    Birkholz, Peter
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics. University College of Music Education, Stockholm, Sweden.
    Traser, Louisa
    Korvink, Jan Gerrit
    Richter, Bernhard
    Resonatory Properties in Professional Tenors Singing Above the Passaggio (2016). In: Acta Acustica united with Acustica, ISSN 1610-1928, E-ISSN 1861-9959, Vol. 102, no 2, p. 298-306. Article in journal (Refereed)
    Abstract [en]

    Introduction: The question of formant tuning in male professional voices has been a matter of discussion for many years. Material and Methods: In this study four very successful Western classically trained tenors of different repertoire were analysed. They sang a scale on the vowel conditions /a,e,i,o,u/ from the pitch C4 (250 Hz) to A4 (440 Hz) in their stage voice avoiding a register shift to falsetto. Formant frequencies were calculated from inverse filtering of the audio signal and from two-dimensional MRI data. Results: Both estimations showed a tuning of F1 to the first harmonic only for vowel conditions with a low first formant (F1). For other vowel conditions, however, no clear systematic formant tuning was observed. Conclusion: For most vowel conditions the data are not able to support the hypothesis of a systematic formant tuning for professional classically trained tenors.

  • 282. Echternach, Matthias
    et al.
    Burk, Fabian
    Koeberlein, Marie
    Selamtzis, Andreas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Doellinger, Michael
    Burdumy, Michael
    Richter, Bernhard
    Herbst, Christian Thomas
    Laryngeal evidence for the first and second passaggio in professionally trained sopranos (2017). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 12, no 5, article id e0175865. Article in journal (Refereed)
    Abstract [en]

    Introduction: Due to a lack of empirical data, the current understanding of the laryngeal mechanics in the passaggio regions (i.e., the fundamental frequency ranges where vocal registration events usually occur) of the female singing voice is still limited. Material and methods: In this study the first and second passaggio regions of 10 professionally trained female classical soprano singers were analyzed. The sopranos performed pitch glides from A3 (fo = 220 Hz) to A4 (fo = 440 Hz) and from A4 (fo = 440 Hz) to A5 (fo = 880 Hz) on the vowel [i:]. Vocal fold vibration was assessed with trans-nasal high speed videoendoscopy at 20,000 fps, complemented by simultaneous electroglottographic (EGG) and acoustic recordings. Register breaks were perceptually rated by 12 voice experts. Voice stability was documented with the EGG-based sample entropy. Glottal opening and closing patterns during the passaggi were analyzed, supplemented with open quotient data extracted from the glottal area waveform. Results: In both the first and the second passaggio, variations of vocal fold vibration patterns were found. Four distinct patterns emerged: smooth transitions with either increasing or decreasing durations of glottal closure, abrupt register transitions, and intermediate loss of vocal fold contact. Audible register transitions (in both the first and second passaggi) generally coincided with higher sample entropy values and higher open quotient variance through the respective passaggi. Conclusions: Noteworthy vocal fold oscillatory registration events occur in both the first and the second passaggio even in professional sopranos. The respective transitions are hypothesized to be caused by either (a) a change of laryngeal biomechanical properties; or by (b) vocal tract resonance effects, constituting level 2 source-filter interactions.
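
    The EGG-based sample entropy used above as a stability measure can be sketched as follows (a straightforward O(N^2) implementation; the template length m and tolerance r are illustrative, not the study's settings):

        # Sample entropy: -ln(A/B), where B counts template matches of length m
        # and A counts matches of length m + 1, within tolerance tol.
        import numpy as np

        def sample_entropy(x, m=2, r=0.2):
            x = np.asarray(x, dtype=float)
            tol = r * np.std(x)
            n = len(x)

            def count_matches(mm):
                # use only the first n - m templates so counts for m and m + 1 are comparable
                templates = np.array([x[i:i + mm] for i in range(n - m)])
                count = 0
                for i in range(len(templates)):
                    d = np.max(np.abs(templates - templates[i]), axis=1)   # Chebyshev distance
                    count += int(np.sum(d <= tol)) - 1                     # exclude self-match
                return count

            b = count_matches(m)
            a = count_matches(m + 1)
            return -np.log(a / b) if a > 0 and b > 0 else float("inf")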

  • 283. Echternach, Matthias
    et al.
    Dippold, Sebastian
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Arndt, Susan
    Zander, Mark F.
    Richter, Bernhard
    High-Speed Imaging and Electroglottography Measurements of the Open Quotient in Untrained Male Voices' Register Transitions (2010). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 24, no 6, p. 644-650. Article in journal (Refereed)
    Abstract [en]

    Vocal fold oscillation patterns in vocal register transitions are still unclarified. The vocal fold oscillations and the open quotient were analyzed with high-speed digital imaging (HSDI) and electroglottography (EGG) in 18 male untrained subjects singing a glissando from modal to the falsetto register. Results reveal that the open quotient changed with register in both HSDI and EGG. The intraclass correlations for different HSDI and EGG determinations of the open quotient were high. However, we found only weak interclass correlations between both methods. In 10 subjects, irregularities of vocal fold vibration occurred during the register transition. Our results confirm previous observations that falsetto register is associated with a higher open quotient compared with modal register. These data suggest furthermore that irregularities typically observed in audio and electroglottographic signals during register transitions are caused by irregularities in vocal fold vibration.

  • 284. Echternach, Matthias
    et al.
    Doellinger, Michael
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Traser, Louisa
    Richter, Bernhard
    Vocal fold vibrations at high soprano fundamental frequencies (2013). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 133, no 2, p. EL82-EL87. Article in journal (Refereed)
    Abstract [en]

    Human voice production at very high fundamental frequencies is not yet understood in detail. It was hypothesized that these frequencies are produced by turbulences, vocal tract/vocal fold interactions, or vocal fold oscillations without closure. Hitherto it has been impossible to visually analyze the vocal mechanism due to technical limitations. Latest high-speed technology, which captures 20 000 frames/s, using transnasal endoscopy was applied. Up to 1568 Hz, human vocal folds do exhibit oscillations with complete closure. Therefore, the recent results suggest that human voice production at very high F0s up to 1568 Hz is not caused by turbulence, but rather by airflow modulation from vocal fold oscillations. (C) 2013 Acoustical Society of America
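
    As a rough sanity check of the temporal resolution involved (a back-of-the-envelope calculation, not a figure from the article): at 20 000 frames/s, an oscillation at 1568 Hz is sampled by about 20000 / 1568 ≈ 12.8 frames per glottal cycle, which illustrates why such high frame rates are needed to resolve individual cycles.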

  • 285. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Arndt, Susan
    Breyer, Tobias
    Markl, Michael
    Schumacher, Martin
    Richter, Bernhard
    Vocal tract and register changes analysed by real-time MRI in male professional singers - a pilot study2008In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 33, no 2, p. 67-73Article in journal (Refereed)
    Abstract [en]

    Changes of vocal tract shape accompanying changes of vocal register and pitch in singing have remained an unclear field. Dynamic real-time magnetic resonance imaging (MRI) was applied to two professional classical singers (a tenor and a baritone) in this pilot study. The singers sang ascending scales from B3 to G#4 on the vowel /a/, keeping the modal register throughout or shifting to falsetto register for the highest pitches. The results show that these singers made few and minor modifications of vocal tract shape when they changed from modal to falsetto and some clear modifications when they kept the register. In this case the baritone increased his tongue dorsum height, widened his jaw opening, and decreased his jaw protrusion, while the tenor merely lifted his uvula. The method used seems promising and should be applied to a greater number of singer subjects in the future.

  • 286. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Arndt, Susan
    Markl, Michael
    Schumacher, Martin
    Richter, Bernhard
    Vocal Tract in Female Registers: A Dynamic Real-Time MRI Study2010In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 24, no 2, p. 133-139Article in journal (Refereed)
    Abstract [en]

    The area of vocal registers is still unclarified. In a previous investigation, dynamic real-time magnetic resonance imaging (MRI), which is able to produce up to 10 frames per second, was successfully applied for examinations of vocal tract modifications in register transitions in male singers. In the present study, the same MRI technique was used to study vocal tract shapes during four professional young sopranos' lower and upper register transitions. The subjects were asked to sing a scale on the vowel /a/ across their transitions. The transitions were acoustically identified by four raters. In neither of these transitions could clear vocal tract changes be ascertained. However, substantial changes, that is, widening of the lips, opening of the jaw, elevation of the tongue dorsum, and continuous widening of the pharynx, were observed when the singers reached fundamental frequencies that were close to the frequency of the first formant of the vowel sung. These findings suggest that in these subjects register transition was not primarily the result of modifications of the vocal tract.

  • 287. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Baumann, Tobias
    Markl, Michael
    Richter, Bernhard
    Vocal tract area functions and formant frequencies in opera tenors' modal and falsetto registers2011In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 129, no 6, p. 3955-3963Article in journal (Refereed)
    Abstract [en]

    According to recent model investigations, vocal tract resonance is relevant to vocal registers. However, no experimental corroboration of this claim has been published so far. In the present investigation, ten professional tenors' vocal tract configurations were analyzed using MRI volumetry. All subjects produced a sustained tone on the pitch F4 (349 Hz) on the vowel /a/ (1) in modal and (2) in falsetto register. The area functions were estimated from the MRI data and their associated formant frequencies were calculated. In a second condition the same subjects repeated the same tasks in a sound treated room and their formant frequencies were estimated by means of inverse filtering. In both recordings similar formant frequencies were observed. Vocal tract shapes differed between modal and falsetto register. In modal as compared to falsetto the lip opening and the oral cavity were wider and the first formant frequency was higher. In this sense the presented results are in agreement with the claim that the formant frequencies differ between registers.
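    The step from an area function to formant frequencies can be illustrated with the textbook plane-wave chain-matrix method for a lossless tube closed at the glottis and open at the lips. This is a generic approximation, not the calculation pipeline used in the study; the constants, section length and uniform-tube test case below are assumptions for illustration.

    ```python
    import numpy as np

    RHO, C = 1.2, 350.0   # air density (kg/m^3) and speed of sound (m/s), warm moist air

    def formants_from_area_function(areas_cm2, section_len_cm, fmax=5000, df=1.0):
        """Resonance (formant) frequencies of a lossless concatenation of uniform
        tubes, closed at the glottis and ideally open at the lips, via the
        plane-wave chain-matrix method.  'areas_cm2' runs from glottis to lips.
        A textbook approximation, not the study's own calculation."""
        areas = np.asarray(areas_cm2, dtype=float) * 1e-4     # cm^2 -> m^2
        L = section_len_cm * 1e-2                             # cm -> m
        freqs = np.arange(df, fmax, df)
        k = 2 * np.pi * freqs / C                             # wavenumber
        # start at the lips with P = 0 (ideal open end), U = 1, walk to the glottis
        P = np.zeros_like(freqs, dtype=complex)
        U = np.ones_like(freqs, dtype=complex)
        for A in areas[::-1]:
            Z0 = RHO * C / A                                  # characteristic impedance
            cos_kL, sin_kL = np.cos(k * L), np.sin(k * L)
            P, U = cos_kL * P + 1j * Z0 * sin_kL * U, \
                   1j * sin_kL * P / Z0 + cos_kL * U
        mag = np.abs(U)                       # |U_glottis|; resonances are its minima
        idx = np.flatnonzero((mag[1:-1] < mag[:-2]) & (mag[1:-1] < mag[2:])) + 1
        return freqs[idx]

    # Sanity check: a uniform 17.5 cm tube should resonate near 500, 1500, 2500 Hz
    print(formants_from_area_function(np.full(35, 3.0), 0.5)[:3])
    ```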

  • 288. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Markl, Michael
    Richter, Bernhard
    Professional Opera Tenors' Vocal Tract Configurations in Registers2010In: Folia Phoniatrica et Logopaedica, ISSN 1021-7762, E-ISSN 1421-9972, Vol. 62, no 6, p. 278-287Article in journal (Refereed)
    Abstract [en]

    Objective: Tenor singers may reach their top pitch range either by shifting from modal to falsetto register or by using their so-called 'voix mixte'. Material and Methods: In this study, dynamic real-time MRI of 8 frames per second was used to analyze the vocal tract profile in 10 professional opera tenors, who sang an ascending scale from C4 (262 Hz) to A4 (440 Hz) on the vowel /a/. The scale included their register transition and the singers applied both register techniques in different takes. Results: Modal to falsetto register changes were associated with only minor vocal tract modifications, including elevation and tilting of the larynx and a lifted tongue dorsum. Transitions to voix mixte, by contrast, were associated with major vocal tract modifications. Under these conditions, the subjects widened their pharynges, their lip and jaw openings, and increased their jaw protrusion. These modifications were stronger in more 'heavy' tenors than in more 'light' tenors. The acoustic consequences of these articulatory changes are discussed.

  • 289. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Zander, Mark F.
    Richter, Bernhard
    Perturbation Measurements in Untrained Male Voices' Transitions From Modal to Falsetto Register2011In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 25, no 6, p. 663-669Article in journal (Refereed)
    Abstract [en]

    Purpose. Voice periodicity during transitions from modal to falsetto register still remains an unclarified question. Method. We examined the acoustic and electroglottographic signals of 20 healthy untrained male voices' transitions from modal to falsetto register on the vowels /a, e, i, o, u, and ae/. Results. In addition to discontinuities in fundamental frequency (F0), an independent increase of jitter, relative average perturbation, and shimmer was observed during and apparently caused by the register transition. In falsetto, the jitter was higher than in the modal register. The contact quotient derived from the electroglottographic signal tended to be lower for higher than for lower F0. Conclusion. Register transitions are associated with increase of perturbation.
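    The perturbation measures named in this abstract have standard definitions over sequences of cycle periods and amplitudes. A minimal sketch using those standard (Praat-style) definitions follows; the exact extraction settings of the study are not reproduced here.

    ```python
    import numpy as np

    def jitter_local(periods):
        """Jitter (local): mean absolute difference between consecutive
        periods, divided by the mean period."""
        T = np.asarray(periods, dtype=float)
        return np.mean(np.abs(np.diff(T))) / T.mean()

    def rap(periods):
        """Relative average perturbation: deviation of each period from the
        3-point running average of its neighbourhood, relative to the mean."""
        T = np.asarray(periods, dtype=float)
        smooth = (T[:-2] + T[1:-1] + T[2:]) / 3.0
        return np.mean(np.abs(T[1:-1] - smooth)) / T.mean()

    def shimmer_local(amplitudes):
        """Shimmer (local): mean absolute difference between consecutive
        cycle peak amplitudes, divided by the mean amplitude."""
        A = np.asarray(amplitudes, dtype=float)
        return np.mean(np.abs(np.diff(A))) / A.mean()

    # Toy example: a slightly perturbed 200 Hz voice
    rng = np.random.default_rng(0)
    T = 1 / 200 + rng.normal(0, 2e-5, 100)     # period lengths in seconds
    A = 1.0 + rng.normal(0, 0.02, 100)         # relative cycle amplitudes
    print(jitter_local(T), rap(T), shimmer_local(A))
    ```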

  • 290. Edlund, J.
    et al.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Exploring prosody in interaction control2005In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 62, no 2-4, p. 215-226Article in journal (Refereed)
    Abstract [en]

    This paper investigates prosodic aspects of turn-taking in conversation with a view to improving the efficiency of identifying relevant places at which a machine can legitimately begin to talk to a human interlocutor. It examines the relationship between interaction control, the communicative function of which is to regulate the flow of information between interlocutors, and its phonetic manifestation. Specifically, the listener's perception of such interaction control phenomena is modelled. Algorithms for automatic online extraction of prosodic phenomena liable to be relevant for interaction control, such as silent pauses and intonation patterns, are presented and evaluated in experiments using Swedish map task data. We show that the automatically extracted prosodic features can be used to avoid many of the places where current dialogue systems run the risk of interrupting their users, as well as to identify suitable places to take the turn.
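    The silent-pause part of such prosodic extraction can be sketched with a simple frame-energy detector. All thresholds below are illustrative assumptions, and the paper's online algorithms additionally use intonation patterns.

    ```python
    import numpy as np

    def silent_pauses(samples, fs, frame_ms=10, energy_db=-40.0, min_pause_ms=200):
        """Small offline stand-in for an online silence detector: mark stretches
        where frame energy stays below 'energy_db' (relative to the loudest
        frame) for at least 'min_pause_ms'.  Thresholds are illustrative."""
        x = np.asarray(samples, dtype=float)
        n = int(fs * frame_ms / 1000)
        frames = x[: len(x) // n * n].reshape(-1, n)
        rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12
        db = 20 * np.log10(rms / rms.max())
        silent = db < energy_db
        pauses, start = [], None
        for i, s in enumerate(np.append(silent, False)):
            if s and start is None:
                start = i
            elif not s and start is not None:
                if (i - start) * frame_ms >= min_pause_ms:
                    pauses.append((start * frame_ms / 1000, i * frame_ms / 1000))
                start = None
        return pauses   # list of (start_s, end_s) candidate pause regions
    ```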

  • 291.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    How deeply rooted are the turns we take?2011In: SemDial 2011: Proceedings of the 15th Workshop on the Semantics and Pragmatics of Dialogue, 2011, p. 196-197Conference paper (Other academic)
    Abstract [en]

    This poster presents preliminary work investigating turn-taking in text-based chat with a view to learn something about how deeply rooted turn-taking is in the human cognition. A connexion is shown between preferred turn-taking patterns and length and type of experience with such chats, which supports the idea that the orderly type of turn-taking found in most spoken conversations is indeed deeply rooted, but not more so than that it can be overcome with training in a situation where such turn-taking is not beneficial to the communication.

  • 292.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    In search for the conversational homunculus: serving to understand spoken human face-to-face interaction2011Doctoral thesis, monograph (Other academic)
    Abstract [en]

    In the group of people with whom I have worked most closely, we recently attempted to dress our visionary goal in words: “to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is humanlike”. The “conversational homunculus” figuring in the title of this book represents this “artificial conversational partner”. The vision is motivated by an urge to test computationally our understandings of how human-human interaction functions, and the bulk of my work leads towards the conversational homunculus in one way or another. This book compiles and summarises that work: it sets out by presenting background and motivation for the long-term research goal of creating a humanlike spoken dialogue system, and continues along the lines of an initial iteration of an iterative research process towards that goal, beginning with the planning and collection of human-human interaction corpora, continuing with the analysis and modelling of the human-human corpora, and ending in the implementation of, experimentation with, and evaluation of humanlike components for human-machine interaction. The studies presented have a clear focus on interactive phenomena at the expense of propositional content and syntactic constructs, and typically investigate the regulation of dialogue flow and feedback, or the establishment of mutual understanding and grounding.

  • 293.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Co-present or Not?: Embodiment, Situatedness and the Mona Lisa Gaze Effect2013In: Eye gaze in intelligent user interfaces: gaze-based analyses, models and applications / [ed] Nakano, Yukiko; Conati, Cristina; Bader, Thomas, London: Springer London, 2013, p. 185-203Chapter in book (Refereed)
    Abstract [en]

    The interest in embodying and situating computer programmes took off in the autonomous agents community in the 90s. Today, researchers and designers of programmes that interact with people on human terms endow their systems with humanoid physiognomies for a variety of reasons. In most cases, attempts at achieving this embodiment and situatedness have taken one of two directions: virtual characters and actual physical robots. In addition, a technique that is far from new is gaining ground rapidly: projection of animated faces on head-shaped 3D surfaces. In this chapter, we provide a history of this technique; an overview of its pros and cons; and an in-depth description of the cause and mechanics of the main drawback of 2D displays of 3D faces (and objects): the Mona Lisa gaze effect. We conclude with a description of an experimental paradigm that measures perceived directionality in general and the Mona Lisa gaze effect in particular.

  • 294.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    The Mona Lisa Gaze Effect as an Objective Metric for Perceived Cospatiality2011In: Proc. of the Intelligent Virtual Agents 10th International Conference (IVA 2011) / [ed] Vilhjálmsson, Hannes Högni; Kopp, Stefan; Marsella, Stacy; Thórisson, Kristinn R., Springer , 2011, p. 439-440Conference paper (Refereed)
    Abstract [en]

    We propose to utilize the Mona Lisa gaze effect for an objective and repeatable measure of the extent to which a viewer perceives an object as cospatial. Preliminary results suggest that the metric behaves as expected.

  • 295.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tånnander, Christina
    Swedish Agency for Accessible Media, MTM, Stockholm, Sweden.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Audience response system based annotation of speech2013In: Proceedings of Fonetik 2013, Linköping: Linköping University , 2013, p. 13-16Conference paper (Other academic)
    Abstract [en]

    Manual annotators are often used to label speech. The task is associated with high costs and with great time consumption. We suggest a way to increase throughput while maintaining a high degree of experimental control by borrowing from the Audience Response Systems used in the film and television industries. We demonstrate a cost-efficient setup for rapid, plenary annotation of phenomena occurring in recorded speech, together with some results from studies we have undertaken to quantify the temporal precision and reliability of such annotations.
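    The aggregation idea behind such plenary annotation can be sketched as pooling time-stamped button presses from several annotators, compensating for an assumed reaction latency, and keeping moments where enough annotators agree. The latency, bin width and vote threshold below are illustrative assumptions, not values from the paper.

    ```python
    from collections import Counter

    def pooled_annotations(presses_by_annotator, latency_s=0.3, bin_s=0.5, min_votes=3):
        """Aggregate per-annotator button-press timestamps (seconds) into agreed
        event times: shift every press back by an assumed reaction latency, bin
        the shifted presses, and keep bins reaching 'min_votes'."""
        bins = Counter()
        for presses in presses_by_annotator:
            for t in presses:
                bins[round((t - latency_s) / bin_s)] += 1
        return sorted(b * bin_s for b, votes in bins.items() if votes >= min_votes)

    # Toy example: five annotators reacting to roughly the same two events
    annotators = [[2.31, 7.10], [2.45, 7.25], [2.38], [2.52, 7.02], [7.18, 2.41]]
    print(pooled_annotations(annotators))   # candidate event times in seconds
    ```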

  • 296.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tånnander, Christina
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Temporal precision and reliability of audience response system based annotation2013In: Proc. of Multimodal Corpora 2013, 2013Conference paper (Refereed)
  • 297.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustavsson, Lisa
    Heldner, Mattias
    (Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics).
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Kallionen, Petter
    Marklund, Ellen
    3rd party observer gaze as a continuous measure of dialogue flow2012In: LREC 2012 - Eighth International Conference On Language Resources And Evaluation, Istanbul, Turkey: European Language Resources Association, 2012, p. 1354-1358Conference paper (Refereed)
    Abstract [en]

    We present an attempt at using 3rd party observer gaze to get a measure of how appropriate each segment in a dialogue is for a speaker change. The method is a step away from the current dependency of speaker turns or talkspurts towards a more general view of speaker changes. We show that 3rd party observers do indeed largely look at the same thing (the speaker), and how this can be captured and utilized to provide insights into human communication. In addition, the results also suggest that there might be differences in the distribution of 3rd party observer gaze depending on how information-rich an utterance is.
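    The measure described can be sketched as, per video frame, the proportion of 3rd party observers whose gaze target is the current speaker; the data layout below is an assumption made for illustration.

    ```python
    import numpy as np

    def gaze_on_speaker_ratio(gaze_targets, speaker_ids):
        """For each frame, the fraction of 3rd party observers whose gaze target
        equals the current speaker.  'gaze_targets' is an (observers x frames)
        array of looked-at person ids; 'speaker_ids' gives who speaks per frame."""
        gaze = np.asarray(gaze_targets)
        speakers = np.asarray(speaker_ids)
        return (gaze == speakers[None, :]).mean(axis=0)

    # Toy example: 3 observers, 6 frames, speaker switches from person 0 to 1
    gaze = np.array([[0, 0, 0, 1, 1, 1],
                     [0, 0, 1, 1, 1, 1],
                     [0, 0, 0, 0, 1, 1]])
    speaker = np.array([0, 0, 0, 1, 1, 1])
    print(gaze_on_speaker_ratio(gaze, speaker))   # ≈ [1, 1, 0.67, 0.67, 1, 1]
    ```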

  • 298.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Capturing massively multimodal dialogues: affordable synchronization and visualization2010In: Proc. of Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality (MMC 2010) / [ed] Kipp, Michael; Martin, Jean-Claude; Paggio, Patrizia; Heylen, Dirk, 2010, p. 160-161Conference paper (Refereed)
    Abstract [en]

    In this demo, we show (a) affordable and relatively easy-to-implement means to facilitate synchronization of audio, video and motion capture data in post processing, and (b) a flexible tool for 3D visualization of recorded motion capture data aligned with audio and video sequences. The synchronisation is made possible by the use of two simple analogue devices: a turntable and an easy-to-build electronic clapper board. The demo shows examples of how the signals from the turntable and the clapper board are traced over the three modalities, using the 3D visualisation tool. We also demonstrate how the visualisation tool shows head and torso movements captured by the motion capture system.
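    One way to read the synchronisation idea: locate the sharp clapper transient in each stream and use the difference between the two detection times as the alignment offset. The transient detector below (largest absolute first difference) is only a sketch of the idea, not the authors' tool chain.

    ```python
    import numpy as np

    def clap_offset_seconds(audio, audio_fs, mocap_channel, mocap_fs):
        """Estimate the offset between an audio track and a motion-capture channel
        by locating a sharp clapper transient in each stream (here simply the
        largest absolute first difference) and comparing the two times.  A real
        pipeline would verify the detection; this is only a sketch."""
        t_audio = np.argmax(np.abs(np.diff(audio))) / audio_fs
        t_mocap = np.argmax(np.abs(np.diff(mocap_channel))) / mocap_fs
        return t_audio - t_mocap   # shift the mocap stream by this many seconds
    ```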

  • 299.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    MushyPeek: A Framework for Online Investigation of Audiovisual Dialogue Phenomena2009In: Language and Speech, ISSN 0023-8309, E-ISSN 1756-6053, Vol. 52, p. 351-367Article in journal (Refereed)
    Abstract [en]

    Evaluation of methods and techniques for conversational and multimodal spoken dialogue systems is complex, as is gathering data for the modeling and tuning of such techniques. This article describes MushyPeek, an experiment framework that allows us to manipulate the audiovisual behavior of interlocutors in a setting similar to face-to-face human-human dialogue. The setup connects two subjects to each other over a Voice over Internet Protocol (VoIP) telephone connection and simultaneously provides each of them with an avatar representing the other. We present a first experiment which inaugurates, exemplifies, and validates the framework. The experiment corroborates earlier findings on the use of gaze and head pose gestures in turn-taking.

  • 300.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Pushy versus meek: using avatars to influence turn-taking behaviour2007In: INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2007, p. 2784-2787Conference paper (Refereed)
    Abstract [en]

    The flow of spoken interaction between human interlocutors is a widely studied topic. Amongst other things, studies have shown that we use a number of facial gestures to improve this flow - for example to control the taking of turns. This type of gestures ought to be useful in systems where an animated talking head is used, be they systems for computer mediated human-human dialogue or spoken dialogue systems, where the computer itself uses speech to interact with users. In this article, we show that a small set of simple interaction control gestures and a simple model of interaction can be used to influence users' behaviour in an unobtrusive manner. The results imply that such a model may improve the flow of computer mediated interaction between humans under adverse circumstances, such as network latency, or to create more human-like spoken human-computer interaction.
