Hits 101-150 of 1064
  • 101.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Carlson, Rolf
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Multimodal Interaction Control (2009). In: Computers in the Human Interaction Loop / [ed] Waibel, Alexander; Stiefelhagen, Rainer, Berlin/Heidelberg: Springer Berlin/Heidelberg, 2009, pp. 143-158. Book chapter, part of anthology (Refereed)
  • 102.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Cerrato, Loredana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Evaluation of the expressivity of a Swedish talking head in the context of human-machine interaction (2008). In: Comunicazione parlata e manifestazione delle emozioni: Atti del I Convegno GSCP, Padova 29 novembre - 1 dicembre 2004 / [ed] Emanuela Magno Caldognetto, Federica Cavicchio e Piero Cosi, 2008. Conference paper (Refereed)
    Abstract [en]

    This paper describes a first attempt at synthesis and evaluation of expressive visual articulation using an MPEG-4 based virtual talking head. The synthesis is data-driven, trained on a corpus of emotional speech recorded using optical motion capture. Each emotion is modelled separately using principal component analysis and a parametric coarticulation model. In order to evaluate the expressivity of the data-driven synthesis, two tests were conducted. Our talking head was used in interactions with a human being in a given realistic usage context. The interactions were presented to external observers who were asked to judge the emotion of the talking head. The participants in the experiment could only hear the voice of the user, which was a pre-recorded female voice, and see and hear the talking head. The results of the evaluation, even if constrained by the results of the implementation, clearly show that the visual expression plays a relevant role in the recognition of emotions.
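
    The per-emotion modelling step described above (a separate principal component basis per emotion, feeding a parametric coarticulation model) can be illustrated with a minimal sketch. The snippet below is a hypothetical reconstruction, not the authors' code: the array shapes and the 10-component choice are invented, and real input would be optical motion-capture frames rather than random data.

        import numpy as np
        from sklearn.decomposition import PCA

        # Hypothetical corpus: per-emotion motion-capture data, each an
        # (n_frames, n_markers * 3) array of stacked 3D marker coordinates.
        rng = np.random.default_rng(0)
        corpus = {
            "happy": rng.normal(size=(500, 90)),
            "angry": rng.normal(size=(500, 90)),
        }

        # One PCA model per emotion, mirroring the separate-model approach.
        models = {emotion: PCA(n_components=10).fit(frames)
                  for emotion, frames in corpus.items()}

        # Re-synthesis of a frame: project into the emotion's subspace and back.
        frame = corpus["happy"][:1]
        reconstructed = models["happy"].inverse_transform(
            models["happy"].transform(frame))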

  • 103.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hellmer, Kahl
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strömbergsson, Sofia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Project presentation: Spontal: multimodal database of spontaneous dialog (2009). In: Proceedings of Fonetik 2009: The XXIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 190-193. Conference paper (Other academic)
    Abstract [en]

    We describe the ongoing Swedish speech database project Spontal: Multimodal database of spontaneous speech in dialog (VR 2006-7482). The project takes as its point of departure the fact that both vocal signals and gestures involving the face and body are important in everyday, face-to-face communicative interaction, and that there is a great need for data with which we can measure these more precisely.

  • 104.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Face-to-Face Interaction and the KTH Cooking Show (2010). In: DEVELOPMENT OF MULTIMODAL INTERFACES: ACTIVE LISTENING AND SYNCHRONY / [ed] Esposito A; Campbell N; Vogel C; Hussain A; Nijholt A, 2010, Vol. 5967, pp. 157-168. Conference paper (Refereed)
    Abstract [en]

    We share our experiences with integrating motion capture recordings in speech and dialogue research by describing (1) Spontal, a large project collecting 60 hours of video, audio and motion capture of spontaneous dialogues, with special attention to motion capture and its pitfalls; (2) a tutorial where we use motion capture, speech synthesis and an animated talking head to allow students to create an active listener; and (3) brief preliminary results in the form of visualizations of motion capture data over time in a Spontal dialogue. We hope that, given the lack of writings on the use of motion capture for speech research, these accounts will prove inspirational and informative.

  • 105.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Jonsson, Oskar
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Speech technology in the European project MonAMI (2008). In: Proceedings of FONETIK 2008 / [ed] Anders Eriksson, Jonas Lindh, Gothenburg, Sweden: University of Gothenburg, 2008, pp. 33-36. Conference paper (Other academic)
    Abstract [en]

    This paper describes the role of speech and speech technology in the European project MonAMI, which aims at “mainstreaming accessibility in consumer goods and services, using advanced technologies to ensure equal access, independent living and participation for all”. It presents the Reminder, a prototype embodied conversational agent (ECA) which helps users to plan activities and to remember what to do. The prototype merges speech technology with other, existing technologies: Google Calendar and a digital pen and paper. The solution allows users to continue using a paper calendar in the manner they are used to, whilst the ECA provides notifications on what has been written in the calendar. Users may also ask questions such as “When was I supposed to meet Sara?” or “What’s on my schedule today?”

  • 106.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Innovative interfaces in MonAMI: The Reminder (2008). In: Perception In Multimodal Dialogue Systems, Proceedings / [ed] Andre, E; Dybkjaer, L; Minker, W; Neumann, H; Pieraccini, R; Weber, M, 2008, Vol. 5078, pp. 272-275. Conference paper (Refereed)
    Abstract [en]

    This demo paper presents the first version of the Reminder, a prototype ECA developed in the European project MonAMI, which aims at "mainstreaming accessibility in consumer goods and services, using advanced technologies to ensure equal access, independent living and participation for all". The Reminder helps users to plan activities and to remember what to do. The prototype merges ECA technology with other, existing technologies: Google Calendar and a digital pen and paper. This innovative combination of modalities allows users to continue using a paper calendar in the manner they are used to, whilst the ECA provides verbal notifications on what has been written in the calendar. Users may also ask questions such as "When was I supposed to meet Sara?" or "What's on my schedule today?"

  • 107.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Tobiasson, Helena
    KTH, Skolan för datavetenskap och kommunikation (CSC), Människa-datorinteraktion, MDI (stängd 20111231).
    The MonAMI Reminder: a spoken dialogue system for face-to-face interaction (2009). In: Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, U.K., 2009, pp. 300-303. Conference paper (Refereed)
    Abstract [en]

    We describe the MonAMI Reminder, a multimodal spoken dialogue system which can assist elderly and disabled people in organising and initiating their daily activities. Based on deep interviews with potential users, we have designed a calendar and reminder application which uses an innovative mix of an embodied conversational agent, digital pen and paper, and the web to meet the needs of those users as well as the current constraints of speech technology. We also explore the use of head pose tracking for interaction and attention control in human-computer face-to-face interaction.

  • 108.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Modelling humanlike conversational behaviour (2010). In: SLTC 2010: The Third Swedish Language Technology Conference (SLTC 2010), Proceedings of the Conference, Linköping, Sweden, 2010, pp. 9-10. Conference paper (Other academic)
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is humanlike. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: modelling interactional aspects of spoken face-to-face communication.

  • 109.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Research focus: Interactional aspects of spoken face-to-face communication (2010). In: Proceedings from Fonetik, Lund, June 2-4, 2010 / [ed] Susanne Schötz, Gilbert Ambrazaitis, Lund, Sweden: Lund University, 2010, pp. 7-10. Conference paper (Other academic)
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is human-like. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: interactional aspects of spoken face-to-face communication.

  • 110.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Nordstrand, Magnus
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    A Model for Multimodal Dialogue System Output Applied to an Animated Talking Head (2005). In: SPOKEN MULTIMODAL HUMAN-COMPUTER DIALOGUE IN MOBILE ENVIRONMENTS / [ed] Minker, Wolfgang; Bühler, Dirk; Dybkjær, Laila, Dordrecht: Springer, 2005, pp. 93-113. Book chapter, part of anthology (Refereed)
    Abstract [en]

    We present a formalism for specifying verbal and non-verbal output from a multimodal dialogue system. The output specification is XML-based and provides information about communicative functions of the output, without detailing the realisation of these functions. The aim is to let dialogue systems generate the same output for a wide variety of output devices and modalities. The formalism was developed and implemented in the multimodal spoken dialogue system AdApt. We also describe how facial gestures in the 3D-animated talking head used within this system are controlled through the formalism.
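
    As a rough illustration of the kind of XML-based output specification the abstract describes, the sketch below marks up communicative functions without fixing their realisation, so different output devices can render the same specification. The element and attribute names are invented; the actual AdApt formalism is not reproduced here.

        import xml.etree.ElementTree as ET

        # Hypothetical output specification: communicative functions only,
        # no device-specific realisation details (all names invented).
        utterance = ET.Element("utterance")
        ET.SubElement(utterance, "speech").text = "The red building on Main Street."
        ET.SubElement(utterance, "function", type="emphasis", target="red")
        ET.SubElement(utterance, "function", type="positive-feedback")

        # A renderer for a talking head (or any other modality) decides how to
        # realise each function, e.g. as an eyebrow raise plus a focal accent.
        print(ET.tostring(utterance, encoding="unicode"))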

  • 111.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Nordqvist, Peter
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Visualization of speech and audio for hearing-impaired persons (2008). In: Technology and Disability, ISSN 1055-4181, Vol. 20, no. 2, pp. 97-107. Journal article (Refereed)
    Abstract [en]

    Speech and sounds are important sources of information in our everyday lives for communication with our environment, be it interacting with fellow humans or directing our attention to technical devices with sound signals. For hearing impaired persons this acoustic information must be supplemented or even replaced by cues using other senses. We believe that the most natural modality to use is the visual, since speech is fundamentally audiovisual and these two modalities are complementary. We are hence exploring how different visualization methods for speech and audio signals may support hearing impaired persons. The goal in this line of research is to allow the growing number of hearing impaired persons, children as well as the middle-aged and elderly, equal participation in communication. A number of visualization techniques are proposed and exemplified with applications for hearing impaired persons.

  • 112.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Goda utsikter för teckenspråksteknologi (2010). In: Språkteknologi för ökad tillgänglighet: Rapport från ett nordiskt seminarium / [ed] Domeij, R.; Breivik, T.; Halskov, J.; Kirchmeier-Andersen, S.; Langgård, P.; Moshagen, S., Linköping: Linköping University Electronic Press, 2010, pp. 77-86. Conference paper (Other academic)
    Abstract [translated from Swedish]

    Today there are major gaps in accessibility in society as far as sign language interpreting is concerned. New advances in computer and animation technology, together with the past decade's research on synthetic sign language interpretation, mean that there are now new opportunities to find technical solutions with the potential to considerably improve accessibility for sign language users, for certain types of services or situations. Sweden today has about 30,000 sign language users. The state of knowledge has developed considerably in recent years, both regarding the understanding and description of sign language and regarding the technical prerequisites for analysing, storing and generating sign language. In this chapter we describe the various technologies required to develop sign language technology. Over the past decade, research on sign language technology has gained momentum and several international projects have started, but so far only a few applications have become generally available. We give examples of both research projects and early applications, especially from Europe, where development has been very strong. The prospects for starting Swedish development in this area must be considered good: the knowledge base is excellent, with technical expertise in language technology, multimodal capture and animation at KTH, among others, combined with specialist knowledge of Swedish Sign Language and sign language use at Stockholm University.

  • 113.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Analysis and synthesis of multimodal verbal and non-verbal interaction for animated interface agents (2007). In: VERBAL AND NONVERBAL COMMUNICATION BEHAVIOURS / [ed] Esposito, A; Faundez-Zanuy, M; Keller, E; Marinaro, M, BERLIN: SPRINGER-VERLAG BERLIN, 2007, Vol. 4775, pp. 250-263. Conference paper (Refereed)
    Abstract [en]

    The use of animated talking agents is a novel feature of many multimodal spoken dialogue systems. The addition and integration of a virtual talking head has direct implications for the way in which users approach and interact with such systems. However, understanding the interactions between visual expressions, dialogue functions and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is closely related to the speech acoustics, while there are other articulatory movements affecting speech acoustics that are not visible on the outside of the face. Many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. This chapter looks into the communicative function of the animated talking agent, and its effect on intelligibility and the flow of the dialogue.

  • 114.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Focal accent and facial movements in expressive speech (2006). In: Proceedings from Fonetik 2006, Lund, June 7-9, 2006 / [ed] Gilbert Ambrazaitis, Susanne Schötz, Lund: Lund University, 2006, pp. 9-12. Conference paper (Other academic)
    Abstract [en]

    In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterances were recorded in a variety of expressive modes including Certain, Confirming, Questioning, Uncertain, Happy, Angry and Neutral. Results showed that in all expressive modes, words with focal accent are accompanied by a greater variation of the facial parameters than are words in non-focal positions. Moreover, interesting differences between the expressions in terms of different parameters were found.

  • 115.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Visual correlates to prominence in several expressive modes (2006). In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, pp. 1272-1275. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterances were recorded in a variety of expressive modes including certain, confirming, questioning, uncertain, happy, angry and neutral. Results showed that in all expressive modes, words with focal accent are accompanied by a greater variation of the facial parameters than are words in non-focal positions. Moreover, interesting differences between the expressions in terms of different parameters were found.

  • 116.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Nordqvist, Peter
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Salvi, Giampiero
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Herzke, Tobias
    Schulz, Arne
    Hearing at Home: Communication support in home environments for hearing impaired persons (2008). In: INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2008, pp. 2203-2206. Conference paper (Refereed)
    Abstract [en]

    The Hearing at Home (HaH) project focuses on the needs of hearing-impaired people in home environments. The project is researching and developing an innovative media-center solution for hearing support, with several integrated features that support perception of speech and audio, such as individual loudness amplification, noise reduction, audio classification and event detection, and the possibility to display an animated talking head providing real-time speechreading support. In this paper we provide a brief project overview and then describe some recent results related to the audio classifier and the talking head. As the talking head expects clean speech input, an audio classifier has been developed for the task of classifying audio signals as clean speech, speech in noise or other. The mean accuracy of the classifier was 82%. The talking head (based on technology from the SynFace project) has been adapted for German, and a small speech-in-noise intelligibility experiment was conducted where sentence recognition rates increased from 3% to 17% when the talking head was present.
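
    The clean-speech/speech-in-noise/other classification task can be sketched as follows. This is not the project's classifier (the paper only reports its 82% mean accuracy): the features, the one-GMM-per-class design and the random stand-in data are assumptions for illustration.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(1)
        classes = ["clean_speech", "speech_in_noise", "other"]

        # Stand-in training features, e.g. 13-dimensional MFCC frames per class.
        train = {c: rng.normal(loc=i, size=(1000, 13)) for i, c in enumerate(classes)}

        # One GMM per class; a segment gets the label whose model assigns it
        # the highest average log-likelihood.
        gmms = {c: GaussianMixture(n_components=4, random_state=0).fit(x)
                for c, x in train.items()}

        def classify(frames):
            return max(gmms, key=lambda c: gmms[c].score(frames))

        print(classify(rng.normal(loc=1.0, size=(200, 13))))  # expected: speech_in_noise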

  • 117.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Experiments with Synthesis of Swedish Dialects (2009). In: Proceedings of Fonetik 2009 / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 28-29. Conference paper (Other academic)
    Abstract [en]

    We describe ongoing work on synthesizing Swedish dialects with an HMM synthesizer. A prototype synthesizer has been trained on a large database of standard Swedish read by a professional male voice talent. We have selected a few untrained speakers from each of the following dialectal regions: Norrland, Dala, Göta, Gotland and South of Sweden. The plan is to train a multi-dialect average voice, and then use 20-30 minutes of dialectal speech from one speaker to adapt either the standard Swedish voice or the average voice to the dialect of that speaker.

  • 118.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Nordenberg, Mikael
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Data-driven synthesis of expressive visual speech using an MPEG-4 talking head (2005). In: 9th European Conference on Speech Communication and Technology, Lisbon, 2005, pp. 793-796. Conference paper (Refereed)
    Abstract [en]

    This paper describes initial experiments with synthesis of visual speech articulation for different emotions, using a newly developed MPEG-4 compatible talking head. The basic problem with combining speech and emotion in a talking head is to handle the interaction between emotional expression and articulation in the orofacial region. Rather than trying to model speech and emotion as two separate properties, the strategy taken here is to incorporate emotional expression in the articulation from the beginning. We use a data-driven approach, training the system to recreate the expressive articulation produced by an actor while portraying different emotions. Each emotion is modelled separately using principal component analysis and a parametric coarticulation model. The results so far are encouraging but more work is needed to improve naturalness and accuracy of the synthesized speech.

  • 119.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Peters, Christopher
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Castellano, G.
    O'Sullivan, C.
    Leite, Iolanda
    KTH, Skolan för datavetenskap och kommunikation (CSC), Robotik, perception och lärande, RPL.
    Kopp, S.
    Preface (2017). In: 17th International Conference on Intelligent Virtual Agents, IVA 2017, Springer, 2017, Vol. 10498, pp. V-VI. Conference paper (Refereed)
  • 120.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Salvi, Giampiero
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    SynFace: Verbal and Non-verbal Face Animation from Audio (2009). In: Proceedings of The International Conference on Auditory-Visual Speech Processing AVSP'09 / [ed] Barry-John Theobald, Richard Harvey, Norwich, England, 2009. Conference paper (Refereed)
    Abstract [en]

    We give an overview of SynFace, a speech-driven face animation system originally developed for the needs of hard-of-hearing users of the telephone. For the 2009 LIPS challenge, SynFace includes not only articulatory motion but also non-verbal motion of gaze, eyebrows and head, triggered by detection of acoustic correlates of prominence and cues for interaction control. In perceptual evaluations, both verbal and non-verbal movements have been found to have positive impact on word recognition scores.

  • 121.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Stefanov, Kalin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Web-enabled 3D Talking Avatars Based on WebGL and HTML5 (2013). Conference paper (Refereed)
  • 122. Biadsy, F.
    et al.
    Rosenberg, A.
    Carlson, Rolf
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hirschberg, J.
    Strangert, E.
    A Cross-Cultural Comparison of American, Palestinian, and Swedish Perception of Charismatic Speech (2008). In: Speech Prosody 2008, Campinas, Brazil, 2008. Conference paper (Refereed)
    Abstract [en]

    Perception of charisma, the ability to influence others by virtue of one’s personal qualities, appears to be influenced to some extent by cultural factors. We compare results of five studies of charisma speech in which American, Palestinian, and Swedish subjects rated Standard American English political speech and Americans and Palestinians rated Palestinian Arabic speech. We identify acoustic-prosodic and lexical features correlated with charisma ratings of both languages for native and non-native speakers and find that 1) some acoustic-prosodic features correlated with charisma ratings appear similar across all five experiments; 2) other acoustic-prosodic and lexical features correlated with charisma appear specific to the language rated, whatever the native language of the rater; and 3) still other acoustic-prosodic cues appear specific to both rater native language and to language rated. We also find that, while the absolute ratings non-native raters assign tend to be lower than those of native speakers, the ratings themselves are strongly correlated.

  • 123. Biadsy, F.
    et al.
    Rosenberg, A.
    Carlson, Rolf
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Hirschberg, J.
    Strangert, E.
    A cross-cultural comparison of American, Palestinian, and Swedish perception of charismatic speech (2008). In: Proceedings of the 4th International Conference on Speech Prosody, SP 2008, International Speech Communications Association, 2008, pp. 579-582. Conference paper (Refereed)
    Abstract [en]

    Perception of charisma, the ability to influence others by virtue of one's personal qualities, appears to be influenced to some extent by cultural factors. We compare results of five studies of charisma speech in which American, Palestinian, and Swedish subjects rated Standard American English political speech and Americans and Palestinians rated Palestinian Arabic speech. We identify acoustic-prosodic and lexical features correlated with charisma ratings of both languages for native and non-native speakers and find that 1) some acoustic-prosodic features correlated with charisma ratings appear similar across all five experiments; 2) other acoustic-prosodic and lexical features correlated with charisma appear specific to the language rated, whatever the native language of the rater; and 3) still other acoustic-prosodic cues appear specific to both rater native language and to language rated. We also find that, while the absolute ratings non-native raters assign tend to be lower than those of native speakers, the ratings themselves are strongly correlated.

  • 124. Bimbot, F
    et al.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Boves, L
    Chollet, G
    Jaboulet, C
    Jacob, B
    Kharroubi, J
    Koolwaaij, J
    Lindberg, J
    Mariethoz, J
    Mokbel, C
    Mokbel, H
    An overview of the PICASSO project research activities in speaker verification for telephone applications (1999). Conference paper (Refereed)
    Abstract [en]

    This paper presents a general overview of the current research activities in the European PICASSO project on speaker verification for telephone applications. First, the general formalism used by the project is described. Then the scientific issues under focus are discussed in detail. Finally, the paper briefly describes the Picassoft research platform. Throughout the article, entry points to more specific work, also published in the Eurospeech’99 proceedings, are given.

  • 125. Bimbot, F
    et al.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Boves, L
    Genoud, D
    Hutter, H-P
    Jaboulet, C
    Koolwaaij, J
    Lindberg, J
    Pierrot, J-B
    An overview of the CAVE project research activities in speaker verification (2000). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 31, no. 2-3, pp. 155-180. Journal article (Refereed)
    Abstract [en]

    This article presents an overview of the research activities carried out in the European CAVE project, which focused on text-dependent speaker verification on the telephone network using whole word Hidden Markov Models. It documents in detail various aspects of the technology and the methodology used within the project. In particular, it addresses the issue of model estimation in the context of limited enrollment data and the problem of a posteriori decision threshold setting. Experiments are carried out on the realistic telephone speech database SESP. State-of-the-art performance levels are obtained, which validates the technical approaches developed and assessed during the project as well as the working infrastructure which facilitated cooperation between the partners.
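
    One issue the abstract highlights is a posteriori decision threshold setting. A common way to set a threshold after scoring, picking the point where false rejections and false acceptances balance (the equal error rate), is sketched below; the score distributions are invented, and this is a generic illustration rather than the CAVE project's actual procedure.

        import numpy as np

        rng = np.random.default_rng(2)
        # Hypothetical verification scores: higher = more like the claimed speaker.
        target = rng.normal(loc=2.0, size=500)    # genuine-speaker trials
        impostor = rng.normal(loc=0.0, size=500)  # impostor trials

        def eer_threshold(target, impostor):
            # Sweep candidate thresholds; at the EER the false rejection rate
            # (targets below t) equals the false acceptance rate (impostors >= t).
            candidates = np.sort(np.concatenate([target, impostor]))
            frr = np.array([(target < t).mean() for t in candidates])
            far = np.array([(impostor >= t).mean() for t in candidates])
            return candidates[np.argmin(np.abs(frr - far))]

        print(f"a posteriori threshold: {eer_threshold(target, impostor):.3f}")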

  • 126. Bisesi, Erica
    et al.
    Parncutt, Richard
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    An accent-based approach to performance rendering: Music theory meets music psychology (2011). In: Proceedings of ISPS 2011 / [ed] Aaron Williamon, Darryl Edwards, and Lee Bartel, Utrecht: the European Association of Conservatoires (AEC), 2011, pp. 27-32. Conference paper (Refereed)
    Abstract [en]

    Accents are local events that attract a listener’s attention and are either evident from the score (immanent) or added by the performer (performed). Immanent accents are associated with grouping, meter, melody, and harmony. In piano music, performed accents involve changes in timing, dynamics, articulation, and pedaling; they vary in amplitude, form, and duration. Performers tend to “bring out” immanent accents by means of performed accents, which attracts the listener’s attention to them. We are mathematically modeling timing and dynamics near immanent accents in a selection of Chopin Preludes using an extended version of Director Musices (DM), a software package for automatic rendering of expressive performance. We are developing DM in a new direction, which allows us to relate expressive features of a performance not only to global or intermediate structural properties but also to local events.

  • 127. Bisitz, T.
    et al.
    Herzke, T.
    Zokoll, M.
    Öster, Anne-Marie
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Ormel, E.
    Van Son, N.
    Tanke, R.
    Noise Reduction for Media Streams (2009). In: NAG/DAGA'09 International Conference on Acoustics: including the 35th German Annual Conference on Acoustics (DAGA) / [ed] Marinus M. Boone, Red Hook, USA: Curran Associates, Inc., 2009. Conference paper (Refereed)
  • 128. Bissiri, M.P.
    et al.
    Zellers, Margaret
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Perception of glottalization in varying pitch contexts across languages (2013). In: INTERSPEECH-2013, 2013, pp. 253-257. Conference paper (Refereed)
    Abstract [en]

    Glottalization is often associated with low pitch in intonation languages, but evidence from many languages indicates that this is not an obligatory association. We asked speakers of German, English and Swedish to compare glottalized stimuli with several pitch contour alternatives in an AXB listening test. Although the low F0 in the glottalized stimuli tended to be perceived as most similar to falling pitch contours, this was not always the case, indicating that pitch perception in glottalization cannot be predicted by F0 alone. We also found evidence for cross-linguistic differences in the degree of flexibility of pitch judgments in glottalized stretches of speech.
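
    For readers unfamiliar with the paradigm: an AXB trial plays a reference stimulus X between alternatives A and B and asks which alternative X is more similar to. A minimal response-scoring sketch with invented trial data follows; it illustrates the task logic only, not the study's materials.

        # Each trial pairs a glottalized stimulus X with two pitch-contour
        # alternatives; "expected" is the falling-pitch prediction (invented data).
        trials = [
            {"a": "falling", "b": "level", "expected": "falling", "response": "falling"},
            {"a": "rising", "b": "falling", "expected": "falling", "response": "rising"},
            {"a": "level", "b": "falling", "expected": "falling", "response": "falling"},
        ]

        matches = sum(t["response"] == t["expected"] for t in trials)
        print(f"{matches}/{len(trials)} responses matched the falling-contour prediction")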

  • 129. Bissiri, M.P.
    et al.
    Zellers, Margaret
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Ding, H.
    Perception of glottalization in varying pitch contexts in Mandarin Chinese (2014). In: Proceedings of Speech Prosody 7 / [ed] Campbell, N.; Gibbon, D.; Hirst, D., 2014, pp. 633-637. Conference paper (Refereed)
    Abstract [en]

    Although glottalization has often been associated with low pitch, evidence from a number of sources supports the assertion that this association is not obligatory, and is likely to be language-specific. Following a previous study testing perception of glottalization by German, English, and Swedish listeners, the current research investigates the influence of pitch context on the perception of glottalization by native speakers of a tone language, Mandarin Chinese. Listeners heard AXB sets in which they were asked to match glottalized stimuli with pitch contours. We find that Mandarin listeners tend not to be influenced by the pitch context when judging the pitch of glottalized stretches of speech. These data lend support to the idea that the perception of glottalization varies in relation to language-specific prosodic structure.

  • 130.
    Bjurling, Johan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Bresin, Roberto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Timing in piano music: Testing a model of melody lead (2008). In: Proc. of the 10th International Conference on Music Perception and Cognition, Sapporo, Japan, 2008. Conference paper (Refereed)
  • 131. Björklund, Staffan
    et al.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH. University College of Music Education, Sweden.
    Relationship Between Subglottal Pressure and Sound Pressure Level in Untrained Voices (2016). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 30, no. 1, pp. 15-20. Journal article (Refereed)
    Abstract [en]

    Objectives: Subglottal pressure (Ps) is strongly correlated with sound pressure level (SPL) and is easy to measure by means of commonly available equipment. The SPL/Ps ratio is strongly dependent on the efficiency of the phonatory apparatus and should be of great relevance to clinical practice. However, published normative data are still missing. Method: The subjects produced sequences of the syllable [pæ], and Ps was measured as the oral pressure during the [p] occlusion. The Ps to SPL relationship was determined at four pitches produced by 16 female and 15 male healthy voices and analyzed by means of regression analysis. Average correlation between Ps and SPL, average SPL produced with a Ps of 10 cm H2O, and average SPL increase produced by a doubling of Ps were calculated for the female and for the male subjects. The significance of sex and pitch conditions was analyzed by means of analysis of variance (ANOVA). Results: Pitch was found to be an insignificant condition. The average correlation between Ps and SPL was 0.83 and did not differ significantly between the female and male subjects. In female and male subjects, Ps = 10 cm H2O produced 78.1 dB and 80.0 dB SPL at 0.3 m, and a doubling of Ps generated 11.1 dB and 9.3 dB increases of SPL. Both these gender differences were statistically significant. Conclusions: The relationship between Ps and SPL can be reliably established from series of repetitions of the syllable [pæ] produced with a continuously changing degree of vocal loudness. Male subjects produce slightly higher SPL for a given pressure than female subjects but gain less for a doubling of Ps. As these relationships appear to be affected by phonation type, it seems possible that in the future, the method can be used for documenting degree of phonatory hypofunction and hyperfunction.
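
    The reported figures imply a simple log-linear relationship between Ps and SPL. A reconstruction from the female-group numbers (78.1 dB at Ps = 10 cm H2O, +11.1 dB per doubling of Ps) can be written as

        \mathrm{SPL}(P_s) \approx 78.1\,\mathrm{dB} + 11.1\,\mathrm{dB}\cdot\log_2\!\left(\frac{P_s}{10\ \mathrm{cm\,H_2O}}\right) \quad \text{(at 0.3 m)}

    so Ps = 20 cm H2O would predict roughly 89 dB SPL; the male figures (80.0 dB, +9.3 dB per doubling) give the analogous expression.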

  • 132.
    Björkman, Beyza
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    English as the lingua franca of engineering: the morphosyntax of academic speech events (2008). In: Nordic Journal of English Studies, ISSN 1654-6970, E-ISSN 1654-6970, Vol. 7, no. 3, pp. 103-122. Journal article (Refereed)
    Abstract [en]

    English today is frequently used as an international means of communication among its non-native speakers from different L1 backgrounds. Research on English as a lingua franca (ELF) has already revealed commonalities and common processes from a variety of settings. It is important that research continues and that lingua franca usage in different environments is described to find ways to optimize communication. This paper will focus on the morphosyntax of spoken ELF, reporting the results of a study that investigates spoken lingua franca English in tertiary education (engineering) in Sweden, where English is increasingly becoming the language of instruction. The morphosyntax of non-native-like usage is investigated in dialogic and monologic speech events. Cases of non-native-like usage are grouped as ‘disturbing’, i.e. causing comprehension problems and ‘non-disturbing’, i.e. causing no comprehension problems. Findings from this corpus-based study show that the most consistent idiosyncrasies in lingua franca usage in this setting are observed in redundant features of the language and that there is very little disturbance, i.e. breakdown in communication. Engineers seem to opt for function and reciprocal intelligibility over redundant features of the language and accuracy when they speak English in academic contexts.

  • 133.
    Björkman, Beyza
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    "So you think you can ELF?": English as a lingua franca as the medium of instruction2010Ingår i: Hermes - Journal of Language and Communication Studies, ISSN 0904-1699, E-ISSN 1903-1785, Vol. 45, s. 77-99Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    This paper reports the findings of a study on spoken English as a lingua franca (ELF) in Swedish higher education. The aim has been to investigate the role pragmatic strategies play in content lectures where English is a lingua franca, i.e. a vehicular language. The findings show that lecturers in ELF settings make less frequent use of pragmatic strategies than students, who deploy these strategies frequently in group-work projects. Earlier stages of the present study showed that despite frequent non-standardness at the morphosyntax level, there is very little overt disturbance in student group-work (Björkman 2008 a and b/2009b), most likely owing to a variety of communicative strategies used during interaction and the questions raised (Björkman, 2009a). It seems reasonable to assume that, in the absence of appropriate strategies and questions that serve as real-time signals of disturbance, there is an increased risk for covert disturbance in lectures. This view complies with the findings of earlier studies on the importance of such strategies (Mauranen 2006, Airey 2009:79, Hellekjær 2010). The findings imply that the effectiveness of a speaker of English in academic ELF settings is determined primarily by the speaker’s pragmatic ability and less by his/her proficiency. There are important implications of these findings for lecturers who need to operate in ELF settings. First, increasing interactivity by using pragmatic strategies sufficiently frequently appears critical for those involved in English-medium education. It is also important that awareness is raised on target language usage in lecturing in English. Such awareness-raising can be achieved at the macro level by clearly-written language policies that include training for teachers and students who both need to be equipped with the skills needed to cope with the complexities of such settings, and at the micro level, by in-house training and courses that could be administered to both teachers and students.

  • 134.
    Björkner, Eva
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Musical theater and opera singing - Why so different?: A study of subglottal pressure, voice source, and formant frequency characteristics (2008). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 22, no. 5, pp. 533-540. Journal article (Refereed)
    Abstract [en]

    The considerable voice timbre differences between musical theater (MT) and western operatic singers are analyzed with respect to voice source and formant frequencies in five representatives of each singer group. Audio, subglottal pressure (Ps), and electroglottograph (EGG) signals were recorded while the subjects sang a sequence of [pae:] syllables starting at maximal vocal loudness and then gradually decreasing vocal loudness. The task was performed at each of two fundamental frequencies (F0), approximately one octave apart. Ten equally spaced Ps values were then selected for each F0. The subsequent vowels were analyzed in terms of flow glottograms derived by inverse filtering the audio signal, which also yielded formant frequency data. Period time (T0), peak-to-peak pulse amplitude (Up-t-p), and maximum flow declination rate (MFDR) were measured from the flow glottograms, while the closed quotient Qclosed (Tcl/T0) was determined in combination with the differentiated EGG signal. Also the relationship between the first and the second harmonic in the spectrum (H1-H2), the amplitude quotient (AQ), that is, the ratio between Up-t-p and MFDR, and the normalized AQ (NAQ), that is, AQ normalized with respect to period time, were calculated, as well as the sound pressure level. The results showed that both the MT and the opera singers varied their Ps systematically, approximately doubling Ps for a doubling of F0. For a given value of Ps, the MT singers produced higher values of MFDR, Up-t-p, and Qclosed, and lower values of H1-H2, indicating a weaker fundamental. Further, the MT singers showed higher formant frequencies and did not show the opera singers' characteristic clustering of F3, F4, and F5.

  • 135.
    Björkner, Eva
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Why so different? - Aspects of voice characteristics in operatic and musical theatre singing (2006). Doctoral thesis, compilation (Other academic)
    Abstract [en]

    This thesis addresses aspects of voice characteristics in operatic and musical theatre singing. The common aim of the studies was to identify respiratory, phonatory and resonatory characteristics accounting for salient voice timbre differences between singing styles.

    The velopharyngeal opening (VPO) was analyzed in professional operatic singers, using nasofiberscopy. Differing shapes of VPOs suggested that singers may use a VPO to fine-tune the vocal tract resonance characteristics and hence voice timbre. A listening test revealed no correlation between rated nasal quality and the presence of a VPO.

    The voice quality referred to as “throaty”, a term sometimes used for characterizing speech and “non-classical” vocalists, was examined with respect to subglottal pressure (Psub) and formant frequencies. Vocal tract shapes were determined by magnetic resonance imaging. The throaty versions of four vowels showed a typical narrowing of the pharynx. Throatiness was characterized by increased first formant frequency and lowering of higher formants. Also, voice source parameter analyses suggested a hyper-functional voice production.

    Female musical theatre singers typically use two vocal registers (chest and head). Voice source parameters, including closed-quotient, peak-to-peak pulse amplitude, maximum flow declination rate, and normalized amplitude quotient (NAQ), were analyzed at ten equally spaced subglottal pressures representing a wide range of vocal loudness. Chest register showed higher values in all glottal parameters except for NAQ. Operatic baritone singer voices were analyzed in order to explore the informative power of the amplitude quotient (AQ), and its normalized version NAQ, suggested to reflect glottal adduction. Differences in NAQ were found between fundamental frequency values while AQ was basically unaffected.

    Voice timbre differs between musical theatre and operatic singers. Measurements of voice source parameters as functions of subglottal pressure, covering a wide range of vocal loudness, showed that both groups varied Psub systematically. The musical theatre singers used somewhat higher pressures, produced higher sound pressure levels, and did not show the opera singers’ characteristic clustering of higher formants.

    Musical theatre and operatic singers show highly controlled and consistent behaviors, characteristic for each style. A common feature is the precise control of subglottal pressure, while laryngeal and vocal tract conditions differ between singing styles. In addition, opera singers tend to sing with a stronger voice source fundamental than musical theatre singers.

  • 136.
    Björkner, Eva
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Alku, P.
    Subglottal pressure and NAQ variation in voice production of classically trained baritone singers (2005). In: 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 2005, pp. 1057-1060. Conference paper (Refereed)
    Abstract [en]

    The subglottal pressure (Ps) and voice source characteristics of five professional baritone singers were analyzed. Glottal adduction was estimated with amplitude quotient (AQ), defined as the ratio between peak-to-peak pulse amplitude and the negative peak of the differentiated flow glottogram, and with normalized amplitude quotient (NAQ), defined as AQ divided by fundamental period length. Previous studies show that NAQ and its variation with Ps represent an effective parameter in the analysis of voice source characteristics. Therefore, the present study aims at increasing our knowledge of these two parameters further by finding out how they vary with pitch and Ps in operatic baritone singers, singing at high and low pitch. Ten equally spaced Ps values were selected from three takes of the syllable [pae], repeated with a continuously decreasing vocal loudness and initiated at maximum vocal loudness. The vowel sounds following the selected Ps peaks were inverse filtered. Data on peak-to-peak pulse amplitude, maximum flow declination rate, AQ and NAQ will be presented.
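
    The two glottal parameters defined above can be written compactly; the following restates the abstract's definitions (Up-t-p: peak-to-peak pulse amplitude, MFDR: negative peak of the differentiated flow glottogram, T0: fundamental period):

        \mathrm{AQ} = \frac{U_{p\text{-}t\text{-}p}}{\mathrm{MFDR}}, \qquad \mathrm{NAQ} = \frac{\mathrm{AQ}}{T_0} = \frac{U_{p\text{-}t\text{-}p}}{T_0 \cdot \mathrm{MFDR}}

    Note that AQ has the dimension of time while NAQ is dimensionless, which is what makes NAQ comparable across different fundamental frequencies.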

  • 137.
    Björkner, Eva
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Sundberg, Johan
    Alku, Paavo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Subglottal Pressure and Normalized Amplitude Quotient Variation in Classically Trained Baritone Singers (2006). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 31, no. 4, pp. 157-165. Journal article (Refereed)
    Abstract [en]

    The subglottal pressure (Ps) and voice source characteristics of five professional baritone singers have been analyzed and the normalized amplitude quotient (NAQ), defined as the ratio between peak-to-peak pulse amplitude and the negative peak of the differentiated flow glottogram and normalized with respect to the period time, was used as an estimate of glottal adduction. The relationship between Ps and NAQ has been investigated in female subjects in two earlier studies. One of these revealed NAQ differences between both singing styles and phonation modes, and the other, based on register differences in female musical theatre singers, showed that NAQ differed between registers for the same Ps value. These studies thus suggest that NAQ and its variation with Ps represent a useful parameter in the analysis of voice source characteristics. The present study aims at increasing our knowledge of the NAQ parameter further by finding out how it varies with pitch and Ps in professional classically trained baritone singers, singing at high and low pitch (278 Hz and 139 Hz, respectively). Ten equally spaced Ps values were selected from three takes of the syllable [pae:], initiated at maximum vocal loudness and repeated with a continuously decreasing vocal loudness. The vowel sounds following the selected Ps peaks were inverse filtered. Data on peak-to-peak pulse amplitude, maximum flow declination rate and NAQ are presented.

  • 138.
    Björkner, Eva
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Cleveland, T
    Stone, E
    Voice source differences between registers in female musical theater singers2006Ingår i: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 20, nr 2, s. 187-197Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Musical theater singing typically requires women to use two vocal registers. Our investigation considered voice source and subglottal pressure (Ps) characteristics of the speech pressure signal recorded for a sequence of /pae/ syllables sung at constant pitch and decreasing vocal loudness in each register by seven female musical theater singers. Ten equally spaced Ps values were selected, and the relationships between Ps and several parameters were examined: closed quotient (Qclosed), peak-to-peak pulse amplitude (Up-t-p), amplitude of the negative peak of the differentiated flow glottogram, i.e., the maximum flow declination rate (MFDR), and the normalized amplitude quotient (NAQ) [Up-t-p/(T0*MFDR)], where T0 is the fundamental period. Ps was typically slightly higher in chest than in head register. As Ps influences the measured glottogram parameters, these were also compared at an approximately identical Ps of 11 cm H2O. Results showed that for typical tokens, MFDR and Qclosed were significantly greater, whereas Up-t-p and therefore NAQ were significantly lower in chest than in head register.
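
    As a rough illustration of how the parameters named above could be extracted, the sketch below computes Up-t-p, MFDR and NAQ from one period of an inverse-filtered glottal flow signal. It is a minimal Python sketch, assuming a uniformly sampled single-period flow segment; the function name and interface are illustrative and not taken from the paper.

        import numpy as np

        def glottogram_parameters(flow, fs):
            """Estimate Up-t-p, MFDR and NAQ from one period of an
            inverse-filtered glottal flow signal (illustrative sketch)."""
            # Peak-to-peak pulse amplitude of the flow glottogram.
            u_ptp = flow.max() - flow.min()
            # Differentiated flow glottogram; MFDR is the magnitude
            # of its negative peak.
            d_flow = np.diff(flow) * fs
            mfdr = -d_flow.min()
            # Fundamental period T0: here, the duration of the supplied
            # single-period segment.
            t0 = len(flow) / fs
            # NAQ = Up-t-p / (T0 * MFDR), as defined in the abstract.
            naq = u_ptp / (t0 * mfdr)
            return u_ptp, mfdr, naq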

  • 139.
    Björkner, Eva
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Sundberg, Johan
    Cleveland, Tom
    Stone, Ed
    Voice Source Differences between Registers in Female Musical Theatre Singers2006Ingår i: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 20, s. 187-197Artikel i tidskrift (Refereegranskat)
  • 140.
    Björkner, Eva
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Cleveland, Tom
    Vanderbilt Voice Center, Dept. of Otolaryngology, Vanderbilt University Medical Center, Nashville.
    Stone, R E
    Vanderbilt Voice Center, Dept. of Otolaryngology, Vanderbilt University Medical Center, Nashville.
    Voice source characteristics in different registers in classically trained female musical theatre singers2004Ingår i: Proceedings of ICA 2004 : the 18th International Congress on Acoustics, Kyoto International Conference Hall, 4-9 April, Kyoto, Japan: acoustical science and technology for quality of life, Kyoto, Japan, 2004, s. 297-300Konferensbidrag (Refereegranskat)
    Abstract [en]

    Musical theatre singing requires the use of two vocal registers in the female voice. The voice source and subglottal pressure (Ps) characteristics of these registers are analysed by inverse filtering. The relationships between Ps and closed quotient (Qclosed), peak-to-peak pulse amplitude (Up-t-p), maximum flow declination rate (MFDR) and the normalised amplitude quotient (NAQ) were examined. Ps was typically slightly higher in chest than in head register. For typical tokens, MFDR and Qclosed were significantly greater, while NAQ and Up-t-p were significantly lower in chest than in head register.

  • 141.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    A common phone model representation for speech recognition and synthesis1994Konferensbidrag (Refereegranskat)
    Abstract [en]

    A combined representation of context-dependent phones at the production parametric and the spectral level is described. The phones are trained in the production domain using analysis-by-synthesis and piece-wise linear approximation of parameter trajectories. For recognition, this representation is transformed to spectral subphones, using a cascade formant synthesis procedure. In a connected-digit recognition task, a 99.1% average correct digit rate was achieved in a group of seven male speakers when, for each test speaker, training was done on the other six speakers. Simple rules for male-to-female transformation of the male phone library increased the performance for six female speakers from 88.9% without transformation to 96.3%. In informal listening tests of resynthesised digit strings, the speech has been judged as intelligible, though far from natural.

  • 142.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Collection and recognition of children’s speech in the PF-Star project2003Konferensbidrag (Refereegranskat)
    Abstract [en]

    This paper reports on the recording and planned research activities on recognition of children’s speech in the EU project PF-Star. The task is considerably more difficult than recognition of adult speech for several reasons. High fundamental frequency and formant frequencies change the spectral shape of the speech signal, and both pronunciation and language use differ from those of adult speech. One objective in PF-Star is to collect speech data for the project partners’ languages and to detect and analyse major difficulties. Possible ways of reducing these problems will be explored.

  • 143.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Creating unseen triphones by phone concatenation in the spectral, cepstral and formant domains1997Konferensbidrag (Refereegranskat)
    Abstract [en]

    A technique for predicting triphones by concatenation of diphone or monophone models is studied. The models are connected using linear interpolation between endpoints of piece-wise linear parameter trajectories. Three types of spectral representation are compared: formants, filter amplitudes and cepstrum coefficients. The proposed technique lowers the spectral distortion of the phones for all three representations when different speakers are used for training and evaluation. The average error of the created triphones is lower in the filter and cepstrum domains than for formants, which is explained by limitations in the analysis-by-synthesis formant tracking algorithm. A small improvement with the proposed technique is achieved for all representations in the task of reordering N-best sentence recognition candidate lists.
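
    The endpoint interpolation step can be pictured with a short sketch. This is a minimal Python illustration, assuming each phone model is stored as a matrix of parameter frames (e.g. formant frequencies or filter amplitudes at a fixed frame rate); the representation and function name are assumptions, not the paper's implementation.

        import numpy as np

        def concatenate_phone_models(left, right, n_blend):
            """Join two phone models by linearly interpolating from the
            last frame of the left model to the first frame of the right
            model (illustrative sketch of endpoint interpolation)."""
            # Interpolation weights running from 0 (all left) to 1 (all right).
            w = np.linspace(0.0, 1.0, n_blend)[:, None]
            bridge = (1.0 - w) * left[-1] + w * right[0]
            return np.vstack([left, bridge, right])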

  • 144.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Creating unseen triphones by phone concatenation of diphones and monophones in the spectral, cepstral and formant domains1997Konferensbidrag (Refereegranskat)
  • 145.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Creation of unseen triphones from seen triphones, diphones and phones1996Ingår i: TMH-QPSR, Vol. 37, nr 2, s. 113-116Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    With limited training data, infrequent triphone models for speech recognition will not be observed in sufficient numbers. In this report, a speech production approach is used to predict the characteristics of unseen triphones by using a transformation technique in the parametric representation of a formant speech synthesiser. Two techniques are currently being tested. In one approach, unseen triphones are created by concatenating monophones and diphones and interpolating the parameter trajectories across the connection points. The second technique combines information from two similar triphones: one with correct context and one with correct midphone identity. Preliminary experiments are performed in the task of rescoring recognition candidates in an N-best list.

  • 146.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Model space size scaling for speaker adaptation2011Ingår i: Proceedings of Fonetik 2011, Stockholm: KTH Royal Institute of Technology, 2011, Vol. 51, nr 1, s. 77-80Konferensbidrag (Övrigt vetenskapligt)
    Abstract [en]

    In the current work, instantaneous adaptation in speech recognition is performed by estimating speaker properties, which modify the original trained acoustic models. We introduce a new property, the size of the model space, which is added to the previously used features, VTLN and spectral slope. These are jointly estimated for each test utterance. The new feature has been shown to be effective for recognition of children’s speech using adult-trained models in TIDIGITS. Adding the feature lowered the error rate by around 10% relative. The overall combination of VTLN, spectral slope and model space scaling represents a substantial 31% relative reduction compared with VTLN alone. There was no improvement among adult speakers in TIDIGITS and in TIMIT. Improvement for this speaker category is expected when the training and test sets are recorded in different conditions, such as read and spontaneous speech.
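
    The abstract does not spell out the estimation procedure, but the joint per-utterance estimation of the three speaker properties can be sketched as a search over candidate values. The grid values and the score() callback below are purely illustrative assumptions, not the paper's method.

        import itertools

        # Candidate values for the three speaker properties named in the
        # abstract; the grids themselves are illustrative, not from the paper.
        WARPS  = [0.88, 0.94, 1.00, 1.06, 1.12]  # VTLN warp factors
        SLOPES = [-6.0, -3.0, 0.0, 3.0, 6.0]     # spectral slope (dB/octave)
        SCALES = [0.8, 0.9, 1.0, 1.1, 1.2]       # model space size scaling

        def estimate_speaker_properties(utterance, score):
            """Jointly pick (warp, slope, scale) for one test utterance by
            maximizing a recognition score; `score` is a hypothetical
            callback that decodes the utterance with the transformed
            models and returns a likelihood."""
            return max(itertools.product(WARPS, SLOPES, SCALES),
                       key=lambda props: score(utterance, *props))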

  • 147.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
     Phoneme recognition for the hearing impaired2002Konferensbidrag (Refereegranskat)
    Abstract [en]

    This paper describes an automatic speech recognition system designed to investigate the use of phoneme recognition as a hearing aid in telephone communication. The system was tested in two experiments. The first involved 19 normal-hearing subjects with a simulated severe hearing impairment; the second involved 5 hearing-impaired subjects. In both studies we used a procedure called Speech Tracking to measure the effective communication speed between two persons. A substantial improvement was found in both cases.

  • 148.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Speech recognition using long-distance relations in an utterance1998Konferensbidrag (Refereegranskat)
  • 149.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Training production parameters of context-dependent phones for speech recognition1994Ingår i: STL-QPSR, Vol. 35, nr 1, s. 59-90Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    A representation form of acoustic information in a trained phone library at the production parametric as well as the spectral level is described. The phones are trained in the parametric domain and are transformed to the spectral domain by means of a synthesis procedure. By this twofold description, potentially more powerful procedures for speaker adaptation and generation of unseen triphones can be explored, while the more robust spectral representation can be used for recognition. Context-dependent phones are represented by control parameters to a cascade formant synthesiser. During training, the parameters are extracted using an analysis-by-synthesis technique and the trajectories are approximated by piece-wise linear segments. For recognition, the parameter tracks are transformed to a sequence of spectral subphone states, similar to a Hidden Markov model. Recognition is performed by Viterbi search in a finite-state network. Recognition experiments have been performed on Swedish connected-digit strings pronounced by seven male speakers. In one experiment, unseen triphones were created by concatenating monophones and diphones and interpolating the parameter trajectories between line endpoints. In another, speaker adaptation was based on generalisation of differences of observed triphones from the phone library. With optimum weighting of duration information, the results for cross-speaker recognition, speaker adaptation, and multi-speaker training were 98.5%, 98.9% and 99.1% correct digit recognition, respectively. Preliminary experiments with created unseen triphones show no improvement. In informal listening tests of resynthesised digit strings from concatenation of trained triphones, the speech has been judged as intelligible, though far from natural.
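
    The step from piece-wise linear parameter trajectories to spectral subphone states can be illustrated with a short sketch. The breakpoint representation and state count below are assumptions for illustration; in the system described above, each sampled parameter vector would then drive a cascade formant synthesis step to produce the spectral state used in the Viterbi search.

        import numpy as np

        def trajectory_to_state_parameters(breakpoints, n_states):
            """Sample a piece-wise linear synthesis-parameter trajectory
            at n_states equally spaced time points, one parameter vector
            per subphone state (illustrative sketch).

            breakpoints: list of (time, parameter_vector) pairs defining
            one phone model's trajectory."""
            times = np.array([t for t, _ in breakpoints])
            values = np.array([v for _, v in breakpoints])
            state_times = np.linspace(times[0], times[-1], n_states)
            # Interpolate each synthesis parameter independently along time.
            return np.column_stack([
                np.interp(state_times, times, values[:, k])
                for k in range(values.shape[1])
            ])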

  • 150.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Training speech synthesis parameters of allophones for speech recognition1994Konferensbidrag (Refereegranskat)
    Abstract [en]

    A technique for training a speech recognition system at a production parametric level is described. The approach offers potential advantages in the form of small training corpora and fast speaker adaptation. Triphones that have not occurred in the training data can be generated by concatenation and parametric interpolation of diphones or context-free phones. The triphones are represented by a piece-wise linear approximation of the production parameters. For recognition, these are converted to subphone spectral state sequences. A 97.6% connected-digit recognition rate was achieved when training the system on one male speaker and performing recognition on six other male speakers. In preliminary experiments with generation of unseen triphones, the performance is still slightly lower compared to using seen diphones and context-free phones. Experiments with fast speaker adaptation are also in progress. Resynthesis of speech by concatenating triphones has been used to verify the quality of the triphone library.
