Results 151 - 200 of 692
  • 151.
    Dalianis, Hercules
    et al.
    Dept of Computer and System Sciences, Stockholm Univ, Sweden.
    Rimka, Martin
    Dept of Computer and System Sciences, Stockholm Univ, Sweden.
    Kann, Viggo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian (2009). Conference paper (Refereed)
    Abstract [en]

    This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by the visitors of the website of the Nordic council, which contains five different languages. In order to compare how well different types of bilingual dictionaries covered the most common queries and terms on the website, we tried a collection of ordinary bilingual dictionaries, a small manually constructed trilingual dictionary and an automatically constructed trilingual dictionary, built from the news corpus on the website using Uplug. The precision and recall of the automatically constructed Swedish-English dictionary using Uplug were 71 and 93 percent, respectively. We found that precision and recall increase significantly in samples with high word frequency, but we could not confirm that POS-tags improve precision. The collection of ordinary dictionaries, consisting of about 200 000 words, covered only 41 of the top 100 search queries at the website. The automatically built trilingual dictionary combined with the small manually built trilingual dictionary, consisting of about 2 300 words, covered 36 of the top search queries.
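    The precision and recall figures quoted in this abstract can be reproduced for any extracted dictionary by comparing its entries against a gold-standard word list. A minimal sketch, with illustrative toy data (not the paper's corpus):

    ```python
    def precision_recall(extracted, gold):
        """Evaluate an extracted translation dictionary against a gold standard.

        Both arguments are sets of (source_word, target_word) pairs.
        Precision: fraction of extracted pairs that are correct.
        Recall: fraction of gold pairs that were extracted.
        """
        correct = extracted & gold
        return len(correct) / len(extracted), len(correct) / len(gold)

    # Hypothetical Swedish-English pairs for illustration only:
    gold = {("hus", "house"), ("bil", "car"), ("hund", "dog"), ("katt", "cat")}
    extracted = {("hus", "house"), ("bil", "car"), ("hund", "dog"), ("träd", "car")}
    p, r = precision_recall(extracted, gold)
    # p = 0.75 (3 of 4 extracted pairs correct), r = 0.75 (3 of 4 gold pairs found)
    ```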

  • 152.
    Dalianis, Hercules
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT).
    Xing, Haochun
    KTH, Skolan för informations- och kommunikationsteknik (ICT).
    Xin, Z.
Creating a reusable English-Chinese parallel corpus for bilingual dictionary construction (2010). In: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010, European Language Resources Association (ELRA), 2010, pp. 1700-1705. Conference paper (Refereed)
    Abstract [en]

    This paper first describes an experiment to construct an English-Chinese parallel corpus, then the application of the Uplug word alignment tool to the corpus, and finally the production and evaluation of an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually translated from Chinese to English. The parallel corpus contains 104 563 Chinese characters, equivalent to 59 918 Chinese words, and the corresponding English corpus contains 75 766 English words. However, Chinese writing does not use any delimiters to mark word boundaries, so we had to carry out word segmentation as a preprocessing step on the Chinese corpus. Moreover, since the parallel corpus was downloaded from the Internet, it is noisy with regard to the alignment between corresponding translated sentences. We therefore spent 60 hours of manual work aligning the sentences in the English and Chinese parallel corpus before performing automatic word alignment using Uplug. The word alignment with Uplug was carried out from English to Chinese. Nine respondents evaluated the resulting English-Chinese word list with frequency equal to or above three, and we obtained an accuracy of 73.1 percent.

  • 153.
    Dravins, Christina
    et al.
    The National Agency for Special Needs Education and Schools.
    van Besouw, Rachel
    ISVR, University of Southampton.
    Hansen, Kjetil Falkenberg
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Kuske, Sandra
    Latvian Children's Hearing Centre.
Exploring and enjoying non-speech sounds through a cochlear implant: the therapy of music (2010). In: 11th International Conference on Cochlear Implants and other Implantable Technologies, Karolinska University Hospital, 2010, p. 356-. Conference paper (Refereed)
    Abstract [en]

    Cochlear implant technology was initially designed to promote reception of speech sounds; however, music enjoyment remains a challenge. Music is an influential ingredient in our well-being, playing an important role in our cognitive, physical and social development. For many cochlear implant recipients it is not feasible to communicate how sounds are perceived, and consequently the benefits of music listening may be reduced. Non-speech sounds may also be important to persons with multiple functional deficits who rely on information additional to the verbal for participating in communication. Deaf-born children with multiple functional deficits constitute an especially vulnerable group, as a lack of reaction to sound is often discouraging to caregivers. Individually adapted tools and methods for sound awareness may promote exploration and appreciation of the information mediated by the implant. Two current works involving habilitation through sound production and music will be discussed. First, the results from a pilot study aiming at finding musical toys that can be adapted to help children explore their hearing with engaging sounds and expressive interfaces will be presented. The findings indicate that children with multiple functional deficits can be more inclined to use the auditory channel for communication and play than the caregivers would anticipate. Second, the results of a recent questionnaire study, which compared the music exposure and appreciation of preschool cochlear implant recipients with their normally hearing peers, will be presented. The data from this study indicate that preschool children with cochlear implants spend roughly the same amount of time interacting with musical instruments at home and watching television programmes and DVDs which include music. However, the data indicate that these children receive less exposure to recorded music without visual stimuli and show less sophisticated responses to music. The provision and supported use of habilitation materials which encourage interaction with music might therefore be beneficial.

  • 154. Driesen, J.
    et al.
    Van Hamme, H.
    Kleijn, W. Bastiaan
    KTH, Skolan för elektro- och systemteknik (EES), Ljud- och bildbehandling (Stängd 130101).
Learning from images and speech with non-negative matrix factorization enhanced by input space scaling (2010). In: 2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings, IEEE, 2010, pp. 1-6. Conference paper (Refereed)
    Abstract [en]

    Computational learning from multimodal data is often done with matrix factorization techniques such as NMF (Non-negative Matrix Factorization), pLSA (Probabilistic Latent Semantic Analysis) or LDA (Latent Dirichlet Allocation). The different modalities of the input are to this end converted into features that are easily placed in a vectorized format. An inherent weakness of such a data representation is that only a subset of these data features actually aids the learning. In this paper, we first describe a simple NMF-based recognition framework operating on speech and image data. We then propose and demonstrate a novel algorithm that scales the inputs of this framework in order to optimize its recognition performance.
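    The basic NMF decomposition this abstract builds on can be sketched generically with Lee and Seung's multiplicative updates; this is a plain textbook NMF, not the paper's input-scaling-enhanced algorithm, and the toy matrix below is illustrative:

    ```python
    import numpy as np

    def nmf(V, r, iters=200, eps=1e-9):
        """Factor a non-negative matrix V (m x n) into W (m x r) @ H (r x n)
        using multiplicative updates, which keep all entries non-negative
        while reducing the Frobenius reconstruction error."""
        rng = np.random.default_rng(0)
        m, n = V.shape
        W = rng.random((m, r)) + eps
        H = rng.random((r, n)) + eps
        for _ in range(iters):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H

    # Toy multimodal feature matrix: rows are stacked feature dimensions
    # (e.g. speech + image features), columns are observations.
    V = np.random.default_rng(1).random((6, 8))
    W, H = nmf(V, r=3)
    rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    ```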

  • 155.
    Dubus, Gaël
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
Evaluation of four models for the sonification of elite rowing (2012). In: Journal on Multimodal User Interfaces, ISSN 1783-7677, E-ISSN 1783-8738, Vol. 5, no. 3-4, pp. 143-156. Journal article (Refereed)
    Abstract [en]

    Many aspects of sonification represent potential benefits for the practice of sports. Taking advantage of the characteristics of auditory perception, interactive sonification offers promising opportunities for enhancing the training of athletes. The efficient learning and memorizing abilities pertaining to the sense of hearing, together with the strong coupling between auditory and sensorimotor systems, make the use of sound a natural field of investigation in quest of efficiency optimization in individual sports at a high level. This study presents an application of sonification to elite rowing, introducing and evaluating four sonification models. The rapid development of mobile technology capable of efficiently handling numerical information offers new possibilities for interactive auditory display. Thus, these models have been developed under the specific constraints of a mobile platform, from data acquisition to the generation of a meaningful sound feedback. In order to evaluate the models, two listening experiments have then been carried out with elite rowers. Results show a good ability of the participants to efficiently extract basic characteristics of the sonified data, even in a non-interactive context. Qualitative assessment of the models highlights the need for a balance between function and aesthetics in interactive sonification design. Consequently, particular attention on usability is required for future displays to become widespread.

  • 156.
    Dubus, Gaël
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Bresin, Roberto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
Sonification of physical quantities throughout history: a meta-study of previous mapping strategies (2011). In: Proceedings of the 17th International Conference on Auditory Display (ICAD 2011), Budapest, Hungary: OPAKFI Egyesület, 2011. Conference paper (Refereed)
    Abstract [en]

    We introduce a meta-study of previous sonification designs taking physical quantities as input data. The aim is to build a solid foundation for future sonification works, so that auditory display researchers can benefit from former studies and avoid starting from scratch when beginning new sonification projects. This work is at an early stage, and the objective of this paper is to introduce the methodology rather than to come to definitive conclusions. After a historical introduction, we explain how to collect a large amount of articles and extract useful information about mapping strategies. Then, we present the physical quantities grouped according to conceptual dimensions, as well as the sound parameters used in sonification designs, and we summarize the current state of the study by listing the couplings extracted from the article database. A total of 54 articles have been examined for the present article. Finally, a preliminary analysis of the results is performed.

  • 157.
    Dubus, Gaël
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Bresin, Roberto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
Sonification of sculler movements, development of preliminary methods (2010). In: Proceedings of ISon 2010, 3rd Interactive Sonification Workshop / [ed] Bresin, Roberto; Hermann, Thomas; Hunt, Andy, Stockholm, Sweden: KTH Royal Institute of Technology, 2010, pp. 39-43. Conference paper (Refereed)
    Abstract [en]

    Sonification is a widening field of research with many possibilities for practical applications in various scientific domains. The rapid development of mobile technology capable of efficiently handling numerical information offers new opportunities for interactive auditory display. In this scope, the SONEA project (SONification of Elite Athletes) aims at improving the performances of Olympic-level athletes by enhancing their training techniques, taking advantage of both the strong coupling between auditory and sensorimotor systems, and the efficient learning and memorizing abilities pertaining to the sense of hearing. An application to rowing is presented in this article. Rough estimates of the position and mean velocity of the craft are given by a GPS receiver embedded in a smartphone taken onboard. An external accelerometer provides boat acceleration data with higher temporal resolution. The development of preliminary methods for sonifying the collected data has been carried out under the specific constraints of a mobile device platform. The sonification is either performed by the phone as a real-time feedback or by a computer using data files as input for an a posteriori analysis of the training. In addition, environmental sounds recorded during training can be synchronized with the sonification to perceive the coherence of the sequence of sounds throughout the rowing cycle. First results show that sonification using a parameter-mapping method over
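    Parameter-mapping sonification of the kind described here, turning a stream of sensor samples into sound parameters, can be sketched generically; the linear data-to-pitch mapping and the frequency range below are illustrative assumptions, not the mapping used in SONEA:

    ```python
    def map_to_pitch(samples, f_min=220.0, f_max=880.0):
        """Linearly map each data sample to a frequency in [f_min, f_max],
        a basic parameter-mapping sonification strategy."""
        lo, hi = min(samples), max(samples)
        span = (hi - lo) or 1.0  # avoid division by zero for constant input
        return [f_min + (s - lo) / span * (f_max - f_min) for s in samples]

    # Hypothetical boat-acceleration trace (illustrative values):
    accel = [0.1, 0.4, 0.9, 0.5, 0.2]
    freqs = map_to_pitch(accel)
    # lowest sample -> 220.0 Hz, highest sample -> 880.0 Hz
    ```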

  • 158.
    Dubus, Gaël
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Hansen, Kjetil Falkenberg
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Bresin, Roberto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
An overview of sound and music applications for Android available on the market (2012). In: Proceedings of the 9th Sound and Music Computing Conference, SMC 2012 / [ed] Serafin, Stefania, Sound and Music Computing network, 2012, pp. 541-546. Conference paper (Refereed)
    Abstract [en]

    This paper introduces a database of sound-based applications running on the Android mobile platform. The long-term objective is to provide a state of the art of mobile applications dealing with sound and music interaction. After describing the method used to build up and maintain the database, using a non-hierarchical structure based on tags, we present a classification according to various categories of applications, and we conduct a preliminary analysis of the distribution of these categories, reflecting the current state of the database.

  • 159. Echternach, Matthias
    et al.
    Birkholz, Peter
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik. University College of Music Education, Stockholm, Sweden.
    Traser, Louisa
    Korvink, Jan Gerrit
    Richter, Bernhard
Resonatory Properties in Professional Tenors Singing Above the Passaggio (2016). In: Acta Acustica united with Acustica, ISSN 1610-1928, E-ISSN 1861-9959, Vol. 102, no. 2, pp. 298-306. Journal article (Refereed)
    Abstract [en]

    Introduction: The question of formant tuning in male professional voices has been a matter of discussion for many years. Material and Methods: In this study, four very successful Western classically trained tenors of different repertoire were analysed. They sang a scale on the vowel conditions /a,e,i,o,u/ from the pitch C4 (250 Hz) to A4 (440 Hz) in their stage voice, avoiding a register shift to falsetto. Formant frequencies were calculated from inverse filtering of the audio signal and from two-dimensional MRI data. Results: Both estimations showed a tuning of F1 to the first harmonic only for vowel conditions with a low first formant (F1). For other vowel conditions, however, no clear systematic formant tuning was observed. Conclusion: For most vowel conditions, the data do not support the hypothesis of a systematic formant tuning for professional classically trained tenors.

  • 160. Echternach, Matthias
    et al.
    Doellinger, Michael
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Traser, Louisa
    Richter, Bernhard
Vocal fold vibrations at high soprano fundamental frequencies (2013). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 133, no. 2, pp. EL82-EL87. Journal article (Refereed)
    Abstract [en]

    Human voice production at very high fundamental frequencies is not yet understood in detail. It has been hypothesized that these frequencies are produced by turbulence, vocal tract/vocal fold interactions, or vocal fold oscillations without closure. Hitherto it has been impossible to analyze the vocal mechanism visually due to technical limitations. The latest high-speed technology, capturing 20 000 frames/s using transnasal endoscopy, was applied. Up to 1568 Hz, human vocal folds do exhibit oscillations with complete closure. The recent results therefore suggest that human voice production at very high F0s, up to 1568 Hz, is not caused by turbulence, but rather by airflow modulation from vocal fold oscillations. (C) 2013 Acoustical Society of America

  • 161.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
How deeply rooted are the turns we take? (2011). In: SemDial 2011: Proceedings of the 15th Workshop on the Semantics and Pragmatics of Dialogue, 2011, pp. 196-197. Conference paper (Other academic)
    Abstract [en]

    This poster presents preliminary work investigating turn-taking in text-based chat, with a view to learning something about how deeply rooted turn-taking is in human cognition. A connection is shown between preferred turn-taking patterns and the length and type of experience with such chats. This supports the idea that the orderly type of turn-taking found in most spoken conversations is indeed deeply rooted, but not so deeply that it cannot be overcome with training in a situation where such turn-taking is not beneficial to the communication.

  • 162.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
Co-present or Not?: Embodiment, Situatedness and the Mona Lisa Gaze Effect (2013). In: Eye gaze in intelligent user interfaces: gaze-based analyses, models and applications / [ed] Nakano, Yukiko; Conati, Cristina; Bader, Thomas, London: Springer London, 2013, pp. 185-203. Book chapter (Refereed)
    Abstract [en]

    The interest in embodying and situating computer programmes took off in the autonomous agents community in the 90s. Today, researchers and designers of programmes that interact with people on human terms endow their systems with humanoid physiognomies for a variety of reasons. In most cases, attempts at achieving this embodiment and situatedness have taken one of two directions: virtual characters and actual physical robots. In addition, a technique that is far from new is gaining ground rapidly: projection of animated faces on head-shaped 3D surfaces. In this chapter, we provide a history of this technique; an overview of its pros and cons; and an in-depth description of the cause and mechanics of the main drawback of 2D displays of 3D faces (and objects): the Mona Lisa gaze effect. We conclude with a description of an experimental paradigm that measures perceived directionality in general and the Mona Lisa gaze effect in particular.

  • 163.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
The Mona Lisa Gaze Effect as an Objective Metric for Perceived Cospatiality (2011). In: Proc. of the Intelligent Virtual Agents 10th International Conference (IVA 2011) / [ed] Vilhjálmsson, Hannes Högni; Kopp, Stefan; Marsella, Stacy; Thórisson, Kristinn R., Springer, 2011, pp. 439-440. Conference paper (Refereed)
    Abstract [en]

    We propose to utilize the Mona Lisa gaze effect for an objective and repeatable measure of the extent to which a viewer perceives an object as cospatial. Preliminary results suggest that the metric behaves as expected.

  • 164.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Tånnander, Christina
    Swedish Agency for Accessible Media, MTM, Stockholm, Sweden.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
Audience response system based annotation of speech (2013). In: Proceedings of Fonetik 2013, Linköping: Linköping University, 2013, pp. 13-16. Conference paper (Other academic)
    Abstract [en]

    Manual annotators are often used to label speech, a task associated with high costs and great time consumption. We suggest increasing throughput while maintaining a high measure of experimental control by borrowing from the Audience Response Systems used in the film and television industries. We demonstrate a cost-efficient setup for rapid, plenary annotation of phenomena occurring in recorded speech, together with some results from studies we have undertaken to quantify the temporal precision and reliability of such annotations.

  • 165.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Tånnander, Christina
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Temporal precision and reliability of audience response system based annotation (2013). In: Proc. of Multimodal Corpora 2013, 2013. Conference paper (Refereed)
  • 166.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Alexanderson, Simon
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustavsson, Lisa
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Kallionen, Petter
    Marklund, Ellen
    3rd party observer gaze as a continuous measure of dialogue flow (2012). In: LREC 2012 - Eighth International Conference On Language Resources And Evaluation, Istanbul, Turkey: European Language Resources Association, 2012, pp. 1354-1358. Conference paper (Refereed)
    Abstract [en]

    We present an attempt at using 3rd party observer gaze to get a measure of how appropriate each segment in a dialogue is for a speaker change. The method is a step away from the current dependency of speaker turns or talkspurts towards a more general view of speaker changes. We show that 3rd party observers do indeed largely look at the same thing (the speaker), and how this can be captured and utilized to provide insights into human communication. In addition, the results also suggest that there might be differences in the distribution of 3rd party observer gaze depending on how information-rich an utterance is.

  • 167.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Capturing massively multimodal dialogues: affordable synchronization and visualization (2010). In: Proc. of Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality (MMC 2010) / [ed] Kipp, Michael; Martin, Jean-Claude; Paggio, Patrizia; Heylen, Dirk, 2010, pp. 160-161. Conference paper (Refereed)
    Abstract [en]

    In this demo, we show (a) affordable and relatively easy-to-implement means to facilitate synchronization of audio, video and motion capture data in post-processing, and (b) a flexible tool for 3D visualization of recorded motion capture data aligned with audio and video sequences. The synchronisation is made possible by the use of two simple analogue devices: a turntable and an easy-to-build electronic clapper board. The demo shows examples of how the signals from the turntable and the clapper board are traced over the three modalities, using the 3D visualisation tool. We also demonstrate how the visualisation tool shows head and torso movements captured by the motion capture system.

  • 168.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hellmer, Kahl
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strömbergsson, Sofia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Spontal: a Swedish spontaneous dialogue corpus of audio, video and motion capture (2010). In: Proc. of the Seventh conference on International Language Resources and Evaluation (LREC'10) / [ed] Calzolari, Nicoletta; Choukri, Khalid; Maegaard, Bente; Mariani, Joseph; Odjik, Jan; Piperidis, Stelios; Rosner, Mike; Tapias, Daniel, 2010, pp. 2992-2995. Conference paper (Refereed)
    Abstract [en]

    We present the Spontal database of spontaneous Swedish dialogues. 120 dialogues of at least 30 minutes each have been captured in high-quality audio, high-resolution video and with a motion capture system. The corpus is currently being processed and annotated, and will be made available for research at the end of the project.

  • 169.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    MushyPeek: an experiment framework for controlled investigation of human-human interaction control behaviour (2007). In: Proceedings of Fonetik 2007, 2007, pp. 61-64. Conference paper (Other academic)
    Abstract [en]

    This paper describes MushyPeek, an experiment framework that allows us to manipulate interaction control behaviour – including turn-taking – in a setting quite similar to face-to-face human-human dialogue. The setup connects two subjects to each other over a VoIP telephone connection and simultaneously provides each of them with an avatar representing the other. The framework is exemplified with the first experiment we tried in it – a test of the effectiveness of interaction control gestures in an animated lip-synchronised talking head.

  • 170.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edelstam, Fredrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Human pause and resume behaviours for unobtrusive humanlike in-car spoken dialogue systems (2014). In: Proceedings of the EACL 2014 Workshop on Dialogue in Motion (DM), Gothenburg, Sweden, 2014, pp. 73-77. Conference paper (Refereed)
    Abstract [en]

    This paper presents a first, largely qualitative analysis of a set of human-human dialogues recorded specifically to provide insights into how humans handle pauses and resumptions in situations where the speakers cannot see each other, but have to rely on the acoustic signal alone. The work presented is part of a larger effort to find unobtrusive human dialogue behaviours that can be mimicked and implemented in in-car spoken dialogue systems within the EU project Get Home Safe, a collaboration between KTH, DFKI, Nuance, IBM and Daimler aiming to find ways of driver interaction that minimize safety issues. The analysis reveals several human temporal, semantic/pragmatic, and structural behaviours that are good candidates for inclusion in spoken dialogue systems.

  • 171.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Ask the experts: Part II: Analysis (2010). In: Linguistic Theory and Raw Sound / [ed] Juel Henrichsen, Peter, Frederiksberg: Samfundslitteratur, 2010, pp. 183-198. Book chapter (Refereed)
    Abstract [en]

    We present work fuelled by an urge to understand speech in its original and most fundamental context: in conversation between people. And what better way than to look to the experts? Regarding human conversation, authority lies with the speakers themselves, and asking the experts is a matter of observing and analyzing what speakers do. This is the second part of a two-part discussion which is illustrated with examples mainly from the work at KTH Speech, Music and Hearing. In this part, we discuss methods of extracting useful information from captured data, with a special focus on raw sound.

  • 172.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Cocktail: a demonstration of massively multi-component audio environments for illustration and analysis2010Ingår i: SLTC 2010, The Third Swedish Language Technology Conference (SLTC 2010): Proceedings of the Conference, 2010Konferensbidrag (Övrigt vetenskapligt)
    Abstract [en]

    We present MMAE – Massively Multi-component Audio Environments – a new concept in auditory presentation, and Cocktail – a demonstrator built on this technology. MMAE creates a dynamic audio environment by playing a large number of sound clips simultaneously at different locations in a virtual 3D space. The technique utilizes standard soundboards and is based on the Snack Sound Toolkit. The result is an efficient 3D audio environment that can be modified dynamically, in real time. Applications range from the creation of canned as well as online audio environments for games and entertainment to the browsing, analyzing and comparing of large quantities of audio data. We also demonstrate the Cocktail implementation of MMAE using several test cases as examples.

  • 173.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Towards human-like spoken dialogue systems2008Ingår i: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 50, nr 8-9, s. 630-645Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    This paper presents an overview of methods that can be used to collect and analyse data on user responses to spoken dialogue system components intended to increase human-likeness, and to evaluate how well the components succeed in reaching that goal. Wizard-of-Oz variations, human-human data manipulation, and micro-domains are discussed in this context, as is the use of third-party reviewers to get a measure of the degree of human-likeness. We also present the two-way mimicry target, a model for measuring how well a human-computer dialogue mimics or replicates some aspect of human-human dialogue, including human flaws and inconsistencies. Although we have added a measure of innovation, none of the techniques is new in its entirety. Taken together and described from a human-likeness perspective, however, they form a set of tools that may widen the path towards human-like spoken dialogue systems.

  • 174.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    /nailon/ - online analysis of prosody2006Ingår i: Working Papers 52: Proceedings of Fonetik 2006, Lund University, Centre for Languages & Literature, Dept. of Linguistics & Phonetics , 2006, s. 37-40Konferensbidrag (Övrigt vetenskapligt)
    Abstract [en]

    This paper presents /nailon/ - a software package for online real-time prosodic analysis that captures a number of prosodic features relevant for interaction control in spoken dialogue systems. The current implementation captures silence durations; voicing, intensity, and pitch; pseudo-syllable durations; and intonation patterns. The paper provides detailed information on how this is achieved.
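    The abstract above lists silence duration as one of the prosodic features /nailon/ tracks online. As a rough illustration of that kind of incremental feature extraction (this is a hypothetical sketch, not the actual /nailon/ implementation; the frame length and intensity threshold are assumptions chosen for the example):

    ```python
    # Hypothetical sketch of online silence-duration tracking: consume one
    # intensity value per analysis frame and report how long the current
    # stretch of non-speech has lasted. Constants are illustrative assumptions.

    FRAME_MS = 10            # assumed analysis frame length in milliseconds
    SILENCE_DB = -40.0       # assumed intensity threshold for "non-speech"

    class SilenceTracker:
        """Incrementally tracks the duration of the ongoing silence."""

        def __init__(self) -> None:
            self.silence_ms = 0

        def feed(self, intensity_db: float) -> int:
            """Consume one frame; return the current silence duration in ms."""
            if intensity_db < SILENCE_DB:
                self.silence_ms += FRAME_MS
            else:
                self.silence_ms = 0  # speech resets the silence counter
            return self.silence_ms

    tracker = SilenceTracker()
    # 5 speech frames followed by 30 silent frames -> 300 ms of silence.
    for _ in range(5):
        tracker.feed(-20.0)
    for _ in range(30):
        duration = tracker.feed(-55.0)
    ```

    A dialogue system could compare such a running duration against a threshold to decide when an endpoint has likely been reached.
    
    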

  • 175.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Underpinning /nailon/ - automatic estimation of pitch range and speaker relative pitch2007Ingår i: Speaker Classification I: Fundamentals, Features, and Methods / [ed] Müller, C., Berlin: Springer , 2007Kapitel i bok, del av antologi (Refereegranskat)
    Abstract [en]

    In this study, we explore what is needed to get an automatic estimation of speaker relative pitch that is good enough for many practical tasks in speech technology. We present analyses of fundamental frequency (F0) distributions from eight speakers with a view to examine (i) the effect of semitone transform on the shape of these distributions; (ii) the errors resulting from calculation of percentiles from the means and standard deviations of the distributions; and (iii) the amount of voiced speech required to obtain a robust estimation of speaker relative pitch. In addition, we provide a hands-on description of how such an estimation can be obtained under real-time online conditions using /nailon/ - our software for online analysis of prosody.
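    Two of the ingredients named above, the semitone transform and percentile estimation from means and standard deviations, can be sketched as follows (an illustrative reconstruction under stated assumptions, not the paper's code; the 100 Hz reference and the 5th/95th percentile choice are assumptions for the example):

    ```python
    # Hypothetical sketch: transform F0 values to semitones relative to a
    # reference frequency, then approximate the distribution's 5th and 95th
    # percentiles from its mean and standard deviation, assuming the
    # semitone-transformed distribution is roughly normal (z = 1.645 for the
    # central 90% interval).
    import math
    import statistics

    def hz_to_semitones(f0_hz: float, ref_hz: float = 100.0) -> float:
        """Semitone transform of F0 relative to a reference frequency."""
        return 12.0 * math.log2(f0_hz / ref_hz)

    def pitch_range_estimate(f0_values_hz):
        """Return (approx. 5th percentile, mean, approx. 95th percentile)."""
        st = [hz_to_semitones(f) for f in f0_values_hz]
        mu = statistics.mean(st)
        sigma = statistics.stdev(st)
        return (mu - 1.645 * sigma, mu, mu + 1.645 * sigma)

    low, mid, high = pitch_range_estimate([90, 100, 110, 120, 130, 140])
    ```

    A speaker-relative pitch value could then be obtained by locating a new F0 observation within the estimated (low, high) range.
    
    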

  • 176.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gravano, Agustín
    Computer Science Department, University of Buenos Aires.
    Hirschberg, Julia
    Department of Computer Science, Columbia University.
    Very short utterances in conversation2010Ingår i: Proceedings from Fonetik 2010, Lund, June 2-4, 2010 / [ed] Susanne Schötz, Gilbert Ambrazaitis, Lund, Sweden: Lund University , 2010, s. 11-16Konferensbidrag (Övrigt vetenskapligt)
    Abstract [en]

    Faced with the difficulties of finding an operationalized definition of backchannels, we have previously proposed an intermediate, auxiliary unit – the very short utterance (VSU) – which is defined operationally and is automatically extractable from recorded or ongoing dialogues. Here, we extend that work in the following ways: (1) we test the extent to which the VSU/NONVSU distinction corresponds to backchannels/non-backchannels in a different data set that is manually annotated for backchannels – the Columbia Games Corpus; (2) we examine the extent to which VSUs capture other short utterances with a vocabulary similar to backchannels; (3) we propose a VSU method for better managing turn-taking and barge-ins in spoken dialogue systems based on detection of backchannels; and (4) we attempt to detect backchannels with better precision by training a backchannel classifier using durations and inter-speaker relative loudness differences as features. The results show that VSUs indeed capture a large proportion of backchannels – large enough that VSUs can be used to improve spoken dialogue system turn-taking; and that building a reliable backchannel classifier working in real time is feasible.
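    The operational flavour of the VSU unit and the duration/loudness-based classifier described above can be illustrated roughly like this (a hypothetical sketch; the 1-second VSU cutoff and the loudness threshold are invented for the example and are not the paper's values):

    ```python
    # Hypothetical sketch: flag an utterance as a VSU candidate from its
    # duration alone, then as a likely backchannel using duration together
    # with the speaker's loudness relative to the interlocutor.
    # Thresholds are illustrative assumptions, not the published values.

    def is_vsu(duration_s: float, max_duration_s: float = 1.0) -> bool:
        """Operational VSU test: the utterance is short enough."""
        return duration_s <= max_duration_s

    def likely_backchannel(duration_s: float, rel_loudness_db: float) -> bool:
        """Backchannels tend to be short and quieter than the other speaker."""
        return is_vsu(duration_s) and rel_loudness_db < -3.0
    ```

    The appeal of such operational definitions is that both features can be computed automatically, in real time, without manual annotation.
    
    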

  • 177.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone2012Ingår i: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 2, 2012, s. 1482-1485Konferensbidrag (Refereegranskat)
    Abstract [en]

    The ability of people, and of machines, to determine the position of a sound source in a room is well studied. The related ability to determine the orientation of a directed sound source, on the other hand, is not, but the few studies there are show people to be surprisingly skilled at it. This has bearing on studies of face-to-face interaction and of embodied spoken dialogue systems, as the sound source orientation of a speaker is connected to the head pose of the speaker, which is meaningful in a number of ways. The feature most often implicated for detection of sound source orientation is the inter-aural level difference – a feature which is assumed to be more easily exploited in anechoic chambers than in everyday surroundings. We expand here on our previous studies and compare detection of speaker orientation within and outside of the anechoic chamber. Our results show that listeners find the task easier, rather than harder, in everyday surroundings, which suggests that inter-aural level differences are not the only feature at play.

  • 178.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    Voice Technologies, Expert Functions, Teliasonera.
    Two faces of spoken dialogue systems2006Ingår i: Interspeech 2006 - ICSLP Satellite Workshop Dialogue on Dialogues: Multidisciplinary Evaluation of Advanced Speech-based Interactive Systems, Pittsburgh PA, USA, 2006Konferensbidrag (Refereegranskat)
    Abstract [en]

    This paper is intended as a basis for discussion. We propose that users may, knowingly or subconsciously, interpret the events that occur when interacting with spoken dialogue systems in more than one way. Put differently, there is more than one metaphor people may use in order to make sense of spoken human-computer dialogue. We further suggest that different metaphors may not play well together. The analysis is consistent with many observations in human-computer interaction and has implications that may be helpful to researchers and developers alike. For example, developers may want to guide users towards a metaphor of their choice and ensure that the interaction is coherent with that metaphor; researchers may need different approaches depending on the metaphor employed in the system they study; and in both cases one would need to have very good reasons to use mixed metaphors.

  • 179.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Gustafson, Joakim
    Voice Technologies, Expert Functions, Teliasonera, Haninge, Sweden.
    Utterance segmentation and turn-taking in spoken dialogue systems2005Ingår i: Computer Studies in Language and Speech / [ed] Fisseni, B.; Schmitz, H-C.; Schröder, B.; Wagner, P., Frankfurt am Main, Germany: Peter Lang , 2005, s. 576-587Kapitel i bok, del av antologi (Refereegranskat)
    Abstract [en]

    A widely used method for finding places to take turns in spoken dialogue systems is to assume that an utterance ends where the user ceases to speak. Such endpoint detection normally triggers on a certain amount of silence, or non-speech. However, spontaneous speech frequently contains silent pauses inside sentence-like units, for example when the speaker hesitates. This paper presents /nailon/, an on-line, real-time prosodic analysis tool, and a number of experiments in which end-point detection has been augmented with prosodic analysis in order to segment the speech signal into what humans intuitively perceive as utterance-like units.

  • 180.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Who am I speaking at?: perceiving the head orientation of speakers from acoustic cues alone2012Ingår i: Proc. of LREC Workshop on Multimodal Corpora 2012, Istanbul, Turkey, 2012Konferensbidrag (Refereegranskat)
    Abstract [en]

    The ability of people, and of machines, to determine the position of a sound source in a room is well studied. The related ability to determine the orientation of a directed sound source, on the other hand, is not, but the few studies there are show people to be surprisingly skilled at it. This has bearing on studies of face-to-face interaction and of embodied spoken dialogue systems, as the sound source orientation of a speaker is connected to the head pose of the speaker, which is meaningful in a number of ways. We describe in passing some preliminary findings that led us onto this line of investigation, and in detail a study in which we extend an experiment design intended to measure perception of gaze direction to test instead for perception of sound source orientation. The results corroborate those of previous studies, and further show that people are very good at performing this skill outside of studio conditions as well.

  • 181.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    3rd party observer gaze during backchannels2012Ingår i: Proc. of the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog, Skamania Lodge, WA, USA, 2012Konferensbidrag (Refereegranskat)
    Abstract [en]

    This paper describes a study of how the gazes of 3rd party observers of dialogue move when a speaker is taking the turn and producing a backchannel, respectively. The data is collected and basic processing is complete, but the results section for the paper is not yet in place. It will be in time for the workshop, however, and will be presented there, should this paper outline be accepted.

  • 182.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Pelcé, Antoine
    Prosodic features of very short utterances in dialogue2009Ingår i: Nordic Prosody: Proceedings of the Xth Conference / [ed] Vainio, Martti; Aulanko, Reijo; Aaltonen, Olli, Frankfurt am Main: Peter Lang , 2009, s. 57-68Konferensbidrag (Refereegranskat)
  • 183.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Applications of distributed dialogue systems: the KTH Connector2005Ingår i: Proceedings of ISCA Tutorial and Research Workshop on Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005), 2005Konferensbidrag (Refereegranskat)
    Abstract [en]

    We describe a spoken dialogue system domain: that of the personal secretary. This domain allows us to capitalise on the characteristics that make speech a unique interface; characteristics that humans use regularly, implicitly, and with remarkable ease. We present a prototype system - the KTH Connector - and highlight several dialogue research issues arising in the domain.

  • 184.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Is it really worth it?: Cost-based selection of system responses to speech-in-overlap2012Ingår i: Proc. of the IVA 2012 workshop on Realtime Conversational Virtual Agents (RCVA 2012), Santa Cruz, CA, USA, 2012Konferensbidrag (Refereegranskat)
    Abstract [en]

    For purposes of discussion and feedback, we present a preliminary version of a simple yet powerful cost-based framework for spoken dialogue systems to continuously and incrementally decide whether to speak or not. The framework weighs the cost of producing speech in overlap against the cost of not speaking when something needs saying. Main features include a small number of parameters controlling characteristics that are readily understood, allowing manual tweaking as well as interpretation of trained parameter settings; observation-based estimates of expected overlap which can be adapted dynamically; and a simple and general method for context dependency. No evaluation has yet been undertaken, but the effects of the parameters; the observation-based cost of expected overlap trained on Switchboard data; and the context dependency using inter-speaker intensity differences from the same corpus are demonstrated with generated input data in the context of user barge-ins.
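    The core decision described above, weighing the cost of speaking in overlap against the cost of staying silent, can be sketched as follows (a hypothetical illustration of the general idea; all parameter names and numbers are assumptions, not the framework's actual formulation):

    ```python
    # Hypothetical sketch of a cost-based speak-or-wait decision: at each
    # incremental decision point, compare the expected cost of producing
    # speech in overlap with the expected cost of withholding an utterance
    # that needs saying. Parameters are illustrative assumptions.

    def should_speak(p_overlap: float,
                     overlap_cost: float,
                     urgency: float,
                     silence_cost_per_unit: float) -> bool:
        """Return True if speaking now is estimated to be cheaper than waiting."""
        cost_speak = p_overlap * overlap_cost          # risk of overlapping
        cost_wait = urgency * silence_cost_per_unit    # cost of staying silent
        return cost_speak < cost_wait

    # Low expected overlap and an urgent message -> speak now.
    decision = should_speak(p_overlap=0.1, overlap_cost=5.0,
                            urgency=0.9, silence_cost_per_unit=2.0)
    ```

    In a real system the overlap probability would be updated incrementally from observations (e.g. whether the user is currently speaking), which is what makes the decision continuous rather than one-shot.
    
    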

  • 185.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Tånnander, Christina
    The Swedish Library of Talking Books and Braille.
    Unconventional methods in perception experiments2012Ingår i: Proc. of Nordic Prosody XI, Tartu, Estonia, 2012Konferensbidrag (Övrigt vetenskapligt)
  • 186.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gesture movement profiles in dialogues from a Swedish multimodal database of spontaneous speech2012Ingår i: Prosodic and Visual Resources in Interactional Grammar / [ed] Bergmann, Pia; Brenning, Jana; Pfeiffer, Martin C.; Reber, Elisabeth, Walter de Gruyter, 2012Kapitel i bok, del av antologi (Refereegranskat)
  • 187.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Prosodic Features in the Perception of Clarification Ellipses2005Ingår i: Proceedings of Fonetik 2005: The XVIIIth Swedish Phonetics Conference, Gothenburg, Sweden, 2005, s. 107-110Konferensbidrag (Övrigt vetenskapligt)
    Abstract [en]

    We present an experiment where subjects were asked to listen to Swedish human-computer dialogue fragments where a synthetic voice makes an elliptical clarification after a user turn. The prosodic features of the synthetic voice were systematically varied, and subjects were asked to judge the computer's actual intention. The results show that an early low F0 peak signals acceptance, that a late high peak is perceived as a request for clarification of what was said, and that a mid high peak is perceived as a request for clarification of the meaning of what was said. The study can be seen as the beginnings of a tentative model for intonation of clarification ellipses in Swedish, which can be implemented and tested in spoken dialogue systems.

  • 188.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    The effects of prosodic features on the interpretation of clarification ellipses2005Ingår i: Proceedings of Interspeech 2005: Eurospeech, 2005, s. 2389-2392Konferensbidrag (Refereegranskat)
    Abstract [en]

    In this paper, the effects of prosodic features on the interpretation of elliptical clarification requests in dialogue are studied. An experiment is presented where subjects were asked to listen to short human-computer dialogue fragments in Swedish, where a synthetic voice was making an elliptical clarification after a user turn. The prosodic features of the synthetic voice were systematically varied, and the subjects were asked to judge what was actually intended by the computer. The results show that an early low F0 peak signals acceptance, that a late high peak is perceived as a request for clarification of what was said, and that a mid high peak is perceived as a request for clarification of the meaning of what was said. The study can be seen as the beginnings of a tentative model for intonation of clarification ellipses in Swedish, which can be implemented and tested in spoken dialogue systems.

  • 189.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strömbergsson, Sofia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Question types and some prosodic correlates in 600 questions in the Spontal database of Swedish dialogues2012Ingår i: Proceedings Of The 6th International Conference On Speech Prosody, Vols I and II, Shanghai, China: Tongji Univ Press , 2012, s. 737-740Konferensbidrag (Refereegranskat)
    Abstract [en]

    Studies of questions present strong evidence that there is no one-to-one relationship between intonation and interrogative mode. We present initial steps of a larger project investigating and describing intonational variation in the Spontal database of 120 half-hour spontaneous dialogues in Swedish, and testing the hypothesis that the concept of a standard question intonation such as a final pitch rise contrasting a final low declarative intonation is not consistent with the pragmatic use of intonation in dialogue. We report on the extraction of 600 questions from the Spontal corpus, coding and annotation of question typology, and preliminary results concerning some prosodic correlates related to question type.

  • 190.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Oertel, Catharine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Investigating negotiation for load-time in the GetHomeSafe project2012Ingår i: Proc. of Workshop on Innovation and Applications in Speech Technology (IAST), Dublin, Ireland, 2012, s. 45-48Konferensbidrag (Refereegranskat)
    Abstract [en]

    This paper describes ongoing work by KTH Speech, Music and Hearing in GetHomeSafe, a newly inaugurated EU project in collaboration with DFKI, Nuance, IBM and Daimler. Under the assumption that drivers will utilize technology while driving regardless of legislation, the project aims at finding out how to make the use of in-car technology as safe as possible rather than prohibiting it. We describe the project in general briefly and our role in some more detail, in particular one of our tasks: to build a system that can ask the driver "is now a good time to speak about X?" in an unobtrusive manner, and that knows how to deal with rejection, for example by asking the driver to get back when it is a good time or to schedule a time that will be convenient.

  • 191.
    Edlund, Jens
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Skantze, Gabriel
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Carlson, Rolf
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Higgins: a spoken dialogue system for investigating error handling techniques2004Ingår i: Proceedings of the International Conference on Spoken Language Processing, ICSLP 04, 2004, s. 229-231Konferensbidrag (Refereegranskat)
    Abstract [en]

    In this paper, an overview of the Higgins project and the research within the project is presented. The project incorporates studies of error handling for spoken dialogue systems on several levels, from processing to dialogue level. A domain in which a range of different error types can be studied has been chosen: pedestrian navigation and guiding. Several data collections within Higgins have been analysed along with data from Higgins' predecessor, the AdApt system. The error handling research issues in the project are presented in light of these analyses.

  • 192.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strömbergsson, Sofia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Telling questions from statements in spoken dialogue systems2012Ingår i: Proc. of SLTC 2012, Lund, Sweden, 2012Konferensbidrag (Refereegranskat)
  • 193.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Tånnander, Christina
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Audience response system-based assessment for analysis-by-synthesis2015Ingår i: Proc. of ICPhS 2015, ICPhS , 2015Konferensbidrag (Refereegranskat)
  • 194. Eklund, R.
    et al.
    Peters, G.
    Ananthakrishnan, Gopal
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Mabiza, E.
    An acoustic analysis of lion roars. I: Data collection and spectrogram and waveform analyses2011Ingår i: TMH-QPSR, ISSN 1104-5787, Vol. 51, nr 1, s. 1-4Artikel i tidskrift (Övrigt vetenskapligt)
    Abstract [en]

    This paper describes the collection of lion roar data at two different locations, an outdoor setting at Antelope Park in Zimbabwe and an indoor setting at Parken Zoo in Sweden. Preliminary analyses of spectrographic and waveform data are provided.

  • 195.
    Elblaus, Ludvig
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Hansen, Kjetil Falkenberg
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Unander-Scharin, Carl
    University College of Opera, Sweden.
    Artistically directed prototyping in development and in practice2012Ingår i: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, Vol. 41, nr 4, s. 377-387Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    The use of technology in artistic contexts presents interesting challenges regarding the processes in which engineers, artists and performers work together. The artistic intent and goals of the participants are relevant both when shaping the development practice, and in defining and refining the role of technology in practice. In this paper we present strategies for structuring the development process, based on iterative design and participatory design. The concepts are described in theory and examples are given of how they have been successfully applied. The cases make heavy use of different types of prototyping and this practice is also discussed. The development cases all relate to a single artifact, a gestural voice processing instrument called The Throat. This artifact has been in use since it was developed, and from that experience, three cases are presented. The focus of these cases is on how artistic vision through practice can recontextualize technology, and, without rebuilding it, redefine it and give it a new role to play.

  • 196.
    Elblaus, Ludvig
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Hansen, Kjetil Falkenberg
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Unander-Scharin, Carl
    University College of Opera, Sweden.
    Exploring the design space: Prototyping "The Throat V3" for the elephant man opera2011Ingår i: Proceedings of the 8th Sound and Music Computing Conference, SMC 2011, Padova, Italy: Padova University Press , 2011, s. 141-147Konferensbidrag (Refereegranskat)
    Abstract [en]

    Developing new technology for artistic practice requires other methods than classical problem solving. Some of the challenges involved in the development of new musical instruments have affinities to the realm of wicked problems. Wicked problems are hard to define and have many different solutions that are good or bad (not true or false). The body of possible solutions to a wicked problem can be called a design space and exploring that space must be the objective of a design process. In this paper we present effective methods of iterative design and participatory design that we have used in a project developed in collaboration between the Royal Institute of Technology (KTH) and the University College of Opera, both in Stockholm. The methods are outlined, and examples are given of how they have been applied in specific situations. The focus lies on prototyping and evaluation with user participation. By creating and acting out scenarios with the user, and thus asking the questions through a prototype and receiving the answers through practice and exploration, we removed the bottleneck represented by language and allowed communication beyond verbalizing. Doing this, even so-called tacit knowledge could be activated and brought into the development process.

  • 197.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Accounting for Individual Speaker Properties in Automatic Speech Recognition2010Licentiatavhandling, sammanläggning (Övrigt vetenskapligt)
    Abstract [en]

    In this work, speaker characteristic modeling has been applied in the fields of automatic speech recognition (ASR) and automatic speaker verification (ASV). In ASR, a key problem is that acoustic mismatch between training and test conditions degrades classification performance. In this work, a child exemplifies a speaker not represented in training data and methods to reduce the spectral mismatch are devised and evaluated. To reduce the acoustic mismatch, predictive modeling based on spectral speech transformation is applied. Following this approach, a model suitable for a target speaker, not well represented in the training data, is estimated and synthesized by applying vocal tract predictive modeling (VTPM). In this thesis, the traditional static modeling on the utterance level is extended to dynamic modeling. This is accomplished by operating also on sub-utterance units, such as phonemes, phone-realizations, sub-phone realizations and sound frames.

    Initial experiments show that adaptation of an acoustic model trained on adult speech significantly reduced the word error rate of ASR for children, but not to the level of a model trained on children’s speech. Multi-speaker-group training provided an acoustic model that performed recognition for both adults and children within the same model at almost the same accuracy as speaker-group dedicated models, with no added model complexity. In the analysis of the cause of errors, body height of the child was shown to be correlated to word error rate.

    A further result is that the computationally demanding iterative recognition process in standard VTLN can be replaced by synthetically extending the vocal tract length distribution in the training data. A multi-warp model is trained on the extended data and recognition is performed in a single pass. The accuracy is similar to that of the standard technique.
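The synthetic extension of the vocal tract length distribution rests on linearly warping the frequency axis of the training data. A minimal sketch of such a linear warp is given below; the function name, the warp factors, the band edge, and the use of NumPy are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def warp_linear(freqs_hz, alpha, f_max=8000.0):
    """Illustrative linear VTLN warp: scale the frequency axis by a
    warping factor alpha and clip at the upper band edge f_max."""
    return np.clip(np.asarray(freqs_hz, dtype=float) * alpha, 0.0, f_max)

# Applying a grid of warp factors to the training material yields the
# "extended" distribution on which a single multi-warp model can be
# trained, so recognition needs only one pass instead of iterating.
filter_centres = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])
for alpha in (0.85, 1.0, 1.15):
    warped = warp_linear(filter_centres, alpha)
```

A factor below 1.0 compresses the spectrum, which is the direction needed to map a child's higher formant positions toward an adult-trained model.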

    A concluding experiment in ASR shows that the word error rate can be reduced by extending a static vocal tract length compensation parameter into a temporal parameter track. A key component to reach this improvement was provided by a novel joint two-level optimization process. In the process, the track was determined as a composition of a static and a dynamic component, which were simultaneously optimized on the utterance and sub-utterance level respectively. This had the principal advantage of limiting the modulation amplitude of the track to what is realistic for an individual speaker. The recognition error rate was reduced by 10% relative compared with that of a standard utterance-specific estimation technique.

    The techniques devised and evaluated can also be applied to other speaker characteristic properties, which exhibit a dynamic nature.

    An excursion into ASV led to the proposal of a statistical speaker population model. The model represents an alternative approach for determining the reject/accept threshold in an ASV system instead of the commonly used direct estimation on a set of client and impostor utterances. This is especially valuable in applications where a low false reject or false accept rate is required. In these cases, the number of errors is often too few to estimate a reliable threshold using the direct method. The results are encouraging but need to be verified on a larger database.

  • 198.
    Elenius, Daniel
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Adaptation and Normalization Experiments in Speech Recognition for 4 to 8 Year Old Children, 2005. Conference paper (Refereed)
    Abstract [en]

    An experimental offline investigation of the performance of connected-digits recognition was performed on children in the age range four to eight years. Poor performance using adult models was improved significantly by adaptation and vocal tract length normalization, but not to the same level as training on children. Age-dependent models were tried with limited advantage. A combined adult and child training corpus maintained the performance for the separately trained categories. Linear frequency compression for vocal tract length normalization was attempted, but estimation of the warping factor was sensitive to non-speech segments and background noise. Phoneme-based word modeling outperformed the whole-word models, even though the vocabulary only consisted of digits.

  • 199.
    Elenius, Daniel
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Characteristics of a Low Reject Mode Speaker Verification System, 2002. Conference paper (Refereed)
    Abstract [en]

    The performance of a speaker verification (SV) system is normally determined by the false reject (FRR) and false accept (FAR) rates as averages on a population of test speakers. However, information on the FRR distribution is required when estimating the portion of clients that will suffer from an unacceptably high reject rate. This paper studies this distribution in a population using a SV system operating in low reject mode. Two models of the distribution are proposed and compared with test data. An attempt is also made to tune the decision threshold in order to obtain a desired portion of clients having a reject rate lower than a specified value.
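The direct threshold estimation that such work contrasts with can be sketched as a quantile of the impostor score distribution. Everything below is an illustrative assumption (function names, the quantile formulation, and the synthetic Gaussian scores), not the paper's method; it only shows why very low target rates leave few errors to estimate from.

```python
import numpy as np

def direct_threshold(impostor_scores, target_far):
    """Direct estimation: pick the score threshold at which roughly a
    fraction target_far of impostor trials would be falsely accepted
    (accept when score >= threshold)."""
    scores = np.asarray(impostor_scores, dtype=float)
    return float(np.quantile(scores, 1.0 - target_far))

# Synthetic impostor scores; at target_far = 0.01 only about 1% of the
# trials lie above the threshold, so the estimate rests on few samples.
# That scarcity is the motivation for modeling the score distribution
# statistically instead.
rng = np.random.default_rng(0)
impostors = rng.normal(loc=-1.0, scale=1.0, size=10_000)
thr = direct_threshold(impostors, target_far=0.01)
empirical_far = float(np.mean(impostors >= thr))
```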

  • 200.
    Elenius, Daniel
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Comparing speech recognition for adults and children, 2004. In: Proceedings of Fonetik 2004: The XVIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2004, pp. 156-159. Conference paper (Other academic)
    Abstract [en]

    This paper presents initial studies of the performance of a speech recogniser on children's speech when trained on children or adults. A connected-digits recogniser was used for this purpose. The individual digit accuracy among the children is correlated to some features of the child, such as age, gender, fundamental frequency and height. A strong correlation between age and accuracy was found. The accuracy was also found to be lower for child recognition than for adult recognition, even though the recognisers were trained on the correct class of speakers.
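The kind of correlation analysis the abstract refers to can be reproduced in outline with a Pearson coefficient. The data values below are invented placeholders, not the paper's measurements, and `np.corrcoef` is just one way to compute the coefficient.

```python
import numpy as np

# Invented placeholder data: age in years and per-child digit accuracy (%).
ages = np.array([4, 5, 5, 6, 6, 7, 7, 8])
accuracy = np.array([55.0, 60.0, 63.0, 70.0, 68.0, 78.0, 80.0, 85.0])

# Pearson correlation between age and recognition accuracy; a value
# near 1.0 indicates the strong positive relationship reported.
r = float(np.corrcoef(ages, accuracy)[0, 1])
```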
