Results 101-150 of 692
  • 101.
    Björkvall, Dennis
    et al.
    KTH, Skolan för teknik och hälsa (STH), Medicinsk teknik, Data- och elektroteknik.
    Ploug, Martin
    KTH, Skolan för teknik och hälsa (STH), Medicinsk teknik, Data- och elektroteknik.
    Metod för automatiserad sammanfattning och nyckelordsgenerering. 2016. Independent thesis, basic level (professional degree), 10 credits / 15 HE credits. Student thesis (degree project).
    Abstract [translated from Swedish]

    The company Widespace handles hundreds of cases every week, which requires each employee to maintain a broad overview in order to get up to speed on each individual case. Because of this volume, creating that overview becomes a major problem. Solving this problem requires more consistent use of metadata, and a literature study on metadata, automated summarization, and keyword generation was therefore carried out.

    The work consisted of developing a prototype that can automatically generate a summary of the text of a case, generate a list of keywords, and indicate which language the text is written in. The work also included a survey of earlier work to see which systems and methods can be used to solve this task. Two prototypes developed in-house, MkOne and MkTwo, were compared with each other and then evaluated. The methods used build on both statistical and linguistic processes. An analysis of the results showed that the MkOne prototype delivered the best summarization results and that the keyword list provided keywords with high precision and broad coverage.

  • 102.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Model space size scaling for speaker adaptation. 2011. In: Proceedings of Fonetik 2011, Stockholm: KTH Royal Institute of Technology, 2011, Vol. 51, no. 1, pp. 77-80. Conference paper (Other academic).
    Abstract [en]

    In the current work, instantaneous adaptation in speech recognition is performed by estimating speaker properties, which modify the original trained acoustic models. We introduce a new property, the size of the model space, which is added to the previously used features, VTLN and spectral slope. These are jointly estimated for each test utterance. The new feature has been shown to be effective for recognition of children’s speech using adult-trained models in TIDIGITS. Adding the feature lowered the error rate by around 10% relative. The overall combination of VTLN, spectral slope and model space scaling represents a substantial 31% relative reduction compared with single VTLN. There was no improvement among adult speakers in TIDIGITS and in TIMIT. Improvement for this speaker category is expected when the training and test sets are recorded in different conditions, such as read and spontaneous speech.

  • 103.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Estimating speaker characteristics for speech recognition. 2009. In: Proceedings of Fonetik 2009 / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 154-158. Conference paper (Other academic).
    Abstract [en]

    A speaker-characteristic-based hierarchic tree of speech recognition models is designed. The leaves of the tree contain model sets, which are created by transforming a conventionally trained set using leaf-specific speaker profile vectors. The non-leaf models are formed by merging the models of their child nodes. During recognition, a maximum likelihood criterion is followed to traverse the tree from the root to a leaf. The computational load for estimating one- (vocal tract length) and four-dimensional speaker profile vectors (vocal tract length, two spectral slope parameters and model variance scaling) is reduced to a fraction compared to that of an exhaustive search among all leaf nodes. Recognition experiments on children’s connected digits using adult models exhibit similar recognition performance for the exhaustive and the one-dimensional tree search. Further error reduction is achieved with the four-dimensional tree. The estimated speaker properties are analyzed and discussed.

  • 104.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Investigating Explicit Model Transformations for Speaker Normalization. 2008. In: Proceedings of ISCA ITRW Speech Analysis and Processing for Knowledge Discovery / [ed] Paul Dalsgaard, Christian Fischer Pedersen, Ove Andersen, Aalborg, Denmark: ISCA/AAU, 2008. Conference paper (Refereed).
    Abstract [en]

    In this work we extend the test utterance adaptation technique used in vocal tract length normalization to a larger number of speaker characteristic features. We perform partially joint estimation of four features: the VTLN warping factor, the corner position of the piece-wise linear warping function, spectral tilt in voiced segments, and model variance scaling. In experiments on the Swedish PF-Star children database, joint estimation of warping factor and variance scaling lowered the recognition error rate compared to warping factor alone.

  • 105.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Knowledge-Rich Model Transformations for Speaker Normalization in Speech Recognition. 2008. In: Proceedings, FONETIK 2008, Department of Linguistics, University of Gothenburg / [ed] Anders Eriksson, Jonas Lindh, 2008, pp. 37-40. Conference paper (Other academic).
    Abstract [en]

    In this work we extend the test utterance adaptation technique used in vocal tract length normalization to a larger number of speaker characteristic features. We perform partially joint estimation of four features: the VTLN warping factor, the corner position of the piece-wise linear warping function, spectral tilt in voiced segments, and model variance scaling. In experiments on the Swedish PF-Star children database, joint estimation of warping factor and variance scaling lowers the recognition error rate compared to warping factor alone.

  • 106.
    Blomberg, Mats
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Elenius, Daniel
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Zetterholm, Elisabeth
    Department of Philosophy & Linguistics, Umeå University.
    Speaker verification scores and acoustic analysis of a professional impersonator. 2004. In: Proceedings of Fonetik 2004: The XVIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2004, pp. 84-87. Conference paper (Other academic).
  • 107.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Children and adults in dialogue with the robot head Furhat - corpus collection and initial analysis. 2012. In: Proceedings of WOCCI, Portland, OR, 2012. Conference paper (Refereed).
  • 108.
    Bollepalli, Bajibabu
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. Aalto University, Department of Signal Processing and Acoustics.
    Towards conversational speech synthesis: Experiments with data quality, prosody modification, and non-verbal signals. 2017. Licentiate thesis, comprehensive summary (Other academic).
    Abstract [en]

    The aim of a text-to-speech synthesis (TTS) system is to generate a human-like speech waveform from a given input text. Current TTS systems have already reached a high degree of intelligibility, and they can be readily used to read aloud a given text. For many applications, e.g. public address systems, a reading style is enough to convey the message to the people. However, more recent applications, such as human-machine interaction and speech-to-speech translation, call for TTS systems to be increasingly human-like in their conversational style. The goal of this thesis is to address a few issues involved in a conversational speech synthesis system.

    First, we discuss issues involved in data collection for conversational speech synthesis. It is very important to have data that is of good quality and also contains more conversational characteristics. In this direction we studied two methods: 1) harvesting the world wide web (WWW) for conversational speech corpora, and 2) imitation of natural conversations by professional actors. For the former method, we studied the effect of compression on the performance of TTS systems. Speech data available on the WWW is often in compressed form, mostly using standard compression techniques such as MPEG. Thus, in papers 1 and 2, we systematically studied the effect of MPEG compression on TTS systems. Results showed that the synthesis quality is indeed affected by the compression; however, the perceptual differences are strongly significant when the compression rate is less than 32 kbit/s. Even if one is able to collect natural conversational speech, it is not always suitable for training a TTS system due to problems involved in its production. Thus, for the latter method, we asked whether conversational speech can be imitated by professional actors in recording studios. In this direction we studied the speech characteristics of acted and read speech. Second, we asked whether a technique from the voice conversion field can be borrowed to convert read speech into conversational speech. In paper 3, we proposed a method to transform pitch contours using artificial neural networks. Results indicated that neural networks are able to transform pitch values better than the traditional linear approach. Finally, we presented a study on laughter synthesis, since non-verbal sounds, particularly laughter, play a prominent role in human communication. In paper 4 we present an experimental comparison of state-of-the-art vocoders for the application of HMM-based laughter synthesis.

  • 109.
    Bollepalli, Bajibabu
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    HMM based speech synthesis system for Swedish Language. 2012. In: The Fourth Swedish Language Technology Conference, Lund, Sweden, 2012. Conference paper (Refereed).
  • 110.
    Bollepalli, Bajibabu
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Non-Linear Pitch Modification in Voice Conversion using Artificial Neural Networks. 2013. In: Advances in nonlinear speech processing: 6th International Conference, NOLISP 2013, Mons, Belgium, June 19-21, 2013: proceedings, Springer Berlin/Heidelberg, 2013, pp. 97-103. Conference paper (Refereed).
    Abstract [en]

    The majority of current voice conversion methods do not focus on modelling local variations of the pitch contour, but only on linear modification of the pitch values, based on means and standard deviations. However, a significant amount of speaker-related information is also present in the pitch contour. In this paper we propose a non-linear pitch modification method for mapping the pitch contours of the source speaker according to the target speaker's pitch contours. This work is done within the framework of Artificial Neural Network (ANN) based voice conversion. The pitch contours are represented with Discrete Cosine Transform (DCT) coefficients at the segmental level. The results, evaluated using subjective and objective measures, confirm that the proposed method performed better in mimicking the target speaker's speaking style when compared to the linear modification method.

  • 111.
    Bollepalli, Bajibabu
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Raitio, T.
    Alku, P.
    Effect of MPEG audio compression on HMM-based speech synthesis. 2013. In: Proceedings of the 14th Annual Conference of the International Speech Communication Association: Interspeech 2013, International Speech Communication Association (ISCA), 2013, pp. 1062-1066. Conference paper (Refereed).
    Abstract [en]

    In this paper, the effect of MPEG audio compression on HMM-based speech synthesis is studied. Speech signals are encoded with various compression rates and analyzed using the GlottHMM vocoder. Objective evaluation results show that the vocoder parameters start to degrade from encoding with bitrates of 32 kbit/s or less, which is also confirmed by the subjective evaluation of the vocoder analysis-synthesis quality. Experiments with HMM-based speech synthesis show that the subjective quality of a synthetic voice trained with 32 kbit/s speech is comparable to a voice trained with uncompressed speech, but lower bit rates induce clear degradation in quality.

  • 112. Bolíbar, Jordi
    et al.
    Bresin, Roberto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Sound feedback for the optimization of performance in running. 2012. In: TMH-QPSR special issue: Proceedings of SMC Sweden 2012 Sound and Music Computing, Understanding and Practicing in Sweden, ISSN 1104-5787, Vol. 52, no. 1, pp. 39-40. Article in journal (Refereed).
  • 113. Borin, Lars
    et al.
    Brandt, Martha D.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Lindh, Jonas
    Parkvall, Mikael
    The Swedish Language in the Digital Age/Svenska språket i den digitala tidsåldern. 2012. Book (Refereed).
  • 114. Boves, L.
    et al.
    Carlson, Rolf
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hinrichs, E.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Krauwer, S.
    Lemnitzer, L.
    Vainio, M.
    Wittenburg, P.
    Resources for Speech Research: Present and Future Infrastructure Needs. 2009. In: Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, UK, 2009, pp. 1803-1806. Conference paper (Refereed).
    Abstract [en]

    This paper introduces the EU-FP7 project CLARIN, a joint effort of over 150 institutions in Europe, aimed at the creation of a sustainable language resources and technology infrastructure for the humanities and social sciences research community. The paper briefly introduces the vision behind the project and how it relates to speech research with a focus on the contributions that CLARIN can and will make to research in spoken language processing.

  • 115.
    Boye, Johan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Fredriksson, M.
    Götze, Jana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Königsmann, J.
    Walk this way: Spatial grounding for city exploration. 2012. In: IWSDS, 2012. Conference paper (Refereed).
  • 116.
    Boye, Johan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Fredriksson, Morgan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Götze, Jana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Königsmann, Jurgen
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Walk this way: Spatial grounding for city exploration. 2014. In: Natural interaction with robots, knowbots and smartphones, Springer-Verlag, 2014, pp. 59-67. Chapter in book, part of anthology (Refereed).
    Abstract [en]

    Recently there has been an interest in spatially aware systems for pedestrian routing and city exploration, due to the proliferation of smartphones with GPS receivers among the general public. Since GPS readings are noisy, giving good and well-timed route instructions to pedestrians is a challenging problem. This paper describes a spoken-dialogue prototype for pedestrian navigation in Stockholm that addresses this problem by using various grounding strategies.

  • 117.
    Boye, Johan
    et al.
    TeliaSonera.
    Gustafson, Joakim
    TeliaSonera.
    How to do dialogue in a fairy-tale world. 2005. In: Proceedings of the 6th SIGDial workshop on discourse and dialogue, 2005. Conference paper (Refereed).
    Abstract [en]

    The work presented in this paper is an endeavor to create a prototype of a computer game with spoken dialogue capabilities. Advanced spoken dialogue has the potential to considerably enrich computer games, where it for example would allow players to refer to past events and to objects currently not visible on the screen. It would also allow users to interact socially and to negotiate solutions with the game characters. The game takes place in a fairy-tale world, and features two different fairy-tale characters, who can interact with the player and with each other using spoken dialogue. The fairy-tale characters are separate entities in the sense that each character has its own set of goals and its own perception of the world. This paper gives an overview of the functionality of the implemented dialogue manager in the NICE fairy-tale game system.

  • 118.
    Boye, Johan
    et al.
    TeliaSonera.
    Wirén, Mats
    TeliaSonera.
    Gustafson, Joakim
    TeliaSonera.
    Contextual reasoning in multimodal dialogue systems: two case studies. 2004. In: Proceedings of The 8th Workshop on the Semantics and Pragmatics of Dialogue Catalogue'04, Barcelona, 2004, pp. 19-21. Conference paper (Refereed).
    Abstract [en]

    This paper describes an approach to contextual reasoning for interpretation of spoken multimodal dialogue. The approach is based on combining recency-based search for antecedents with an object-oriented domain representation in such a way that the search is highly constrained by the type information of the antecedents. By furthermore representing candidate antecedents from the dialogue history and visual context in a uniform way, a single machinery (based on β-reduction in lambda calculus) can be used for resolving many kinds of underspecified utterances. The approach has been implemented in two highly different domains.

  • 119.
    Bruce, Gösta
    et al.
    Lund University.
    Schötz, Susanne
    Lund University.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    SIMULEKT: modelling Swedish regional intonation. 2007. In: Proceedings of Fonetik 2007, Stockholm: KTH Royal Institute of Technology, 2007, Vol. 50, no. 1, pp. 121-124. Conference paper (Other academic).
    Abstract [en]

    This paper introduces a new research project Simulating Intonational Varieties of Swedish (SIMULEKT). The basic goal of the project is to produce more precise and thorough knowledge about some major intonational varieties of Swedish. In this research effort the Swedish prosody model plays a prominent role. A fundamental idea is to take advantage of speech synthesis in different forms. In our analysis and synthesis work we will focus on some major intonational types: South, Göta, Svea, Gotland, Dala, North, and Finland Swedish. The significance of our project work will be within basic research as well as in speech technology applications.

  • 120.
    Brunsberg, Sandra
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    Shaw, Philip
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    The mathematical English of Swedish undergraduates: assimilation and adaptation. 2005. In: Språk på tvärs: Rapport från ASLA:s höstsymposium Södertörn, 11–12 november 2004 / [ed] Boel De Geer, Anna Malmbjer, Uppsala: Svenska föreningen för tillämpad språkvetenskap, ASLA, 2005, pp. 119-130. Conference paper (Refereed).
  • 121. Brusk, J.
    et al.
    Lager, T.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    DEAL – Dialogue Management in SCXML for Believable Game Characters. 2007. In: Proceedings of the 2007 Conference on Future Play, Future Play '07, 2007, pp. 137-144. Conference paper (Refereed).
    Abstract [en]

    In order for game characters to be believable, they must appear to possess qualities such as emotions, the ability to learn and adapt as well as being able to communicate in natural language. With this paper we aim to contribute to the development of believable non-player characters (NPCs) in games, by presenting a method for managing NPC dialogues. We have selected the trade scenario as an example setting since it offers a well-known and limited domain common in games that support ownership, such as role-playing games. We have developed a dialogue manager in State Chart XML, a newly introduced W3C standard, as part of DEAL -- a research platform for exploring the challenges and potential benefits of combining elements from computer games, dialogue systems and language learning.

  • 122.
    Carlson, Rolf
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Conflicting acoustic cues in stop perception. 2007. In: Where Do Features Come From?: Phonological Primitives in the Brain, the Mouth, and the Ear, 2007, pp. 63-64. Conference paper (Refereed).
  • 123.
    Carlson, Rolf
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Using acoustic cues in stop perception. 2007. In: Proceedings of Fonetik 2007, 2007, Vol. 50, no. 1, pp. 25-28. Conference paper (Other academic).
  • 124.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Towards human-like behaviour in spoken dialog systems. 2006. In: Proceedings of Swedish Language Technology Conference (SLTC 2006), Gothenburg, Sweden, 2006. Conference paper (Other academic).
    Abstract [en]

    We and others have found it fruitful to assume that users, when interacting with spoken dialogue systems, perceive the systems and their actions metaphorically. Common metaphors include the human metaphor and the interface metaphor (cf. Edlund, Heldner, & Gustafson, 2006). In the interface metaphor, the spoken dialogue system is perceived as a machine interface – often but not always a computer interface. Speech is used to accomplish what would have otherwise been accomplished by some other means of input, such as a keyboard or a mouse. In the human metaphor, on the other hand, the computer is perceived as a creature (or even a person) with humanlike conversational abilities, and speech is not a substitute or one of many alternatives, but rather the primary means of communicating with this creature. We are aware that more “natural” or human-like behaviour does not automatically make a spoken dialogue system “better” (i.e. more efficient or more well-liked by its users). Indeed, we are quite convinced that the advantage (or disadvantage) of humanlike behaviour will be highly dependent on the application. However, a dialogue system that is coherent with a human metaphor may profit from a number of characteristics.

  • 125.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Swerts, Marc
    Tilburg University, The Netherlands.
    Perceptual judgments of pitch range. 2004. In: Proc. of Intl Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, pp. 689-692. Conference paper (Refereed).
    Abstract [en]

    This paper reports on a study that explores to what extent listeners are able to judge where a particular utterance fragment is located in a speaker's pitch range. The research consists of a perception study that makes use of 100 stimuli, selected from 50 different speakers whose speech was originally collected for a multi-speaker database of Swedish speech materials. The fragments are presented to subjects, who are asked to estimate whether the fragment is located in the lower or higher part of that speaker's range. Results reveal that listeners' judgments are dependent on the gender of the speaker, but that within a gender they tend to hear differences in range.

  • 126.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Rule-based Speech Synthesis. 2008. In: Springer Handbook of Speech Processing / [ed] Benesty, J.; Sondhi, M. M.; Huang, Y., Berlin/Heidelberg: Springer Berlin/Heidelberg, 2008, pp. 429-436. Chapter in book, part of anthology (Refereed).
    Abstract [en]

    In this chapter, we review some of the issues in rule-based synthesis and specifically discuss formant synthesis. Formant synthesis and the theory behind it have played an important role in both the scientific progress in understanding how humans talk and the development of the first speech technology applications. Its flexibility and small footprint make the approach still of interest and a valuable complement to the current dominant methods based on concatenative data-driven synthesis. As already mentioned in the overview by Schroeter (Chap. 19), we also see a new trend to combine the rule-based and data-driven approaches. Formant features from a database can be used both to optimize a rule-based formant synthesis system and to optimize the search for good units in a concatenative system.

  • 127.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Speech Synthesis. 2010. In: The Handbook of Phonetic Sciences, Blackwell Publishing, 2010, 2, pp. 781-803. Chapter in book, part of anthology (Refereed).
  • 128.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Exploring Data Driven Parametric Synthesis. 2009. In: Proceedings of Fonetik 2009: The XXIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm, Sweden: Stockholm University, 2009, pp. 86-91. Conference paper (Other academic).
    Abstract [en]

    This paper describes our work on building a formant synthesis system based on both rule-generated and database-driven methods. Three parametric synthesis systems are discussed: our traditional rule-based system, a speaker-adapted system, and finally a gesture system. The gesture system is a further development of the adapted system in that it includes concatenated formant gestures from a data-driven unit library. The systems are evaluated technically, comparing the formant tracks with an analysed test corpus. The gesture system results in a 25% error reduction in the formant frequencies due to the inclusion of the stored gestures. Finally, a perceptual evaluation shows a clear advantage in naturalness for the gesture system compared to both the traditional system and the speaker-adapted system.

  • 129.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strangert, E.
    Synthesising disfluencies in a dialogue system. 2009. In: Nordic Prosody: Proceedings of the Xth Conference / [ed] Vainio, M., Aulanko, R., Aaltonen, O., Frankfurt am Main: Peter Lang Publishing Group, 2009. Conference paper (Refereed)
  • 130.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strangert, Eva
    Umeå University.
    Prosodic Cues for Hesitation. 2006. In: Working Papers 52: Proceedings from Fonetik 2006 / [ed] Gilbert Ambrazaitis, Susanne Schötz, Lund: Lund University, 2006, Vol. 52, pp. 21-24. Conference paper (Other academic)
    Abstract [en]

    In our efforts to model spontaneous speech for use in, for example, spoken dialogue systems, a series of experiments has been conducted in order to investigate correlates to perceived hesitation. Previous work has shown that it is the total duration increase that is the valid cue rather than the contribution by either of the two factors pause duration and final lengthening. In the present experiment we explored the effects of F0 slope variation and the presence vs. absence of creaky voice in addition to durational cues, using synthetic stimuli. The results showed that variation of both F0 slope and creaky voice did have perceptual effects, but to a much lesser degree than the durational increase.

  • 131.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strangert, Eva
    Umeå University.
    Modelling hesitation for synthesis of spontaneous speech. 2006. In: Proceedings of Speech Prosody 2006 / [ed] R. Hoffmann, H. Mixdorff, Dresden, 2006. Conference paper (Refereed)
    Abstract [en]

    The current work deals with the modelling of one type of disfluency, hesitations. A perceptual experiment using speech synthesis was designed to evaluate two duration features found to be correlates to hesitation, pause duration and final lengthening. A variation of F0 slope before the hesitation was also included. The most important finding is that it is the total duration increase that is the valid cue rather than the contribution by either factor. In addition, our findings lead us to assume an interaction with syntax. The absence of strong effects of the induced F0 variation was unexpected and we consider several possible explanations for this result.

  • 132.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hawkins, Sarah
    University of Cambridge.
    When is fine phonetic detail a detail?: 16th International Congress of Phonetic Sciences. 2007. In: Proceedings of ICPhS 2007 / [ed] Jürgen Trouvain, William J. Barry, Saarbrücken, Germany, 2007, pp. 211-214. Conference paper (Refereed)
  • 133.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hirschberg, Julia
    Columbia University.
    Swerts, Marc
    University of Tilburg, The Netherlands.
    Prediction of upcoming Swedish prosodic boundaries by Swedish and American listeners. 2004. In: Proc of Intl Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, pp. 329-332. Conference paper (Refereed)
    Abstract [en]

    We describe results of a study of perceptually based predictions of upcoming prosodic breaks in spontaneous Swedish speech materials by native speakers of Swedish and of standard American English. The question addressed here is the extent to which listeners are able, on the basis of acoustic and prosodic features, to predict the occurrence of upcoming boundaries, and if so, whether they are able to distinguish different degrees of boundary strength. An experiment was conducted in which spontaneous utterance fragments (both long and short versions) were presented to listeners, who were instructed to guess whether or not the fragments were followed by a prosodic break, and if so, what the strength of the break was, where boundary presence and strength had been independently labeled. Results revealed that both listening groups were indeed able to predict whether or not a boundary (of a particular strength) followed the fragment, suggesting that prosodic rather than lexico-grammatical information was being used as a primary cue.

  • 134. Caudery, T.
    et al.
    Petersen, M.
    Shaw, Philip
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    The Language Environments of Exchange Students at Scandinavian Universities. 2007. In: Researching Content and Language Integration in Higher Education / [ed] Wilkinson, R.; Zegers, V., University of Maastricht, 2007, pp. 233-250. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    Exchange students who come to Scandinavia are often motivated by an intention to improve their proficiency in English rather than the local language. They take academic classes conducted in English and may find themselves living in a lingua-franca English bubble, acculturated to an international-student subculture. A few do break out of the bubble, learn the local language, and experience the local culture. Here we report on a project intended to identify the factors leading to successful learning of both English and the local languages. 70 students at each of four institutions, two in Sweden and two in Denmark, were interviewed three times over a semester and asked to complete simple language tests. English proficiency improved in most cases, but Swedish/Danish was only learnt by those with good initial English and appropriate motivation. As expected, contact with local students was limited. Institutional policies can probably influence these outcomes.

  • 135. Caudery, T.
    et al.
    Petersen, M.
    Shaw, Philip
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    The motivations of exchange students at Scandinavian universities. 2008. In: Students, Staff and Academic Mobility in Higher Education / [ed] Byram, M.; Dervin, F., Newcastle: Cambridge Scholars Press, 2008, pp. 114-130. Chapter in book, part of anthology (Refereed)
  • 136.
    Cerrato, Loredana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    A coding scheme for the annotation of feedback phenomena in conversational speech. 2004. In: Proc of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces / [ed] Martin, J.C., Lisboa, 2004, pp. 25-28. Conference paper (Refereed)
    Abstract [en]

    A coding scheme specifically developed to label feedback phenomena in conversational speech is presented in this paper. The coding scheme allows the categorization of feedback phenomena according to their typology, direction, and communicative function in the given context. The results of the reliability tests run to verify the appropriateness of the coding scheme to code feedback phenomena in different languages and across different modalities are also presented.

  • 137.
    Cerrato, Loredana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    A comparative study of verbal feedback in Italian and Swedish map-task dialogues. 2004. In: Proceedings of the Nordic Symposium on the comparison of spoken languages, Copenhagen Working Papers in LSP / [ed] Copenhagen, P.; Hernrichsen, J., 2004, pp. 99-126. Conference paper (Other academic)
  • 138.
    Cerrato, Loredana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Linguistic functions of head nods. 2005. In: Proceedings from The Second Nordic Conference on Multimodal Communication / [ed] Allwood, J.; Dorriots, B., Göteborg: Göteborg University, 2005, pp. 137-152. Conference paper (Refereed)
    Abstract [en]

    The aim of the present study is to investigate which communicative functions head nods can have in spoken Swedish. By nod is here meant a vertical down-up movement of the head. To classify the communicative functions of head nods, 10 short video-recorded Swedish dialogues were analysed and labeled. The labels refer to the different communicative functions that the head nods carry out in the given context. The results show that the most common function carried out by head nods is that of feedback. Besides the feedback function, head nods can be produced to signal turn taking, focus and emphasis, to give affirmative responses and to show courtesy. The visual information carried by head nods in spoken communicative interactions is without doubt extremely important; therefore it should be exploited in the field of human-machine interfaces. This could be done by integrating head nods in the design and development of embodied conversational agents. Thanks to the production of head nods, embodied conversational agents might become more effective and appear more natural during their interactions with human beings.

  • 139.
    Cerrato, Loredana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    On the acoustic, prosodic and gestural characteristics of “m-like” sounds in Swedish. 2005. In: Feedback in spoken interaction: NordTalk Symposium / [ed] Jens Allwood, Göteborg: Göteborg University, 2005, Vol. Feedback in Spoken Interaction - Nordtalk Symposium 2003, pp. 18-31. Conference paper (Refereed)
    Abstract [en]

    The aim of the present study is to verify what communicative functions “m-like” sounds can have in spoken Swedish and investigate both the relationship between prosodic variation and communicative function and the relationship between the production of “m-like” sounds and their accompanying gestures. The main hypothesis tested is that the different communicative functions carried by these “m-like” sounds are conveyed by means of different prosodic cues. To test this hypothesis, audio-recordings of two dialogues, elicited with the map-task technique, were used. A distributional and functional analysis of “m-like” sounds was first carried out. Afterwards, an acoustic analysis of these sounds was performed to find out how prosodic variation and communicative function are related. The results show that the most common function carried out by “m-like” sounds is that of feedback. The general category of feedback can be further divided into sub-categories depending on the specific function that the short expression carries out in the given context. To each function it is possible to relate a prototypical F0 contour and acoustic characteristics. For the analysis of the accompanying gestures of “m-like” sounds, two AV recordings of spontaneous dialogues were used. The results of the distributional analysis show that 41% of all the analysed “m-like” sounds are accompanied by a gesture. The most common accompanying gestures are head movements such as nods and jerks. The relationship between the function carried by speech and the specific function of the accompanying gesture has also been coded and analyzed. Gestures co-occurring with speech can either have a “non-marked/neutral” function, which means that they do not add further information to what is being said with speech, or can be produced to add to, emphasize, weaken or contradict speech. When the function of these gestures is neutral, they tend to have a minimal extent, while when their specific function is to emphasize the information expressed by speech, their extent tends to be bigger. This result might be related to the fact that gestures are often produced to emphasize information that is also focused by mechanisms like prosody in speech.

  • 140.
    Cerrato, Loredana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    The communicative function of "sì" in Italian and "ja" in Swedish: an acoustic analysis. 2005. In: Proceedings of Fonetik 2005 / [ed] Anders Eriksson, Jonas Lindh, Göteborg: Göteborg University, 2005, pp. 41-44. Conference paper (Other academic)
    Abstract [en]

    The results of an acoustic analysis and a perceptual evaluation of the role of prosody in spontaneously produced “ja” and “sì” in Swedish and Italian are reported and discussed in this paper. The hypothesis is that pitch contour, duration cues and relative intensity can be useful in the identification of the different communicative functions of these short expressions taken out of their context. The results of the perceptual tests run to verify whether the acoustic cues alone can be used to distinguish different functions of the same lexical items are encouraging only for Italian “sì”, while for Swedish “ja” they show some confusion among the different categories.

  • 141.
    Cerrato, Loredana
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Ekeklint, Susanne
    Evaluating users reactions to human-like interfaces: Prosodic and paralinguistic features as new evaluation measures for users satisfaction. 2004. In: From Brows to Trust: Evaluating Embodied Conversational Agents / [ed] Ruttkay, Z.; Pelachaud, C., Dordrecht: Kluwer Academic Publishers, 2004, pp. 101-124. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    An increasing number of dialogue systems are deployed to provide public services in our everyday lives. They are becoming more service-minded and several of them provide different channels for interaction. The rationale is to make automatic services available in new environments and more attractive to use. From a developer perspective, this affects the complexity of the requirements elicitation activity, as new combinations and variations in end-user interaction need to be considered. The aim of our investigation is to propose new parameters and metrics to evaluate multimodal dialogue systems endowed with embodied conversational agents (ECAs). These new metrics focus on the users, rather than on the system. Our assumption is that the intentional use of prosodic variation and the production of communicative non-verbal behaviour by users can give an indication of their attitude towards the system and might also help to evaluate the users’ overall experience of the interaction. To test our hypothesis we carried out analyses on different Swedish corpora of interactions between users and multimodal dialogue systems. We analysed the prosodic variation in the way the users ended their interactions with the system and we observed the production of non-verbal communicative expressions by users. Our study supports the idea that the observation of users’ prosodic variation and production of communicative non-verbal behaviour during the interaction with dialogue systems could be used as an indication of whether or not the users are satisfied with the system performance.

  • 142.
    Cerrato, Loredana
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Svanfeldt, Gunilla
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    A method for the detection of communicative head nods in expressive speech. 2006. In: Papers from the Second Nordic Conference on Multimodal Communication 2005 / [ed] Allwood, J.; Dorriots, B.; Nicholson, S., Göteborg: Göteborg University, 2006, pp. 153-165. Conference paper (Refereed)
    Abstract [en]

    The aim of this study is to propose a method for automatic detection of head nods during the production of semi-spontaneous speech. This method also provides means for extracting certain characteristics of head nods that may vary depending on placement, function and even underlying emotional expression. The material used is part of the Swedish PF-Star corpora, which were recorded by means of an optical motion capture system (Qualisys) able to successfully register articulatory movements as well as head movements and facial expressions. The material consists of short sentences as well as of dialogic speech produced by a Swedish actor. The method for automatic head nod detection on the 3D data acquired with Qualisys is based on criteria for slope, amplitude and a minimum number of consecutive frames. The criteria are tuned on head nods that have been manually annotated. These parameters can be varied to detect different kinds of head movements and can also be combined with other parameters in order to detect facial gestures, such as eyebrow displacements. For this study we focused in particular on the detection of head nods, since in earlier studies they have been found to be important visual cues, in particular for signaling feedback and focus. In order to evaluate the method, a preliminary test was run on semi-spontaneous dialogic speech, which is also part of the Swedish PF-Star corpora and produced by the same actor who read the sentences. The results show that the parameters and the criteria that had been set on the basis of the training corpus are valid also for the dialogic speech, even if more sophisticated parameters could be useful to achieve a more precise result.

  • 143.
    Chau, Ting-Hey
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Translation Memory System Optimization: How to effectively implement translation memory system optimization. 2015. Independent thesis, advanced level (degree of Master), 20 credits / 30 HE credits. Student thesis (Degree project)
    Abstract [sv, translated]

    Translating technical manuals is very costly, especially when larger organizations need to publish product manuals for their entire range in more than 20 languages. Once a text (e.g. a phrase, sentence or paragraph) has been translated, we want to be able to reuse the translated text in future translation projects and documents. The translated texts are stored in a Translation Memory. Each text is stored in its source language together with its translation into another language, the so-called target language. Together these form a language pair in a Translation Memory System. A language pair stored in a translation memory constitutes a Translation Entry, also called a Translation Unit.

    If a match is found when searching the source language for a given text string in the translation memory, translations into all available target languages are returned for that string. These can in turn be inserted into the target document. Such functionality is offered by the publishing software Skribenta, developed by Excosoft.

    To perform a translation into a target language, Skribenta requires that the source-language text finds an exact match, a so-called full match, in the translation memory. A full match can only be achieved if a text is stored in standard form. This requires manual tagging of entities and frequently occurring words such as model names and product numbers.

    In this thesis I investigate how to effectively implement an optimization in a translation memory system, among other things by facilitating the manual tagging of entities. This has been done through various heuristics that address the problem of Named Entity Recognition (NER).

    Results from the developed heuristics have been compared with the results from the NER tool developed by Stanford. The results show that the heuristics I developed achieve a higher F-Measure than Stanford NER and can therefore be a good initial step in helping Excosoft's users improve their translation memories.

  • 144.
    Contardo, Ivonne
    et al.
    Karolinska Institutet, Sweden.
    McAllister, Anita
    Karolinska Institutet, Sweden.
    Strömbergsson, Sofia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Real-time registration of listener reactions to unintelligibility in misarticulated child speech. 2014. In: Proceedings from FONETIK 2014 / [ed] Heldner, M., Stockholm, 2014, pp. 127-132. Conference paper (Other academic)
    Abstract [en]

    This study explores the relation between misarticulations and their impact on intelligibility. 30 listeners (17 clinicians and 13 untrained listeners) were given the task of clicking a button whenever they perceived something unintelligible during playback of misarticulated child speech samples. No differences were found between the clinicians and the untrained listeners regarding clicking frequency. The distribution of listener clicks correlated strongly with the clinical evaluations of the same samples. The distribution of clicks was also related to manually annotated speech errors, allowing examination of links between events in the speech signal and reactions evoked in listeners. Hereby, we demonstrate a viable approach to ranking speech error types with regards to their impact on intelligibility in conversational speech.

  • 145. Csapo, A.
    et al.
    Gilmartin, E.
    Grizou, J.
    Han, J.
    Meena, Raveesh
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Anastasiou, D.
    Jokinen, K.
    Wilcock, G.
    Multimodal conversational interaction with a humanoid robot. 2012. In: 3rd IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2012 - Proceedings, IEEE, 2012, pp. 667-672. Conference paper (Refereed)
    Abstract [en]

    The paper presents a multimodal conversational interaction system for the Nao humanoid robot. The system was developed at the 8th International Summer Workshop on Multimodal Interfaces, Metz, 2012. We implemented WikiTalk, an existing spoken dialogue system for open-domain conversations, on Nao. This greatly extended the robot's interaction capabilities by enabling Nao to talk about an unlimited range of topics. In addition to speech interaction, we developed a wide range of multimodal interactive behaviours by the robot, including face-tracking, nodding, communicative gesturing, proximity detection and tactile interrupts. We made video recordings of user interactions and used questionnaires to evaluate the system. We further extended the robot's capabilities by linking Nao with Kinect.

  • 146. Csapo, A.
    et al.
    Gilmartin, E.
    Grizou, J.
    Han, J.
    Meena, Raveesh
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Anastasiou, D.
    Jokinen, K.
    Wilcock, G.
    Open-Domain Conversation with a NAO Robot. 2012. In: 3rd International Conference on Cognitive Infocommunications (CogInfoCom 2012), Kosice, 2012. Conference paper (Refereed)
    Abstract [en]

    In this demo, we present a multimodal conversation system, implemented using a Nao robot and Wikipedia. The system was developed at the 8th International Workshop on Multimodal Interfaces in Metz, France, 2012. The system is based on an interactive, open-domain spoken dialogue system called WikiTalk, which guides the user through conversations based on the link structure of Wikipedia. In addition to speech interaction, the robot interacts with users by tracking their faces and nodding/gesturing at key points of interest within the Wikipedia text. The proximity detection capabilities of the Nao, as well as its tactile sensors, were used to implement context-based interrupts in the dialogue system.

  • 147.
    Dabbaghchian, Saeed
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Arnela, Marc
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    SIMPLIFICATION OF VOCAL TRACT SHAPES WITH DIFFERENT LEVELS OF DETAIL. 2015. In: Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK, University of Glasgow, 2015, pp. 1-5. Conference paper (Refereed)
    Abstract [en]

    We propose a semi-automatic method to regenerate simplified vocal tract geometries from very detailed input (e.g. MRI-based geometry) with the possibility to control the level of detail, while maintaining the overall properties. The simplification procedure controls the number and organization of the vertices in the vocal tract surface mesh and can be assigned to replace complex cross-sections with regular shapes. Six different geometry regenerations are suggested: bent or straight vocal tract centreline, combined with three different types of cross-sections, namely realistic, elliptical or circular. The key feature in the simplification is that the cross-sectional areas and the length of the vocal tract are maintained. This method may, for example, be used to facilitate 3D finite element method simulations of vowels and diphthongs and to examine the basic acoustic characteristics of the vocal tract in printed physical replicas. Furthermore, it allows for multimodal solutions of the wave equation.

  • 148.
    Dabbaghchian, Saeed
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Arnela, Marc
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Guasch, Oriol
    Synthesis of VV utterances from muscle activation to sound with a 3d model. 2017. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, The International Speech Communication Association (ISCA), 2017, pp. 3497-3501. Conference paper (Refereed)
    Abstract [en]

    We propose a method to automatically generate deformable 3D vocal tract geometries from the surrounding structures in a biomechanical model. This allows us to couple 3D biomechanics and acoustics simulations. The basis of the simulations is muscle activation trajectories in the biomechanical model, which move the articulators to the desired articulatory positions. The muscle activation trajectories for a vowel-vowel utterance are here defined through interpolation between the determined activations of the start and end vowel. The resulting articulatory trajectories of flesh points on the tongue surface and jaw are similar to corresponding trajectories measured using Electromagnetic Articulography, hence corroborating the validity of interpolating muscle activation. At each time step in the articulatory transition, a 3D vocal tract tube is created through a cavity extraction method based on first slicing the geometry of the articulators with a semi-polar grid to extract the vocal tract contour in each plane and then reconstructing the vocal tract through a smoothed 3D mesh-generation using the extracted contours. A finite element method applied to these changing 3D geometries simulates the acoustic wave propagation. We present the resulting acoustic pressure changes on the vocal tract boundary and the formant transitions for the utterance [Ai].

  • 149.
    Dahl, Sofia
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Expressiveness of musician's body movements in performances on marimba. 2004. In: Gesture-Based Communication in Human-Computer Interaction / [ed] Camurri, A.; Volpe, G., Genoa: Springer Verlag, 2004, pp. 479-486. Conference paper (Refereed)
    Abstract [en]

    To explore to what extent emotional intentions can be conveyed through musicians’ movements, video recordings were made of a marimba player performing the same piece with the intentions Happy, Sad, Angry and Fearful. 20 subjects were presented with video clips, without sound, and asked to rate both the perceived emotional content and the movement qualities. The video clips were presented in different conditions, showing the player to different extents. The observers’ ratings for the intended emotions confirmed that the intentions Happiness, Sadness and Anger were well communicated, while Fear was not. Identification of the intended emotion was only slightly influenced by the viewing condition. The movement ratings indicated that there were cues that the observers used to distinguish between intentions, similar to cues found for audio signals in music performance.

  • 150.
    Dalianis, Hercules
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Data- och systemvetenskap, DSV.
    Knutsson, Ola
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Cerratto Pargman, Teresa
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Data- och systemvetenskap, DSV.
    Using human language technology to support the handling officers at the Swedish Social Insurance Agency. 2009. In: Design and Evaluation of e-Government Applications and Services: Proceedings of the 2nd International Workshop on Design and Evaluation of e-Government Applications and Services (DEGAS'2009) in conjunction with INTERACT'2009, Uppsala, Sweden, August 24th 2009, 2009, pp. 30-32. Conference paper (Refereed)
    Abstract [en]

    The Swedish Social Insurance Agency (Försäkringskassan) receives 40 000 e-mails per month, as well as phone calls, from citizens; these are handled by almost 500 handling officers. To initiate the process of making their work more efficient, we carried out two user-centered design workshops with the handling officers at Försäkringskassan with the objective of finding in what ways human language technology might facilitate their work. One of the outcomes from the workshops was that the handling officers required a support tool for handling and answering e-mails from their customers. Three main requirements were identified, namely support for finding the correct template to be used in the e-mail answers, support for automatically creating templates and, finally, an automatic e-mail answering function. We will focus on these design challenges over two years within the IMAIL project.
