1 - 31 of 31
  • 1.
    Bertenstam, Johan
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Blomberg, Mats
    KTH, Superseded Departments, Speech, Music and Hearing.
    Carlson, Rolf
    KTH, Superseded Departments, Speech, Music and Hearing.
    Elenius, Kjell
    KTH, Superseded Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments, Speech, Music and Hearing.
    Gustafson, Joakim
    KTH, Superseded Departments, Speech, Music and Hearing.
    Hunnicutt, Sheri
    Högberg, Jesper
    KTH, Superseded Departments, Speech, Music and Hearing.
    Lindell, Roger
    KTH, Superseded Departments, Speech, Music and Hearing.
    Neovius, Lennart
    KTH, Superseded Departments, Speech, Music and Hearing.
    Nord, Lennart
    de Serpa-Leitao, Antonio
    KTH, Superseded Departments, Speech, Music and Hearing.
    Ström, Nikko
    KTH, Superseded Departments, Speech, Music and Hearing.
    Spoken dialogue data collected in the Waxholm project (1995). In: Quarterly progress and status report: April 15, 1995 / Speech Transmission Laboratory, Stockholm: KTH, 1995, 1, p. 50-73. Chapter in book (Other academic)
  • 2.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Multimodal Interaction Control (2009). In: Computers in the Human Interaction Loop / [ed] Waibel, Alexander; Stiefelhagen, Rainer, Berlin/Heidelberg: Springer, 2009, p. 143-158. Chapter in book (Refereed)
  • 3.
    Biadsy, F.
    et al.
    Rosenberg, A.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hirschberg, J.
    Strangert, E.
    A Cross-Cultural Comparison of American, Palestinian, and Swedish Perception of Charismatic Speech (2008). In: Speech Prosody 2008, Campinas, Brazil, 2008. Conference paper (Refereed)
    Abstract [en]

    Perception of charisma, the ability to influence others by virtue of one's personal qualities, appears to be influenced to some extent by cultural factors. We compare results of five studies of charisma speech in which American, Palestinian, and Swedish subjects rated Standard American English political speech and Americans and Palestinians rated Palestinian Arabic speech. We identify acoustic-prosodic and lexical features correlated with charisma ratings of both languages for native and non-native speakers and find that 1) some acoustic-prosodic features correlated with charisma ratings appear similar across all five experiments; 2) other acoustic-prosodic and lexical features correlated with charisma appear specific to the language rated, whatever the native language of the rater; and 3) still other acoustic-prosodic cues appear specific to both rater native language and to language rated. We also find that, while the absolute ratings non-native raters assign tend to be lower than those of native speakers, the ratings themselves are strongly correlated.

  • 4.
    Biadsy, F.
    et al.
    Rosenberg, A.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Hirschberg, J.
    Strangert, E.
    A cross-cultural comparison of American, Palestinian, and Swedish perception of charismatic speech (2008). In: Proceedings of the 4th International Conference on Speech Prosody, SP 2008, International Speech Communication Association, 2008, p. 579-582. Conference paper (Refereed)
    Abstract [en]

    Perception of charisma, the ability to influence others by virtue of one's personal qualities, appears to be influenced to some extent by cultural factors. We compare results of five studies of charisma speech in which American, Palestinian, and Swedish subjects rated Standard American English political speech and Americans and Palestinians rated Palestinian Arabic speech. We identify acoustic-prosodic and lexical features correlated with charisma ratings of both languages for native and non-native speakers and find that 1) some acoustic-prosodic features correlated with charisma ratings appear similar across all five experiments; 2) other acoustic-prosodic and lexical features correlated with charisma appear specific to the language rated, whatever the native language of the rater; and 3) still other acoustic-prosodic cues appear specific to both rater native language and to language rated. We also find that, while the absolute ratings non-native raters assign tend to be lower than those of native speakers, the ratings themselves are strongly correlated.

  • 5.
    Boves, L.
    et al.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hinrichs, E.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Krauwer, S.
    Lemnitzer, L.
    Vainio, M.
    Wittenburg, P.
    Resources for Speech Research: Present and Future Infrastructure Needs (2009). In: Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, UK, 2009, p. 1803-1806. Conference paper (Refereed)
    Abstract [en]

    This paper introduces the EU-FP7 project CLARIN, a joint effort of over 150 institutions in Europe, aimed at the creation of a sustainable language resources and technology infrastructure for the humanities and social sciences research community. The paper briefly introduces the vision behind the project and how it relates to speech research with a focus on the contributions that CLARIN can and will make to research in spoken language processing.

  • 6.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Conflicting acoustic cues in stop perception (2007). In: Where Do Features Come From?: Phonological Primitives in the Brain, the Mouth, and the Ear, 2007, p. 63-64. Conference paper (Refereed)
  • 7.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Using acoustic cues in stop perception (2007). In: Proceedings of Fonetik 2007, 2007, Vol. 50, no 1, p. 25-28. Conference paper (Other academic)
  • 8.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Towards human-like behaviour in spoken dialog systems (2006). In: Proceedings of Swedish Language Technology Conference (SLTC 2006), Gothenburg, Sweden, 2006. Conference paper (Other academic)
    Abstract [en]

    We and others have found it fruitful to assume that users, when interacting with spoken dialogue systems, perceive the systems and their actions metaphorically. Common metaphors include the human metaphor and the interface metaphor (cf. Edlund, Heldner, & Gustafson, 2006). In the interface metaphor, the spoken dialogue system is perceived as a machine interface – often but not always a computer interface. Speech is used to accomplish what would otherwise have been accomplished by some other means of input, such as a keyboard or a mouse. In the human metaphor, on the other hand, the computer is perceived as a creature (or even a person) with humanlike conversational abilities, and speech is not a substitute or one of many alternatives, but rather the primary means of communicating with this creature. We are aware that more “natural” or human-like behaviour does not automatically make a spoken dialogue system “better” (i.e. more efficient or better liked by its users). Indeed, we are quite convinced that the advantage (or disadvantage) of humanlike behaviour will be highly dependent on the application. However, a dialogue system that is coherent with a human metaphor may profit from a number of characteristics.

  • 9.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Elenius, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Swerts, Marc
    Tilburg University, The Netherlands.
    Perceptual judgments of pitch range (2004). In: Proc. of Intl Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, p. 689-692. Conference paper (Refereed)
    Abstract [en]

    This paper reports on a study that explores to what extent listeners are able to judge where a particular utterance fragment is located in a speaker's pitch range. The research consists of a perception study that makes use of 100 stimuli, selected from 50 different speakers whose speech was originally collected for a multi-speaker database of Swedish speech materials. The fragments are presented to subjects who are asked to estimate whether the fragment is located in the lower or higher part of that speaker's range. Results reveal that listeners' judgments are dependent on the gender of the speaker, but that within a gender they tend to hear differences in range.

  • 10.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Johan Liljencrants (1936-2012) in memoriam (2012). In: Journal of the International Phonetic Association, ISSN 0025-1003, E-ISSN 1475-3502, Vol. 42, no 2, p. 253-254. Article in journal (Refereed)
  • 11.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Data-driven multimodal synthesis (2005). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 47, no 1-2, p. 182-193. Article in journal (Refereed)
    Abstract [en]

    This paper is a report on current efforts at the Department of Speech, Music and Hearing, KTH, on data-driven multimodal synthesis including both visual speech synthesis and acoustic modeling. In the research we try to combine corpus-based methods with knowledge-based models and to explore the best of the two approaches. In the paper an attempt to build formant-synthesis systems based on both rule-generated and database-driven methods is presented. A pilot experiment is also reported showing that this approach can be a very interesting path to explore further. Two studies on visual speech synthesis are reported, one on data acquisition using a combination of motion capture techniques and one concerned with coarticulation, comparing different models.

  • 12.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Rule-based Speech Synthesis (2008). In: Springer Handbook of Speech Processing / [ed] Benesty, J.; Sondhi, M. M.; Huang, Y., Berlin/Heidelberg: Springer, 2008, p. 429-436. Chapter in book (Refereed)
    Abstract [en]

    In this chapter, we review some of the issues in rule-based synthesis and specifically discuss formant synthesis. Formant synthesis and the theory behind it have played an important role both in the scientific progress in understanding how humans talk and in the development of the first speech technology applications. Its flexibility and small footprint make the approach still of interest and a valuable complement to the currently dominant methods based on concatenative data-driven synthesis. As already mentioned in the overview by Schroeter (Chap. 19), we also see a new trend to combine the rule-based and data-driven approaches: formant features from a database can be used both to optimize a rule-based formant synthesis system and to optimize the search for good units in a concatenative system.

  • 13.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Speech Synthesis (2010). In: The Handbook of Phonetic Sciences, Blackwell Publishing, 2010, 2, p. 781-803. Chapter in book (Refereed)
  • 14.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Lindblom, Björn
    Risberg, Arne
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Gunnar Fant 1920-2009 In Memoriam (2009). In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 66, no 4, p. 249-250. Article in journal (Refereed)
  • 15.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Exploring Data Driven Parametric Synthesis (2009). In: Proceedings of Fonetik 2009: The XXIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm, Sweden: Stockholm University, 2009, p. 86-91. Conference paper (Other academic)
    Abstract [en]

    This paper describes our work on building a formant synthesis system based on both rule-generated and database-driven methods. Three parametric synthesis systems are discussed: our traditional rule-based system, a speaker-adapted system, and finally a gesture system. The gesture system is a further development of the adapted system in that it includes concatenated formant gestures from a data-driven unit library. The systems are evaluated technically, comparing the formant tracks with an analysed test corpus. The gesture system results in a 25% error reduction in the formant frequencies due to the inclusion of the stored gestures. Finally, a perceptual evaluation shows a clear advantage in naturalness for the gesture system compared to both the traditional system and the speaker-adapted system.

  • 16.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Strangert, E.
    Synthesising disfluencies in a dialogue system (2009). In: Nordic Prosody: Proceedings of the Xth Conference / [ed] Vainio, M., Aulanko, R., Aaltonen, O., Frankfurt am Main: Peter Lang Publishing Group, 2009. Conference paper (Refereed)
  • 17.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Strangert, Eva
    Cues for Hesitation in Speech Synthesis (2006). In: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, Baixas: ISCA - International Speech Communication Association, 2006, p. 1300-1303. Conference paper (Refereed)
    Abstract [en]

    The current study investigates acoustic correlates to perceived hesitation based on previous work showing that pause duration and final lengthening both contribute to the perception of hesitation. It is the total duration increase that is the valid cue rather than the contribution by either factor. The present experiment using speech synthesis was designed to evaluate F0 slope and presence vs. absence of creaky voice before the inserted hesitation in addition to durational cues. The manipulations occurred in two syntactic positions, within a phrase and between two phrases, respectively. The results showed that in addition to durational increase, variation of both F0 slope and creaky voice had perceptual effects, although to a much lesser degree. The results have a bearing on efforts to model spontaneous speech including disfluencies, to be explored, for example, in spoken dialogue systems.

  • 18.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Strangert, Eva
    Umeå University.
    Prosodic Cues for Hesitation (2006). In: Working Papers 52: Proceedings from Fonetik 2006 / [ed] Gilbert Ambrazaitis, Susanne Schötz, Lund: Lund University, 2006, Vol. 52, p. 21-24. Conference paper (Other academic)
    Abstract [en]

    In our efforts to model spontaneous speech for use in, for example, spoken dialogue systems, a series of experiments have been conducted in order to investigate correlates to perceived hesitation. Previous work has shown that it is the total duration increase that is the valid cue rather than the contribution by either of the two factors, pause duration and final lengthening. In the present experiment we explored the effects of F0 slope variation and the presence vs. absence of creaky voice in addition to durational cues, using synthetic stimuli. The results showed that variation of both F0 slope and creaky voice did have perceptual effects, but to a much lesser degree than the durational increase.

  • 19.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Strangert, Eva
    Umeå University.
    Modelling hesitation for synthesis of spontaneous speech (2006). In: Proceedings of Speech Prosody 2006 / [ed] R. Hoffmann, H. Mixdorff, Dresden, 2006. Conference paper (Refereed)
    Abstract [en]

    The current work deals with the modelling of one type of disfluency, hesitations. A perceptual experiment using speech synthesis was designed to evaluate two duration features found to be correlates to hesitation, pause duration and final lengthening. A variation of F0 slope before the hesitation was also included. The most important finding is that it is the total duration increase that is the valid cue rather than the contribution by either factor. In addition, our findings lead us to assume an interaction with syntax. The absence of strong effects of the induced F0 variation was unexpected and we consider several possible explanations for this result.

  • 20.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hawkins, Sarah
    University of Cambridge.
    When is fine phonetic detail a detail?: 16th International Congress of Phonetic Sciences (2007). In: Proceedings of ICPhS 2007 / [ed] Jürgen Trouvain, William J. Barry, Saarbrücken, Germany, 2007, p. 211-214. Conference paper (Refereed)
  • 21.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hirschberg, J.
    Swerts, M.
    Cues to upcoming Swedish prosodic boundaries: Subjective judgment studies and acoustic correlates (2005). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 46, no 3-4, p. 326-333. Article in journal (Refereed)
    Abstract [en]

    Studies of perceptually based predictions of upcoming prosodic boundaries in spontaneous Swedish speech, both by native speakers of Swedish and by native speakers of standard American English, reveal marked similarity in judgments. We examined whether Swedish and American listeners were able to predict the occurrence and strength of upcoming boundaries in a series of web-based perception experiments. Utterance fragments (in both long and short versions) were selected from a corpus of spontaneous Swedish speech, which was first labeled for boundary presence and strength by expert labelers. These fragments were then presented to listeners, who were instructed to guess whether or not they were followed by a prosodic break, and if so, what the strength of the break was. Results revealed that both Swedish and American listening groups were indeed able to predict whether or not a boundary (of a particular strength) followed the fragment. This suggests that acoustic and prosodic, rather than lexico-grammatical and semantic, information was being used by listeners as a primary cue. Acoustic and prosodic correlates of these judgments were then examined, with significant correlations found between judgments and the presence/absence of final creak and phrase-final f0 level and slope.

  • 22.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Hirschberg, Julia
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Cross-Cultural Perception of Discourse Phenomena (2009). In: INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Baixas: ISCA - International Speech Communication Association, 2009, p. 1723-1726. Conference paper (Refereed)
    Abstract [en]

    We discuss perception studies of two low level indicators of discourse phenomena by Swedish, Japanese, and Chinese native speakers. Subjects were asked to identify upcoming prosodic boundaries and disfluencies in Swedish spontaneous speech. We hypothesize that speakers of prosodically unrelated languages should be less able to predict upcoming phrase boundaries but potentially better able to identify disfluencies, since indicators of disfluency are more likely to depend upon lexical, as well as acoustic information. However, surprisingly, we found that both phenomena were fairly well recognized by native and non-native speakers alike, with some possible interference from word tones for the Chinese subjects.

  • 23.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hirschberg, Julia
    Columbia University.
    Swerts, Marc
    University of Tilburg, The Netherlands.
    Prediction of upcoming Swedish prosodic boundaries by Swedish and American listeners (2004). In: Proc. of Intl Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, p. 329-332. Conference paper (Refereed)
    Abstract [en]

    We describe results of a study of perceptually based predictions of upcoming prosodic breaks in spontaneous Swedish speech materials by native speakers of Swedish and of standard American English. The question addressed here is the extent to which listeners are able, on the basis of acoustic and prosodic features, to predict the occurrence of upcoming boundaries, and if so, whether they are able to distinguish different degrees of boundary strength. An experiment was conducted in which spontaneous utterance fragments (both long and short versions) were presented to listeners, who were instructed to guess whether or not the fragments were followed by a prosodic break, and if so, what the strength of the break was, where boundary presence and strength had been independently labeled. Results revealed that both listening groups were indeed able to predict whether or not a boundary (of a particular strength) followed the fragment, suggesting that prosodic rather than lexico-grammatical information was being used as a primary cue.

  • 24.
    Edlund, Jens
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Skantze, Gabriel
    KTH, Superseded Departments, Speech, Music and Hearing.
    Carlson, Rolf
    KTH, Superseded Departments, Speech, Music and Hearing.
    Higgins: a spoken dialogue system for investigating error handling techniques (2004). In: Proceedings of the International Conference on Spoken Language Processing, ICSLP 04, 2004, p. 229-231. Conference paper (Refereed)
    Abstract [en]

    In this paper, an overview of the Higgins project and the research within the project is presented. The project incorporates studies of error handling for spoken dialogue systems on several levels, from processing to dialogue level. A domain in which a range of different error types can be studied has been chosen: pedestrian navigation and guiding. Several data collections within Higgins have been analysed along with data from Higgins' predecessor, the AdApt system. The error handling research issues in the project are presented in light of these analyses.

  • 25.
    Heldner, Mattias
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Interruption impossible (2006). In: Nordic Prosody: Proceedings of the IXth Conference, Lund 2004 / [ed] Bruce, G.; Horne, M., Frankfurt am Main, Germany, 2006, p. 97-105. Conference paper (Refereed)
    Abstract [en]

    Most current work on spoken human-computer interaction has so far concentrated on interactions between a single user and a dialogue system. The advent of ideas of the computer or dialogue system as a conversational partner in a group of humans, for example within the CHIL project and elsewhere (e.g. Kirchhoff & Ostendorf, 2003), introduces new requirements on the capabilities of the dialogue system. Among other things, the computer as a participant in a multi-party conversation has to appreciate the human turn-taking system, in order to time its own interjections appropriately. As the role of a conversational computer is likely to be to support human collaboration, rather than to guide or control it, it is particularly important that it does not interrupt or disturb the human participants. The ultimate goal of the work presented here is to predict suitable places for turn-taking, as well as positions where it is impossible for a conversational computer to interrupt without irritating the human interlocutors.

  • 26.
    Johnson-Roberson, Matthew
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bohg, Jeannette
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Enhanced visual scene understanding through human-robot dialog (2010). In: Dialog with Robots: AAAI 2010 Fall Symposium, 2010, p. -144. Conference paper (Refereed)
  • 27.
    Johnson-Roberson, Matthew
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bohg, Jeannette
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Rasolzadeh, Babak
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Enhanced Visual Scene Understanding through Human-Robot Dialog (2011). In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2011, p. 3342-3348. Conference paper (Refereed)
    Abstract [en]

    We propose a novel human-robot-interaction framework for robust visual scene understanding. Without any a priori knowledge about the objects, the task of the robot is to correctly enumerate how many of them are in the scene and segment them from the background. Our approach builds on top of state-of-the-art computer vision methods, generating object hypotheses through segmentation. This process is combined with a natural dialog system, thus including a ‘human in the loop’ where, by exploiting the natural conversation of an advanced dialog system, the robot gains knowledge about ambiguous situations. We present an entropy-based system allowing the robot to detect the poorest object hypotheses and query the user for arbitration. Based on the information obtained from the human-robot dialog, the scene segmentation can be re-seeded and thereby improved. We present experimental results on real data that show an improved segmentation performance compared to segmentation without interaction.

  • 28.
    Lacerda, F.
    et al.
    Sundberg, U.
    Carlson, Rolf
    KTH, Superseded Departments, Speech, Music and Hearing.
    Holt, L.
    Modelling interactive language learning: a project presentation (2004). In: Proceedings of The XVIIth Swedish Phonetics Conference, Fonetik 2004, Stockholm University, 2004, p. 60-63. Conference paper (Other academic)
    Abstract [en]

    This paper describes a recently started interdisciplinary research program aiming at investigating and modelling fundamental aspects of the language acquisition process. The working hypothesis assumes that general purpose perception and memory processes, common to both human and other mammalian species, along with the particular context of initial adult-infant interaction, underlie the infant's ability to progressively derive linguistic structure implicitly available in the ambient language. The project is conceived as an interdisciplinary research effort involving the areas of Phonetics, Psychology and Speech recognition. Experimental speech perception techniques will be used at Dept. of Linguistics, SU, to investigate the development of the infant's ability to derive linguistic information from situated connected speech. These experiments will be matched by behavioural tests of animal subjects, carried out at CMU, Pittsburgh, to disclose the potential significance that recurrent multi-sensory properties of the stimuli may have for spontaneous category formation. Data from infant and child vocal productions as well as infant-adult interactions will also be collected and analyzed to address the possibility of a production-perception link. Finally, the data from the infant and animal studies will be integrated and tested in mathematical models of the language acquisition process, developed at TMH, KTH.

  • 29.
    Skantze, Gabriel
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Talking with Higgins: Research challenges in a spoken dialogue system (2006). In: Perception and Interactive Technologies, Proceedings / [ed] Andre, E.; Dybkjaer, L.; Minker, W.; Neumann, H.; Weber, M., Berlin: Springer-Verlag, 2006, Vol. 4021, p. 193-196. Conference paper (Refereed)
    Abstract [en]

    This paper presents the current status of the research in the Higgins project and provides background for a demonstration of the spoken dialogue system implemented within the project. The project represents the latest development in the ongoing dialogue systems research at KTH. The practical goal of the project is to build collaborative conversational dialogue systems in which research issues such as error handling techniques can be tested empirically.

  • 30.
    Strangert, E.
    et al.
    Carlson, Rolf
    KTH, Superseded Departments, Speech, Music and Hearing.
    On the modelling and synthesis of conversational speech (2004). In: Nordic Prosody: Proceedings of the IXth Conference / [ed] Bruce, G.; Horne, M., Frankfurt am Main: Peter Lang, 2004, p. 255-264. Conference paper (Refereed)
  • 31.
    Öhlin, David
    et al.
    Carlson, Rolf
    KTH, Superseded Departments, Speech, Music and Hearing.
    Data-driven formant synthesis (2004). In: Proceedings FONETIK 2004: The XVIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm University, 2004, p. 160-163. Conference paper (Other academic)