  • 101.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    User Responses to Prosodic Variation in Fragmentary Grounding Utterances in Dialog
    2006. In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, p. 2002-2005
    Conference paper (Refereed)
    Abstract [en]

    In a previous study we demonstrated that subjects could use prosodic features (primarily peak height and alignment) to make different interpretations of synthesized fragmentary grounding utterances. In the present study we test the hypothesis that subjects also change their behavior accordingly in a human-computer dialog setting. We report on an experiment in which subjects participate in a color-naming task in a Wizard-of-Oz controlled human-computer dialog in Swedish. The results show that two annotators were able to categorize the subjects' responses based on pragmatic meaning. Moreover, the subjects' response times differed significantly, depending on the prosodic features of the grounding fragment spoken by the system.

  • 102.
    Strömbergsson, S.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Götze, Jana
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. Karolinska Institutet (KI), Sweden.
    Björkenstam, K. N.
    Approximating phonotactic input in children's linguistic environments from orthographic transcripts
    2017. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, International Speech Communication Association, 2017, Vol. 2017, p. 2213-2217
    Conference paper (Refereed)
    Abstract [en]

    Child-directed spoken data is the ideal source of support for claims about children's linguistic environments. However, phonological transcriptions of child-directed speech are scarce, compared to sources like adult-directed speech or text data. Acquiring reliable descriptions of children's phonological environments from more readily accessible sources would mean considerable savings of time and money. The first step towards this goal is to quantify the reliability of descriptions derived from such secondary sources. We investigate how phonological distributions vary across different modalities (spoken vs. written), and across the age of the intended audience (children vs. adults). Using a previously unseen collection of Swedish adult- and child-directed spoken and written data, we combine lexicon look-up and grapheme-to-phoneme conversion to approximate phonological characteristics. The analysis shows distributional differences across datasets both for single phonemes and for longer phoneme sequences. Some of these are predictably attributed to lexical and contextual characteristics of text vs. speech. The generated phonological transcriptions are remarkably reliable. The differences in phonological distributions between child-directed speech and secondary sources highlight a need for compensatory measures when relying on written data or on adult-directed spoken data, and/or for continued collection of actual child-directed speech in research on children's language environments.

  • 103.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A study of Swedish questions and their prosodic characteristics
    2012. In: Proceedings of Workshop on Innovation and Applications in Speech Technology (IAST), Dublin, Ireland, 2012, p. 61-64
    Conference paper (Refereed)
    Abstract [en]

    Studies of questions present strong evidence that there is no one-to-one relationship between intonation and interrogative mode. This is one of the fundamental observations that formed the VariQ project, a Swedish national project aiming for deeper insight into how questions work and indeed what constitutes a question. Here, we present some intermediate project results and outline the way ahead. VariQ looks mainly at the Spontal corpus of conversational speech [1], but exploits other data sets to a limited extent for comparative purposes. We report a recent study, in which we selected 600 questions from the Spontal corpus and annotated these in a theory-independent manner. In a subsequent study we compared some prosodic characteristics of these questions with those of the speech used in seven Swedish spoken dialogue systems. The results reveal differences both in the distributions of question types and in prosodic characteristics of the questions in the two different settings.

  • 104.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Prosodic measurements and question types in the Spontal corpus of Swedish dialogues
    2012. In: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1, 2012, p. 838-841
    Conference paper (Refereed)
    Abstract [en]

    Studies of questions present strong evidence that there is no one-to-one relationship between intonation and interrogative mode. In this paper, we describe some aspects of prosodic variation in the Spontal corpus of 120 half-hour spontaneous dialogues in Swedish. The study is part of ongoing work aimed at extracting a database of 600 questions from the corpus, complete with categorization and prosodic descriptions. We report on coding and annotation of question typology and present results concerning some prosodic correlates related to question type for the 600 questions. A prosodically salient distinction was found between the two categories termed, in our typology, forward and backward looking questions.

  • 105.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Question types and some prosodic correlates in the Spontal corpus of Swedish dialogues
    2012. In: Proceedings of Fonetik 2012, Gothenburg, Sweden, 2012
    Conference paper (Refereed)
    Abstract [en]

    We describe some aspects of prosodic variation in the Spontal corpus of 120 half-hour spontaneous dialogues in Swedish. Coding and annotation of question typology is described and results are presented concerning prosodic correlates related to question type for well over 400 questions. A prosodically salient distinction was found between the two categories termed, in our typology, forward- and backward-looking questions.

  • 106.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Questions and reported speech in Swedish dialogues
    2012. In: Nordic Prosody: Proceedings of the XIth Conference, Tartu 2012, Tartu, Estonia, 2012
    Conference paper (Refereed)
    Abstract [en]

    The Spontal corpus contains 120 half-hour spontaneous dialogues in Swedish, balanced for gender combination and previous acquaintance. From this corpus, over 600 questions have been extracted with a view to investigate and describe prosodic variation and to test the hypothesis that a standard type of question intonation such as a final pitch rise which contrasts to a final low of declarative intonation is not consistent with the pragmatic use of intonation in dialogue [1]. The extracted questions have been labeled with respect to four relatively simple queries, Q1-Q4. Q1 is related to question type, whether the question is best described as a y/n question (Y/N), a wh-question (WH), an alternative question which includes a restricted set of alternative answers (ALT), or other (OTHER). Q2 concerned whether a response was required (REQUIRED), possible (OPTIONAL), or prohibited (PROHIBITED). Q3 was labeled with respect to the question "Does the person asking the question ask for something that has not already been said (FORWARD) or is it more a question of verifying or showing attitude towards what has already been stated (BACKWARD)?". Q4 was to be answered in the positive if the question was a case of reported speech (REPORTED), and in the negative if not (DIRECT). Clustering across the different labels enables the extraction of specific question categories – categories that presumably correspond to different prosodic characteristics. One category of questions that has been relatively underexplored is reported questions. In [2], for example, reported questions were not coded as questions. Here, we present typological, lexical and prosodic characteristics of reported questions in the Spontal corpus. 
    This set of questions is relatively small (N = 48), and statistical analyses were not very revealing; analyses comparing prosodic variation (duration, average pitch, pitch variation and intonation slope) within speakers did not show any differences dependent on whether questions were reported or not. Instead, our prosodic analysis is based on qualitative descriptions of the reported questions – descriptions that apart from prosodic characteristics (pitch level and variation, speed, intensity) involve aspects of voice quality and articulation. We were surprised to find that prosodic marking was not very frequent in the data; around half of the reported questions were not perceived as being prosodically marked. The reported questions were subcategorized by a) whether the question had actually been produced or not, and b) whether the reported question had been produced by the speaker herself or by someone else. Prosodic marking was found to be more common in reported questions that had actually been produced than in hypothetical/rhetorical questions, and also more common in reported questions that had been produced by someone else than by the speaker herself. Regarding the characteristics of the prosodically marked questions, we found that seemingly opposite features can be used to signal reportedness: e.g. either slower or faster speech, higher or lower pitch, stronger or softer speech. Prosodic difference to the preceding context seems to be the key. Apart from the prosodic signaling of reportedness, we also noted lexical signals (where "bara" [ba], Eng. "like:" is the most common) and semantic/pragmatic cues, where the question does not make any sense unless it is a case of reported speech. Regarding the distribution of question types within the group of reported questions, FORWARD-looking YN and WH questions that REQUIRE an answer are the most common type, just as for the non-reported questions in the Spontal corpus.

  • 107.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Timing responses to questions in dialogue
    2013. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France: International Speech and Communication Association, 2013, p. 2584-2588
    Conference paper (Refereed)
    Abstract [en]

    Questions and answers play an important role in spoken dialogue systems as well as in human-human interaction. A critical concern when responding to a question is the timing of the response. While human response times depend on a wide set of features, dialogue systems generally respond as soon as they can, that is, when the end of the question has been detected and the response is ready to be deployed. This paper presents an analysis of how different semantic and pragmatic features affect the response times to questions in two different data sets of spontaneous human-human dialogues: the Swedish Spontal Corpus and the US English Switchboard corpus. Our analysis shows that contextual features such as question type, response type, and conversation topic influence human response times. Based on these results, we propose that more sophisticated response timing can be achieved in spoken dialogue systems by using these features to automatically and deliberately target system response timing.

  • 108.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tånnander, C.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Ranking severity of speech errors by their phonological impact in context
    2014. In: Interspeech, ISSN 2308-457X, p. 1568-1572
    Article in journal (Refereed)
    Abstract [en]

    Children with speech disorders often present with systematic speech error patterns. In clinical assessments of speech disorders, evaluating the severity of the disorder is central. Current measures of severity have limited sensitivity to factors like the frequency of the target sounds in the child’s language and the degree of phonological diversity, which are factors that can be assumed to affect intelligibility. By constructing phonological filters to simulate eight speech error patterns often observed in children, and applying these filters to a phonologically transcribed corpus of 350K words, this study explores three quantitative measures of phonological impact: Percentage of Consonants Correct (PCC), edit distance, and degree of homonymy. These metrics were related to estimated ratings of severity collected from 34 practicing clinicians. The results show an expected high correlation between the PCC and edit distance metrics, but that none of the three metrics align with clinicians’ ratings. Although these results do not generate definite answers to what phonological factors contribute the most to (un)intelligibility, this study demonstrates a methodology that allows for large-scale investigations of the interplay between phonological errors and their impact on speech in context, within and across languages.

  • 109.
    Wallers, Åsa
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    The effect of prosodic features on the interpretation of synthesised backchannels
    2006. In: Perception And Interactive Technologies, Proceedings / [ed] Andre, E; Dybkjaer, L; Minker, W; Neumann, H; Weber, M, 2006, Vol. 4021, p. 183-187
    Conference paper (Refereed)
    Abstract [en]

    A study of the interpretation of prosodic features in backchannels (Swedish /a/ and /m/) produced by speech synthesis is presented. The study is part of work-in-progress towards endowing conversational spoken dialogue systems with the ability to produce and use backchannels and other feedback.

  • 110.
    Włodarczak, M.
    Heldner, M.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Communicative needs and respiratory constraints
    2015. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association, 2015, p. 3051-3055
    Conference paper (Refereed)
    Abstract [en]

    This study investigates the timing of communicative behaviour with respect to the speaker's respiratory cycle. The data is drawn from a corpus of multiparty conversations in Swedish. We find that while longer utterances (> 1 s) are tied, predictably, primarily to exhalation onset, shorter vocalisations are spread more uniformly across the respiratory cycle. In addition, nods, which are free from any respiratory constraints, are most frequently found around exhalation offsets, where respiratory requirements for even a short utterance are not satisfied. We interpret the results to reflect the economy principle in speech production, whereby respiratory effort, associated primarily with starting a new respiratory cycle, is minimised within the scope of the speaker's communicative goals.
