  • 1.
    Al Moubayed, Samer
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Öster, Ann-Marie
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Salvi, Giampiero
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    van Son, Nic
    Ormel, Ellen
    Virtual Speech Reading Support for Hard of Hearing in a Domestic Multi-Media Setting (2009). In: INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Baixas: ISCA, 2009, pp. 1443-1446. Conference paper (Peer reviewed)
    Abstract [en]

    In this paper we present recent results on the development of the SynFace lip synchronized talking head towards multilinguality, varying signal conditions and noise robustness in the Hearing at Home project. We then describe the large scale hearing impaired user studies carried out for three languages. The user tests focus on measuring the gain in Speech Reception Threshold in Noise when using SynFace, and on measuring the effort scaling when using SynFace by hearing impaired people. Preliminary analysis of the results does not show significant gain in SRT or in effort scaling. But looking at inter-subject variability, it is clear that many subjects benefit from SynFace especially with speech with stereo babble noise.

  • 2.
    Al Moubayed, Samer
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    De Smet, Michael
    Van Hamme, Hugo
    Lip Synchronization: from Phone Lattice to PCA Eigen-projections using Neural Networks (2008). In: INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association, Baixas: ISCA, 2008, pp. 2016-2019. Conference paper (Peer reviewed)
    Abstract [en]

    Lip synchronization is the process of generating natural lip movements from a speech signal. In this work we address the lip-sync problem using an automatic phone recognizer that generates a phone lattice carrying posterior probabilities. The acoustic feature vector contains the posterior probabilities of all the phones over a time window centered at the current time point. Hence this representation characterizes the phone recognition output including the confusion patterns caused by its limited accuracy. A 3D face model with varying texture is computed by analyzing a video recording of the speaker using a 3D morphable model. Training a neural network using 30 000 data vectors from an audiovisual recording in Dutch resulted in a very good simulation of the face on independent data sets of the same or of a different speaker.
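The acoustic feature construction this abstract describes, the posterior probabilities of all phones stacked over a time window centered at the current frame, can be sketched as below. This is an illustration, not the authors' implementation: the phone count, window width, and the plain least-squares mapping standing in for their neural network are all assumptions.

```python
import numpy as np

def stack_posteriors(post, half_window):
    """Stack phone posteriors over a window centered on each frame.

    post: (T, P) array, each row a posterior distribution over P phones.
    Returns a (T, P * (2*half_window + 1)) feature matrix; the edges are
    handled by repeating the first/last frame.
    """
    T, P = post.shape
    padded = np.vstack([post[:1]] * half_window + [post] + [post[-1:]] * half_window)
    feats = [padded[t:t + 2 * half_window + 1].ravel() for t in range(T)]
    return np.asarray(feats)

# Toy data: 100 frames, 5 "phones", 3 PCA eigen-projection targets.
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(5), size=100)        # rows sum to 1
targets = rng.standard_normal((100, 3))           # stand-in PCA coefficients

X = stack_posteriors(post, half_window=2)         # (100, 25)
# Linear least squares as a stand-in for the paper's neural network.
W, *_ = np.linalg.lstsq(X, targets, rcond=None)
pred = X @ W
```

Edge padding by repetition keeps the feature dimensionality constant at the utterance boundaries, one of several reasonable choices.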

  • 3.
    Alexanderson, Simon
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Székely, Éva
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Henter, Gustav Eje
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH. KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Robotik, perception och lärande, RPL.
    Kucherenko, Taras
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Robotik, perception och lärande, RPL.
    Beskow, Jonas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Generating coherent spontaneous speech and gesture from text (2020). In: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020, Association for Computing Machinery (ACM), 2020. Conference paper (Peer reviewed)
    Abstract [en]

    Embodied human communication encompasses both verbal (speech) and non-verbal information (e.g., gesture and head movements). Recent advances in machine learning have substantially improved the technologies for generating synthetic versions of both of these types of data: On the speech side, text-to-speech systems are now able to generate highly convincing, spontaneous-sounding speech using unscripted speech audio as the source material. On the motion side, probabilistic motion-generation methods can now synthesise vivid and lifelike speech-driven 3D gesticulation. In this paper, we put these two state-of-the-art technologies together in a coherent fashion for the first time. Concretely, we demonstrate a proof-of-concept system trained on a single-speaker audio and motion-capture dataset, that is able to generate both speech and full-body gestures together from text input. In contrast to previous approaches for joint speech-and-gesture generation, we generate full-body gestures from speech synthesis trained on recordings of spontaneous speech from the same person as the motion-capture data. We illustrate our results by visualising gesture spaces and textspeech-gesture alignments, and through a demonstration video.

  • 4. Ambrazaitis, G.
    et al.
    Frid, J.
    House, David
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Word prominence ratings in Swedish television news readings: Effects of pitch accents and head movements (2020). In: Proceedings of the International Conference on Speech Prosody, International Speech Communication Association, 2020, Vol. 2020, pp. 314-318. Conference paper (Peer reviewed)
    Abstract [en]

    Prosodic prominence is a multimodal phenomenon where pitch accents are frequently aligned with visible movements by the hands, head, or eyebrows. However, little is known about how such movements function as visible prominence cues in multimodal speech perception with most previous studies being restricted to experimental settings. In this study, we are piloting the acquisition of multimodal prominence ratings for a corpus of natural speech (Swedish television news readings). Sixteen short video clips (218 words) of news readings were extracted from a larger corpus and rated by 44 native Swedish adult volunteers using a web-based set-up. The task was to rate each word in a clip as either non-prominent, moderately prominent or strongly prominent based on audiovisual cues. The corpus was previously annotated for pitch accents and head movements. We found that words realized with a pitch accent and head movement tended to receive higher prominence ratings than words with a pitch accent only. However, we also examined ratings for a number of carefully selected individual words, and these case studies suggest that ratings are affected by complex relations between the presence of a head movement and its type of alignment, the word's F0 profile, and semantic and pragmatic factors.

  • 5. Ambrazaitis, G.
    et al.
    House, David
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Multimodal prominences: Exploring the patterning and usage of focal pitch accents, head beats and eyebrow beats in Swedish television news readings (2017). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 95, pp. 100-113. Article in journal (Peer reviewed)
    Abstract [en]

    Facial beat gestures align with pitch accents in speech, functioning as visual prominence markers. However, it is not yet well understood whether and how gestures and pitch accents might be combined to create different types of multimodal prominence, and how specifically visual prominence cues are used in spoken communication. In this study, we explore the use and possible interaction of eyebrow (EB) and head (HB) beats with so-called focal pitch accents (FA) in a corpus of 31 brief news readings from Swedish television (four news anchors, 986 words in total), focusing on effects of position in text, information structure as well as speaker expressivity. Results reveal an inventory of four primary (combinations of) prominence markers in the corpus: FA+HB+EB, FA+HB, FA only (i.e., no gesture), and HB only, implying that eyebrow beats tend to occur only in combination with the other two markers. In addition, head beats occur significantly more frequently in the second than in the first part of a news reading. A functional analysis of the data suggests that the distribution of head beats might to some degree be governed by information structure, as the text-initial clause often defines a common ground or presents the theme of the news story. In the rheme part of the news story, FA, HB, and FA+HB are all common prominence markers. The choice between them is subject to variation which we suggest might represent a degree of freedom for the speaker to use the markers expressively. A second main observation concerns eyebrow beats, which seem to be used mainly as a kind of intensification marker for highlighting not only contrast, but also value, magnitude, or emotionally loaded words; it is applicable in any position in a text. We thus observe largely different patterns of occurrence and usage of head beats on the one hand and eyebrow beats on the other, suggesting that the two represent two separate modalities of visual prominence cuing.

  • 6.
    Ambrazaitis, Gilbert
    et al.
    Linnaeus University, Växjö, Sweden.
    Frid, Johan
    Lund University Humanities Lab, Sweden.
    House, David
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Auditory vs. audiovisual prominence ratings of speech involving spontaneously produced head movements (2022). In: Proceedings of the 11th International Conference on Speech Prosody, Speech Prosody 2022, International Speech Communication Association, 2022, pp. 352-356. Conference paper (Peer reviewed)
    Abstract [en]

    Visual information can be integrated in prominence perception, but most available evidence stems from controlled experimental settings, often involving synthetic stimuli. The present study provides evidence from spontaneously produced head gestures that occurred in Swedish television news readings. Sixteen short clips (containing 218 words in total) were rated for word prominence by 85 adult volunteers in a between-subjects design (44 in an audio-visual vs. 41 in an audio-only condition) using a web-based rating task. As an initial test of overall rating behavior, average prominence across all 218 words was compared between the two conditions, revealing no significant difference. In a second step, we compared normalized prominence ratings between the two conditions for all 218 words individually. These results displayed significant (or near significant, p<.08) differences for 28 out of 218 words, with higher ratings in either the audiovisual (13 words) or the audio-only-condition (15 words). A detailed examination revealed that the presence of head movements (previously annotated) can boost prominence ratings in the audiovisual condition, while words with low prominence tend to be rated slightly higher in the audio-only condition. The study suggests that visual prominence signals are integrated in speech processing even in a relatively uncontrolled, naturalistic setting.

  • 7.
    Ambrazaitis, Gilbert
    et al.
    Centre for Languages and Literature, Lund University, Sweden.
    House, David
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Acoustic features of multimodal prominences: Do visual beat gestures affect verbal pitch accent realization? (2017). In: Proceedings 14th International Conference on Auditory-Visual Speech Processing, AVSP 2017, International Speech Communication Association, 2017, pp. 89-94. Conference paper (Peer reviewed)
    Abstract [en]

    The interplay of verbal and visual prominence cues has attracted recent attention, but previous findings are inconclusive as to whether and how the two modalities are integrated in the production and perception of prominence. In particular, we do not know whether the phonetic realization of pitch accents is influenced by co-speech beat gestures, and previous findings seem to generate different predictions. In this study, we investigate acoustic properties of prominent words as a function of visual beat gestures in a corpus of read news from Swedish television. The corpus was annotated for head and eyebrow beats as well as sentence-level pitch accents. Four types of prominence cues occurred particularly frequently in the corpus: (1) pitch accent only, (2) pitch accent plus head, (3) pitch accent plus head plus eyebrows, and (4) head only. The results show that (4) differs from (1-3) in terms of a smaller pitch excursion and shorter syllable duration. They also reveal significantly larger pitch excursions in (2) than in (1), suggesting that the realization of a pitch accent is to some extent influenced by the presence of visual prominence cues. Results are discussed in terms of the interaction between beat gestures and prosody with a potential functional difference between head and eyebrow beats.

  • 8.
    Ambrazaitis, Gilbert
    et al.
    Linnaeus Univ, Dept Swedish, Växjö, Sweden..
    House, David
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Probing effects of lexical prosody on speech-gesture integration in prominence production by Swedish news presenters (2022). In: Laboratory Phonology, ISSN 1868-6346, Vol. 13, no. 1, pp. 1-35. Article in journal (Peer reviewed)
    Abstract [en]

    This study investigates the multimodal implementation of prosodic-phonological categories, asking whether the accentual fall and the following rise in the Swedish word accents (Accent 1, Accent 2) are varied as a function of accompanying head and eyebrow gestures. Our purpose is to evaluate the hypothesis that prominence production displays a cumulative relation between acoustic and kinematic dimensions of spoken language, especially focusing on the clustering of gestures (head, eyebrows), at the same time asking if lexical-prosodic features would interfere with this cumulative relation. Our materials comprise 12 minutes of speech from Swedish television news presentations. The results reveal a significant trend for larger fo rises when a head movement accompanies the accented word, and even larger when an additional eyebrow movement is present. This trend is observed for accentual rises that encode phrase-level prominence, but not for accentual falls that are primarily related to lexical prosody. Moreover, the trend is manifested differently in different lexical-prosodic categories (Accent 1 versus Accent 2 with one versus two lexical stresses). The study provides novel support for a cumulative-cue hypothesis and the assumption that prominence production is essentially multimodal, well in line with the idea of speech and gesture as an integrated system.

  • 9.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Mapping between acoustic and articulatory gestures (2011). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 53, no. 4, pp. 567-589. Article in journal (Peer reviewed)
    Abstract [en]

    This paper proposes a definition for articulatory as well as acoustic gestures along with a method to segment the measured articulatory trajectories and acoustic waveforms into gestures. Using a simultaneously recorded acoustic-articulatory database, the gestures are detected based on finding critical points in the utterance, both in the acoustic and articulatory representations. The acoustic gestures are parameterized using 2-D cepstral coefficients. The articulatory trajectories are essentially the horizontal and vertical movements of Electromagnetic Articulography (EMA) coils placed on the tongue, jaw and lips along the midsagittal plane. The articulatory movements are parameterized using 2D-DCT using the same transformation that is applied on the acoustics. The relationship between the detected acoustic and articulatory gestures in terms of the timing as well as the shape is studied. In order to study this relationship further, acoustic-to-articulatory inversion is performed using GMM-based regression. The accuracy of predicting the articulatory trajectories from the acoustic waveforms is on par with state-of-the-art frame-based methods with dynamical constraints (with an average error of 1.45-1.55 mm for the two speakers in the database). In order to evaluate the acoustic-to-articulatory inversion in a more intuitive manner, a method based on the error in estimated critical points is suggested. Using this method, it was noted that the estimated articulatory trajectories using the acoustic-to-articulatory inversion methods were still not accurate enough to be within the perceptual tolerance of audio-visual asynchrony.
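GMM-based regression of the kind named in this abstract predicts the articulatory vector as a responsibility-weighted sum of per-component linear regressions under a joint acoustic-articulatory GMM. A minimal sketch follows; the dimensions and hand-set component parameters are illustrative, not taken from the paper's data.

```python
import numpy as np

def gmm_regress(x, weights, means, covs, dx):
    """Conditional mean E[y | x] under a joint GMM over z = [x; y].

    weights: (K,) mixture weights; means: (K, D); covs: (K, D, D);
    dx: dimensionality of x (the first dx coordinates of z).
    """
    K = len(weights)
    resp = np.empty(K)
    cond = np.empty((K, means.shape[1] - dx))
    for k in range(K):
        mx, my = means[k, :dx], means[k, dx:]
        Sxx = covs[k, :dx, :dx]
        Syx = covs[k, dx:, :dx]
        diff = x - mx
        # Marginal likelihood of x under component k (shared constants dropped,
        # since they cancel in the normalization below).
        quad = diff @ np.linalg.solve(Sxx, diff)
        resp[k] = weights[k] * np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(Sxx))
        # Component-wise linear regression of y on x.
        cond[k] = my + Syx @ np.linalg.solve(Sxx, diff)
    resp /= resp.sum()
    return resp @ cond

# Single-component sanity case: joint Gaussian in which E[y | x] = 0.5 * x.
w = np.array([1.0])
mu = np.array([[0.0, 0.0]])
cov = np.array([[[1.0, 0.5],
                 [0.5, 1.0]]])
y_hat = gmm_regress(np.array([2.0]), w, mu, cov, dx=1)
```

With one component this reduces to ordinary linear regression; the mixture only matters when the acoustic-to-articulatory relation differs across regions of the space.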

  • 10.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Neiberg, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    In search of Non-uniqueness in the Acoustic-to-Articulatory Mapping (2009). In: INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Baixas: ISCA, 2009, pp. 2799-2802. Conference paper (Peer reviewed)
    Abstract [en]

    This paper explores the possibility and extent of non-uniqueness in the acoustic-to-articulatory inversion of speech, from a statistical point of view. It proposes a technique to estimate the non-uniqueness, based on finding peaks in the conditional probability function of the articulatory space. The paper corroborates the existence of non-uniqueness in a statistical sense, especially in stop consonants, nasals and fricatives. The relationship between the importance of the articulator position and non-uniqueness at each instance is also explored.
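The proposed estimate, finding peaks in the conditional probability function of the articulatory space, can be illustrated as follows. The kernel-density estimator, the bandwidths, and the synthetic one-to-many mapping are assumptions made for the sketch, not the paper's exact procedure.

```python
import numpy as np

def conditional_mode_count(x, y, x0, x_tol=0.1, bandwidth=0.15, grid_size=200):
    """Count peaks in an estimate of p(y | x ~ x0).

    Selects samples with |x - x0| < x_tol, smooths their y values with a
    Gaussian kernel on a grid, and counts local maxima of the estimate.
    """
    ys = y[np.abs(x - x0) < x_tol]
    grid = np.linspace(y.min(), y.max(), grid_size)
    dens = np.exp(-0.5 * ((grid[:, None] - ys[None, :]) / bandwidth) ** 2).sum(axis=1)
    interior = dens[1:-1]
    peaks = (interior > dens[:-2]) & (interior > dens[2:])
    return int(peaks.sum())

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 4000)
# A one-to-many mapping: two "articulatory" branches produce the same acoustics.
branch = rng.integers(0, 2, x.size)
y = np.where(branch == 1, x + 2.0, x - 2.0) + 0.05 * rng.standard_normal(x.size)

n_modes = conditional_mode_count(x, y, x0=0.0)    # two conditional modes
```

A count above one at a given acoustic point is the statistical signature of non-uniqueness that the paper looks for in stops, nasals and fricatives.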

  • 11.
    Arvidsson, Klara
    et al.
    Stockholm Univ, Dept Romance Studies & Class, SE-10691 Stockholm, Sweden..
    Jemstedt, Andreas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Människocentrerad teknologi, Medieteknik och interaktionsdesign, MID.
    The Perceived Importance of Language Skills in Europe: The Case of Swedish Migrants in France (2022). In: Languages, E-ISSN 2226-471X, Vol. 7, no. 4, article id 290. Article in journal (Peer reviewed)
    Abstract [en]

    In a European context, where member states of the European Union share a common language policy, multilingualism and foreign language (FL) learning are strongly promoted. The goal is that citizens learn two FLs in addition to their first language(s) (L1). However, it is unclear to what extent the multilingual policy is relevant in people's lives, at a time when the English language is established as a lingua franca. This survey-based study contributes insights into the relevance of the EU multilingual policy in an intra-European migration context, by focusing on Swedish migrants (n = 199) in France, who are L1 speakers of Swedish. We investigated the perceived importance of skills in FL French, FL English, and L1 Swedish, for professional and personal life. The quantitative analyses showed that participants perceive skills in French and in English to be equally important for professional life, whereas skills in Swedish were perceived to be less important. For personal life, skills in French were perceived as the most important, followed by skills in English, and then Swedish. In conclusion, the European multilingual language policy appears to be reflected in Europeans' lives, at least in the case of Swedish migrants in France.

  • 12. Arzyutov, Dmitry
    et al.
    Lyublinskaya, Marina
    Nenet͡skoe olenevodstvo: geografii͡a, ėtnografii͡a, lingvistika [Nenets Reindeer Husbandry: Geography, Ethnography, and Linguistics] (2018). Collection/Anthology (Peer reviewed)
  • 13. Auffret, Katja
    et al.
    Geslin, Teresa
    Kjellgren, Björn
    KTH, Skolan för industriell teknik och management (ITM), Lärande, Språk och kommunikation.
    Freddi, Maria
    Petroniene, Saulè
    Rinder, Jamie
    KTH, Skolan för industriell teknik och management (ITM), Lärande, Språk och kommunikation.
    Tual, David
    BADGE: Global competence for sustainable internationalisation in engineering education (2021). In: Languages for Specific Purposes in Higher Education 2021, 2021. Conference paper (Peer reviewed)
    Abstract [en]

    BADGE: Global competence for sustainable internationalisation in engineering education

    This paper presents a new Erasmus+ funded project, Becoming a digital global engineer (BADGE 2020). The project is a three-year collaboration between language and communication teachers at 14 technical universities and engineering departments in 12 countries, with partners representing industry, consultants, educational organizations and students. The rationale behind the project is the recognition of two facts: the ever-increasing need for global competence among engineering graduates and professionals (Parkinson 2009, OECD 2018), and the need to boost and adjust communication and language for specific purposes (LSP) teaching to better support sustainable internationalisation, acknowledging multiculturality and multilingualism. The project was initiated from within a larger network of language and communication teachers at technical universities in Europe (GELS 2020), established in 2015 to "enhance future engineers' language skills in order to prepare them for the increasingly challenging demands of a globalised market", and is divided into 8 intellectual outputs: communication course for future engineers, sustainable writing skills for engineers, e-communication skills, global competence and entrepreneurship, podcasting and video casting architecture, soft skills for engineering students, and global competence through IT and serious games. Working in 8 transnational teams, we will develop learning material ranging from course syllabi and exercises, to handbooks and pods, to be made freely available for download and local modification as open educational resources. Furthermore, the material will be connected to a system of digital badges that can be used as a supplement to official degree diplomas.

    References

    BADGE (2020) The Badge Project www.thebadgeproject.eu, accessed 2020-03-25

    GELS (2020) The GELS network www.clic.eng.cam.ac.uk/news/GELS, accessed 2020-03-25

    OECD (2018) Preparing our youth for an inclusive and sustainable world. The OECD PISA global competence framework www.oecd.org/education/Global-competency-for-an-inclusiveworld.pdf, accessed 2020-03-25

    Parkinson, A. (2009) "The Rationale for Developing Global Competence" Online Journal for Global Engineering Education: Vol. 4: Iss. 2, Article 2. digitalcommons.uri.edu/cgi/viewcontent.cgi?article=1018&context=ojgee, accessed 2020-03-25

    Note on authorship

    As equal authors and in alphabetical order: Katja Auffret (IMT Mines Albi-Carmaux, École Mines-Télécom, France), Teresa Geslin (Université de Lorraine, France), Ivana Jurković (Veleučilište u Bjelovaru, Croatia), Björn Kjellgren (KTH Royal Institute of Technology, Sweden), Freddi Maria (Università degli Studi di Pavia, Italy), Saule Petroniene (Kaunas University of Technology, Lithuania), Jamie Rinder (KTH Royal Institute of Technology, Sweden), David Tual (Cambridge University, United Kingdom).

  • 14.
    Aylett, Matthew Peter
    et al.
    Heriot Watt University and CereProc Ltd. Edinburgh, UK.
    Székely, Éva
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    McMillan, Donald
    Stockholm University Stockholm, Sweden.
    Skantze, Gabriel
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Romeo, Marta
    Heriot Watt University Edinburgh, UK.
    Fischer, Joel
    University of Nottingham Nottingham, UK.
    Reyes-Cruz, Gisela
    University of Nottingham Nottingham, UK.
    Why is my Agent so Slow? Deploying Human-Like Conversational Turn-Taking (2023). In: HAI 2023 - Proceedings of the 11th Conference on Human-Agent Interaction, Association for Computing Machinery (ACM), 2023, pp. 490-492. Conference paper (Peer reviewed)
    Abstract [en]

    The emphasis on one-to-one speak/wait spoken conversational interaction with intelligent agents leads to long pauses between conversational turns, undermines the flow and naturalness of the interaction, and undermines the user experience. Despite groundbreaking advances in the area of generating and understanding natural language with techniques such as LLMs, conversational interaction has remained relatively overlooked. In this workshop we will discuss and review the challenges, recent work and potential impact of improving conversational interaction with artificial systems. We hope to share experiences of poor human/system interaction, best practices with third party tools, and generate design guidance for the community.

  • 15.
    Bergren, Max
    et al.
    Gavagai.
    Karlgren, Jussi
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Östling, Robert
    Stockholms universitet.
    Parkvall, Mikael
    Stockholms universitet.
    Inferring the location of authors from words in their texts (2015). In: Proceedings of the 20th Nordic Conference of Computational Linguistics, Linköping University Electronic Press, 2015. Conference paper (Peer reviewed)
    Abstract [en]

    For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors' location. Many texts are locatable but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are.

    We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.
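The core model in this abstract, a Gaussian over the locations where a word occurs, a placeness score favouring tightly clustered words, and a per-text centroid over high-placeness words, can be sketched as follows. The single Gaussian per word and the inverse-spread placeness score are simplifications (the paper models several Gaussians per word), and the example words and coordinates are invented for illustration.

```python
import numpy as np

def word_stats(coords):
    """Fit a single 2-D Gaussian to a word's occurrence coordinates.

    coords: (N, 2) array of (lat, lon) points where the word was used.
    Returns (mean, placeness), where placeness is the inverse of the mean
    per-axis standard deviation: tightly clustered words score high.
    """
    mean = coords.mean(axis=0)
    spread = coords.std(axis=0).mean()
    return mean, 1.0 / (spread + 1e-9)

def locate_text(words, stats, top_k=2):
    """Locate a text at the centroid of its top_k highest-placeness words."""
    scored = sorted((w for w in words if w in stats),
                    key=lambda w: stats[w][1], reverse=True)[:top_k]
    return np.mean([stats[w][0] for w in scored], axis=0)

rng = np.random.default_rng(2)
stats = {
    # A place-bound word: tightly clustered around (59.3, 18.1).
    "söder": word_stats(np.array([59.3, 18.1]) + 0.05 * rng.standard_normal((50, 2))),
    # A function word: spread over the whole country, low placeness.
    "och": word_stats(np.column_stack([rng.uniform(55, 69, 50),
                                       rng.uniform(11, 24, 50)])),
}
loc = locate_text(["och", "söder", "och"], stats, top_k=1)
```

The placeness filter is what keeps ubiquitous function words from dragging every centroid toward the population mean.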

  • 16.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Bruce, Gösta
    Lunds universitet.
    Enflo, Laura
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Schötz, Susanne
    Lunds universitet.
    Recognizing and Modelling Regional Varieties of Swedish (2008). In: INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association, 2008, pp. 512-515. Conference paper (Peer reviewed)
    Abstract [en]

    Our recent work within the research project SIMULEKT (Simulating Intonational Varieties of Swedish) includes two approaches. The first involves a pilot perception test, used for detecting tendencies in human clustering of Swedish dialects. 30 Swedish listeners were asked to identify the geographical origin of Swedish native speakers by clicking on a map of Sweden. Results indicate for example that listeners from the south of Sweden are better at recognizing some major Swedish dialects than listeners from the central part of Sweden, which includes the capital area. The second approach concerns a method for modelling intonation using the newly developed SWING (Swedish INtonation Generator) tool, where annotated speech samples are resynthesized with rule based intonation and audiovisually analysed with regards to the major intonational varieties of Swedish. We consider both approaches important in our aim to test and further develop the Swedish prosody model.

  • 17.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Nordstrand, Magnus
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    A Model for Multimodal Dialogue System Output Applied to an Animated Talking Head (2005). In: Spoken Multimodal Human-Computer Dialogue in Mobile Environments / [ed] Minker, Wolfgang; Bühler, Dirk; Dybkjær, Laila, Dordrecht: Springer, 2005, pp. 93-113. Book chapter (Peer reviewed)
    Abstract [en]

    We present a formalism for specifying verbal and non-verbal output from a multimodal dialogue system. The output specification is XML-based and provides information about communicative functions of the output, without detailing the realisation of these functions. The aim is to let dialogue systems generate the same output for a wide variety of output devices and modalities. The formalism was developed and implemented in the multimodal spoken dialogue system AdApt. We also describe how facial gestures in the 3D-animated talking head used within this system are controlled through the formalism.

  • 18.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Visual correlates to prominence in several expressive modes, 2006. In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, pp. 1272-1275. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterances were recorded in a variety of expressive modes including certain, confirming, questioning, uncertain, happy, angry and neutral. Results showed that in all expressive modes, words with focal accent are accompanied by a greater variation of the facial parameters than are words in non-focal positions. Moreover, interesting differences between the expressions in terms of different parameters were found.

  • 19.
    Bigert, Johnny
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Kann, Viggo
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Knutsson, Ola
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Sjöbergh, Jonas
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Grammar checking for Swedish second language learners, 2004. In: CALL for the Nordic Languages: Tools and Methods for Computer Assisted Language Learning, Copenhagen Business School: Samfundslitteratur, 2004, pp. 33-47. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    Grammar errors and context-sensitive spelling errors in texts written by second language learners are hard to detect automatically. We have used three different approaches for grammar checking: manually constructed error detection rules, statistical differences between correct and incorrect texts, and machine learning of specific error types. The three approaches have been evaluated using a corpus of second language learner Swedish. We found that the three methods detect different errors and therefore complement each other.

  • 20.
    Björkman, Beyza
    KTH, Skolan för teknikvetenskaplig kommunikation och lärande (ECE), Lärande, Språk och kommunikation.
    Questions in academic ELF interaction, 2012. In: Journal of English as a Lingua Franca, ISSN 2191-9216, E-ISSN 2191-933X, Vol. 1, no. 1, pp. 93-119. Journal article (Refereed)
    Abstract [en]

    This article investigates questions in a higher education setting where English is used as a lingua franca. The study originates from a larger piece of work which investigated the communicative effectiveness of spoken English as a Lingua Franca (ELF) among the teachers and students at a technical university in authentic situations (Björkman, 2010a). The focus in the present article is placed on student-student interaction from group-work sessions, but references to lectures have been included for comparison where appropriate. The questions in the study were first categorized syntactically. Syntactic analyses were followed by phonological analyses of question intonation. The results of the pilot study point to three cues the listener can rely on to be able to register an utterance as a question: syntax with specific reference to word order, utterance-final rising question intonation and the interrogative adverb/pronoun (in Wh-questions only). The results of the analyses in the present study, drawing on qualitative and quantitative data, demonstrate that a question is more likely to be registered as such when all available cues are provided for the listener. It seems reasonable to suggest, then, that the speakers in lingua franca settings, with the added complexities at the syntactic level, make use of all available cues to ensure communicative effectiveness. Most importantly, the speakers in this setting appeared to achieve communicative effectiveness by using utterance-final rising question intonation in the absence of the other cues, and not by following unmarked native speaker intonation. The fact that utterance-final question intonation is the most reliable cue among the three shows, from yet another angle of ELF usage, that we cannot assume native speaker usage as the ideal in similar settings.

  • 21.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Tree-Based Estimation of Speaker Characteristics for Speech Recognition, 2009. In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, pp. 580-583. Conference paper (Refereed)
    Abstract [en]

    Speaker adaptation by means of adjustment of speaker characteristic properties, such as vocal tract length, has the important advantage compared to conventional adaptation techniques that the adapted models are guaranteed to be realistic if the descriptions of the properties are. One problem with this approach is that the search procedure to estimate them is computationally heavy. We address the problem by using a multi-dimensional, hierarchical tree of acoustic model sets. The leaf sets are created by transforming a conventionally trained model set using leaf-specific speaker profile vectors. The model sets of non-leaf nodes are formed by merging the models of their child nodes, using a computationally efficient algorithm. During recognition, a maximum likelihood criterion is followed to traverse the tree. Studies of one- (VTLN) and four-dimensional speaker profile vectors (VTLN, two spectral slope parameters and model variance scaling) exhibit a reduction of the computational load to a fraction compared to that of an exhaustive grid search. In recognition experiments on children's connected digits using adult and male models, the one-dimensional tree search performed as well as the exhaustive search. Further reduction was achieved with four dimensions. The best recognition results are 0.93% and 10.2% WER in TIDIGITS and PF-Star-Sw, respectively, using adult models.

  • 22.
    Boholm, Max
    KTH, Skolan för arkitektur och samhällsbyggnad (ABE), Filosofi och historia, Filosofi.
    Risk, language and discourse, 2016. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This doctoral thesis analyses the concept of risk and how it functions as an organizing principle of discourse, paying close attention to actual linguistic practice.

              Article 1 analyses the concepts of risk, safety and security and their relations based on corpus data (the Corpus of Contemporary American English). Lexical, grammatical and semantic contexts of the nouns risk, safety and security, and the adjectives risky, safe and secure are analysed and compared. Similarities and differences are observed, suggesting partial synonymy between safety (safe) and security (secure) and semantic opposition to risk (risky). The findings both support and contrast theoretical assumptions about these concepts in the literature.

              Article 2 analyses the concepts of risk and danger and their relation based on corpus data (in this case the British National Corpus). Frame semantics is used to explore the assumptions of the sociologist Niklas Luhmann (and others) that the risk concept presupposes decision-making, while the concept of danger does not. Findings partly support and partly contradict this assumption.

              Article 3 analyses how newspapers represent risk and causality. Two theories are used: media framing and the philosopher John Mackie’s account of causality. A central finding of the study is that risks are “framed” with respect to causality in several ways (e.g. one and the same type of risk can be presented as resulting from various causes). Furthermore, newspaper reporting on risk and causality varies in complexity. In some articles, risks are presented without causal explanations, while in other articles, risks are presented as resulting from complex causal conditions. Considering newspaper reporting at an aggregated overall level, complex schemas of causal explanation emerge.

              Article 4 analyses how phenomena referred to by the term nano (e.g. nanotechnology, nanoparticles and nanorobots) are represented as risks in Swedish newspaper reporting. Theoretically, the relational theory of risk and frame semantics are used. Five main groups of nano-risks are identified based on the risk object of the article: (I) nanotechnology; (II) nanotechnology and its artefacts (e.g. nanoparticles and nanomaterials); (III) nanoparticles, without referring to nanotechnology; (IV) non-nanotechnological nanoparticles (e.g. arising from traffic); and (V) nanotechnology and nanorobots. Various patterns are explored within each group, concerning, for example, what is considered to be at stake in relation to these risk objects, and under what conditions. It is concluded that Swedish patterns of newspaper reporting on nano-risks follow international trends, influenced by scientific assessment, as well as science fiction.

              Article 5 analyses the construction and negotiation of risk in the Swedish controversy over the use of antibacterial silver in health care and consumer products (e.g. sports clothes and equipment). The controversy involves several actors: print and television news media, Government and parliament, governmental agencies, municipalities, non-government organisations, and companies. In the controversy, antibacterial silver is claimed to be a risk object that negatively affects health, the environment, and sewage treatment industry (objects at risk). In contrast, such claims are denied. Antibacterial silver is even associated with the benefit of mitigating risk objects (e.g. bacteria and micro-organisms) that threaten health and the environment (objects at risk). In other words, both sides of the controversy invoke health and the environment as objects at risk. Three strategies organising risk communication are identified: (i) representation of silver as a risk to health and the environment; (ii) denial of such representations; and (iii) benefit association, where silver is construed to mitigate risks to health and the environment.

    Full text (pdf)
    Thesis Introduction (Kappa)
  • 23.
    Boholm, Max
    School of Global Studies, University of Gothenburg, Gothenburg, Sweden.
    The semantic distinction between ‘risk’ and ‘danger’: A linguistic analysis, 2012. In: Risk Analysis, ISSN 0272-4332, E-ISSN 1539-6924, Vol. 32, no. 2, pp. 281-293. Journal article (Refereed)
    Abstract [en]

    The analysis combines frame semantic and corpus linguistic approaches in analyzing the role of agency and decision making in the semantics of the words “risk” and “danger” (both nominal and verbal uses). In frame semantics, the meanings of “risk” and of related words, such as “danger,” are analyzed against the background of a specific cognitive-semantic structure (a frame) comprising frame elements such as Protagonist, Bad Outcome, Decision, Possession, and Source. Empirical data derive from the British National Corpus (100 million words). Results indicate both similarities and differences in use. First, both “risk” and “danger” are commonly used to represent situations having potential negative consequences as the result of agency. Second, “risk” and “danger,” especially their verbal uses (to risk, to endanger), differ in agent-victim structure, i.e., “risk” is used to express that a person affected by an action is also the agent of the action, while “endanger” is used to express that the one affected is not the agent. Third, “risk,” but not “danger,” tends to be used to represent rational and goal-directed action. The results therefore to some extent confirm the analysis of “risk” and “danger” suggested by German sociologist Niklas Luhmann. As a point of discussion, the present findings arguably have implications for risk communication.

  • 24.
    Borg, Erik
    et al.
    Edquist, Gertrud
    Reinholdson, Anna-Clara
    Risberg, Arne
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    McAllister, Bob
    Speech and language development in a population of Swedish hearing-impaired pre-school-children, a cross-sectional study, 2007. In: International Journal of Pediatric Otorhinolaryngology, ISSN 0165-5876, E-ISSN 1872-8464, Vol. 71, no. 7, pp. 1061-1077. Journal article (Refereed)
    Abstract [en]

    Objective: There is little information on speech and language development in preschool children with mild, moderate or severe hearing impairment. The primary aim of the study is to establish a reference material for clinical use covering various aspects of speech and language functions and to relate test values to pure tone audiograms and parents' judgement of their children's hearing and language abilities. Methods: Nine speech and language tests were applied or modified, both classical tests and newly developed tests. Ninety-seven children with normal hearing and 156 with hearing impairment were tested. Hearing was 80 dB HL PTA or better in the best ear. Swedish was their strongest language. None had any additional diagnosed major handicaps. The children were 4-6 years of age. The material was divided into 10 categories of hearing impairment, 5 conductive and 5 sensorineural: unilateral; bilateral 0-20; 21-40; 41-60; 61-80 dB HL PTA. The tests, selected on the basis of a three component language model, are phoneme discrimination; rhyme matching; Peabody Picture Vocabulary Test (PPVT-III, word perception); Test for Reception of Grammar (TROG, grammar perception); prosodic phrase focus; rhyme construction; Word Finding Vocabulary Test (word production); Action Picture Test (grammar production); oral motor test. Results: Only categories with sensorineural loss showed significant differences from normal. Word production showed the most marked delay for 21-40 dB HL: 5 and 6 years p < 0.01; for 41-60 dB: 4 years p < 0.01 and 6 years p < 0.01 and 61-80 dB: 5 years p < 0.05. Phoneme discrimination 21-40 dB HL: 6 years p < 0.05; 41-60 dB: 4 years p < 0.01; 61-80 dB: 4 years p < 0.001, 5 years p < 0.001. Rhyme matching: no significant difference as compared to normal data. Word perception: sensorineural 41-60 dB HL: 6 years p < 0.05; 61-80 dB: 4 years p < 0.05; 5 years p < 0.01. Grammar perception: sensorineural 41-60 dB HL: 6 years p < 0.05; 61-80 dB: 5 years p < 0.05. Prosodic phrase focus: 41-60 dB HL: 5 years p < 0.01. Rhyme construction: 41-60 dB HL: 4 years p < 0.05. Grammar production: 61-80 dB HL: 5 years p < 0.01. Oral motor function: no differences. The Word production test showed a 1.5-2 years delay for sensorineural impairment 41-80 dB HL through 4-6 years of age. There were no differences between hearing-impaired boys and girls. Extended data for the screening test [E. Borg, A. Risberg, B. McAllister, B.M. Undemar, G. Edquist, A.C. Reinholdsson, et al., Language development in hearing-impaired children. Establishment of a reference material for a "Language test for hearing-impaired children", Int. J. Pediatr. Otorhinolaryngol. 65 (2002) 15-26] are presented. Conclusions: Reference values for expected speech and language development are presented that cover nearly 60% of the studied population. The effect of the peripheral hearing impairment is compensated for in many children with hearing impairment up to 60 dB HL. Above that degree of impairment, language delay is more pronounced, probably due to a loss of acuity. The importance of central cognitive functions, speech reading and signing for compensation of peripheral limitations is pointed out.

  • 25.
    Bottomley, Jane
    et al.
    KTH, Skolan för industriell teknik och management (ITM), Lärande, Språk och kommunikation.
    Rinder, Jamie
    KTH, Skolan för industriell teknik och management (ITM), Lärande, Språk och kommunikation.
    Zeitler Lyne, Susanna
    KTH, Skolan för industriell teknik och management (ITM), Lärande, Språk och kommunikation.
    The KTH guide to scientific writing: Sparking a conversation about writing, 2023. In: 19th International CDIO Conference, Engineering education for a smart, safe and sustainable future, NTNU, Trondheim, Norway, Chalmers University of Technology, 2023, pp. 208-217. Conference paper (Refereed)
    Abstract [en]

    The KTH Guide to scientific writing was created with the aim of supporting students and faculty with scientific writing in English. The guide is rooted in the typical writing genres of a technical university, and draws on examples of these to explore sentence structure, punctuation, text flow, and scientific style. Since its launch, the guide has become an integral part of classroom practice in the department of Language and Communication, and an online resource for all students and faculty at KTH. This paper presents our findings from the first stage of our evaluation of the guide. The evaluation consists of a short reflective questionnaire for users. We have begun to collect responses to the questions, and to conduct an inductive thematic analysis (ITA) to identify emerging themes. 

  • 26.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strangert, Eva
    Cues for Hesitation in Speech Synthesis, 2006. In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, pp. 1300-1303. Conference paper (Refereed)
    Abstract [en]

    The current study investigates acoustic correlates to perceived hesitation based on previous work showing that pause duration and final lengthening both contribute to the perception of hesitation. It is the total duration increase that is the valid cue rather than the contribution by either factor. The present experiment using speech synthesis was designed to evaluate F0 slope and presence vs. absence of creaky voice before the inserted hesitation in addition to durational cues. The manipulations occurred in two syntactic positions, within a phrase and between two phrases, respectively. The results showed that in addition to durational increase, variation of both F0 slope and creaky voice had perceptual effects, although to a much lesser degree. The results have a bearing on efforts to model spontaneous speech including disfluencies, to be explored, for example, in spoken dialogue systems.

  • 27.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Hirschberg, Julia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Cross-Cultural Perception of Discourse Phenomena, 2009. In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, pp. 1723-1726. Conference paper (Refereed)
    Abstract [en]

    We discuss perception studies of two low-level indicators of discourse phenomena by Swedish, Japanese, and Chinese native speakers. Subjects were asked to identify upcoming prosodic boundaries and disfluencies in Swedish spontaneous speech. We hypothesize that speakers of prosodically unrelated languages should be less able to predict upcoming phrase boundaries but potentially better able to identify disfluencies, since indicators of disfluency are more likely to depend upon lexical, as well as acoustic information. However, surprisingly, we found that both phenomena were fairly well recognized by native and non-native speakers, with, however, some possible interference from word tones for the Chinese subjects.

  • 28.
    Dahl, S.
    et al.
    Bevilacqua, F.
    Bresin, Roberto
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Clayton, M.
    Leante, L.
    Poggi, I.
    Rasamimanana, N.
    Gestures in Performance, 2010. In: Musical Gestures: Sound, Movement, and Meaning, Taylor and Francis, 2010, pp. 36-68. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    On occasion, one can observe a whole orchestra section moving and playing in unison. In such an instance, all violinists play the melody using the same type of bowing movements and lean forward in unison at a given time during a specific passage in the music. Thus, not only do the musicians use very similar movements to produce the same notes but they sometimes also coordinate bodily sways or other movements with the other players. The musical gesture seems to manifest itself in both sound and movement. Whether we are watching as audience, or participating and interacting in the actual performance itself, we receive a considerable amount of gestural information. The aim of this chapter is to give examples of gestures that may be observed during performance, and to consider the kind of information they might convey, either to other performers or to the audience. 

  • 29.
    Dahlberg, Leif
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    Det akademiska samtalet, 2015. In: Universitetet som medium / [ed] Matts Lindström & Adam Wickberg Månsson, Lund: Mediehistoria, Lunds universitet, 2015, pp. 195-223. Chapter in book, part of anthology (Refereed)
    Full text (pdf)
    Det akademiska samtalet
  • 30.
    David Lopes, José
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Hemmingsson, Nils
    KTH.
    Åstrand, Oliver
    KTH.
    The Spot the Difference corpus: A multi-modal corpus of spontaneous task oriented spoken interactions, 2019. In: LREC 2018 - 11th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), 2019, pp. 1939-1945. Conference paper (Refereed)
    Abstract [en]

    This paper describes the Spot the Difference Corpus, which contains 54 interactions between pairs of subjects interacting to find differences in two very similar scenes. The setup, the participants' metadata, and details of the collection process are described. We are releasing this corpus of task-oriented spontaneous dialogues. This release includes rich transcriptions, annotations, audio and video. We believe that this dataset constitutes a valuable resource to study several dimensions of human communication that go from turn-taking to the study of referring expressions. In our preliminary analyses we have looked at task success (how many differences were found out of the total number of differences) and how it evolves over time. In addition we have looked at scene complexity provided by the RGB components' entropy and how it could relate to speech overlaps, interruptions and the expression of uncertainty. We found a tendency for more complex scenes to involve more competitive interruptions.

  • 31.
    de Leeuw, Esther
    et al.
    Opitz, Conny
    Lubinska, Dorota
    KTH, Skolan för teknikvetenskaplig kommunikation och lärande (ECE), Lärande, Språk och kommunikation.
    Dynamics of first language attrition across the lifespan: Introduction, 2013. In: International Journal of Bilingualism, ISSN 1367-0069, E-ISSN 1756-6878, Vol. 17, no. 6, pp. 667-674. Journal article (Refereed)
  • 32.
    De Rosa, Francesca
    et al.
    Ctr Adv Pathogen Threat & Response Simulat, Austin, TX 78701 USA.
    Baalsrud Hauge, Jannicke
    KTH, Skolan för industriell teknik och management (ITM), Hållbar produktionsutveckling (ML), Avancerad underhållsteknik och produktionslogistik.
    Dondio, Pierpaolo
    TU Dublin, Sch Comp Sci, Dublin, Ireland.
    Marfisi-Schottman, Iza
    Univ Mans, Le Mans, France.
    Romero, Margarida
    Univ Cote dAzur, Nice, France.
    Bellotti, Francesco
    Univ Genoa, Genoa, Italy.
    Introduction to the Special Issue on GaLA Conf 2021, 2022. In: INTERNATIONAL JOURNAL OF SERIOUS GAMES, E-ISSN 2384-8766, Vol. 9, no. 3, pp. 3-4. Journal article (Other academic)
  • 33.
    De Wit, Jan
    et al.
    Department of Communication and Cognition, Tilburg University, Tilburg, the Netherlands.
    Willemsen, Bram
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    De Haas, Mirjam
    Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, the Netherlands.
    Van Den Berghe, Rianne
    Department of Development of Youth and Education in Diverse Societies, Utrecht University, Utrecht, the Netherlands; Section Leadership in Education and Development, Windesheim University of Applied Sciences, Almere, the Netherlands.
    Leseman, Paul
    Department of Development of Youth and Education in Diverse Societies, Utrecht University, Utrecht, the Netherlands.
    Oudgenoeg-Paz, Ora
    Department of Development of Youth and Education in Diverse Societies, Utrecht University, Utrecht, the Netherlands.
    Verhagen, Josje
    Amsterdam Center for Language and Communication, University of Amsterdam, Amsterdam, the Netherlands.
    Vogt, Paul
    School of Communication, Media & IT, Hanze University of Applied Sciences, Groningen, the Netherlands.
    Krahmer, Emiel
    Department of Communication and Cognition, Tilburg University, Tilburg, the Netherlands.
    Designing and Evaluating Iconic Gestures for Child-Robot Second Language Learning, 2021. In: Interacting with computers, ISSN 0953-5438, E-ISSN 1873-7951, Vol. 33, no. 6, pp. 596-626. Journal article (Refereed)
    Abstract [en]

    In this paper, we examine the process of designing robot-performed iconic hand gestures in the context of a long-term study into second language tutoring with children of approximately 5 years of age. We explore four factors that may relate to their efficacy in supporting second language tutoring: the age of participating children; differences between gestures for various semantic categories, e.g. measurement words, such as small, versus counting words, such as five; the quality (comprehensibility) of the robot's gestures; and spontaneous reenactment or imitation of the gestures. Age was found to relate to children's learning outcomes, with older children benefiting more from the robot's iconic gestures than younger children, particularly for measurement words. We found no conclusive evidence that the quality of the gestures or spontaneous reenactment of said gestures related to learning outcomes. We further propose several improvements to the process of designing and implementing a robot's iconic gesture repertoire.

  • 34.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Pushy versus meek: using avatars to influence turn-taking behaviour, 2007. In: INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2007, pp. 2784-2787. Conference paper (Refereed)
    Abstract [en]

    The flow of spoken interaction between human interlocutors is a widely studied topic. Amongst other things, studies have shown that we use a number of facial gestures to improve this flow - for example to control the taking of turns. This type of gesture ought to be useful in systems where an animated talking head is used, be they systems for computer mediated human-human dialogue or spoken dialogue systems, where the computer itself uses speech to interact with users. In this article, we show that a small set of simple interaction control gestures and a simple model of interaction can be used to influence users' behaviour in an unobtrusive manner. The results imply that such a model may improve the flow of computer mediated interaction between humans under adverse circumstances, such as network latency, or to create more human-like spoken human-computer interaction.

  • 35.
    Edlund, Jens
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Brodén, D.
    Fridlund, M.
    Lindhé, C.
    Olsson, L. -J
    Ängsal, M.P.
    Öhberg, P.
    A Multimodal Digital Humanities Study of Terrorism in Swedish Politics: An Interdisciplinary Mixed Methods Project on the Configuration of Terrorism in Parliamentary Debates, Legislation, and Policy Networks 1968–2018, 2022. In: Lecture Notes in Networks and Systems, Springer Nature, 2022, Vol. 295, pp. 435-449. Conference paper (Refereed)
    Abstract [en]

    This paper presents the design of one of Sweden’s largest digital humanities projects, SweTerror, that through an interdisciplinary multi-modal methodological approach develops an extensive speech-to-text digital HSS resource. SweTerror makes a major contribution to the study of terrorism in Sweden through a comprehensive mixed methods study of the political discourse on terrorism since the late 1960s. Drawing on artificial intelligence in the form of state-of-the-art language and speech technology, it systematically analyses all forms of relevant parliamentary utterances. It explores and curates an exhaustive but understudied multi-modal collection of primary sources of central relevance to Swedish democracy: the audio recordings of the Swedish Parliament’s debates. The project studies the framing of terrorism both as policy discourse and enacted politics, examining semantic and emotive components of the parliamentary discourse on terrorism as well as major actors and social networks involved. It covers political responses to a range of terrorism-related issues as well as factors influencing policy-makers’ engagement, including political affiliations and gender. SweTerror also develops an online research portal, featuring the complete research material and searchable audio made readily accessible for further exploration. Long-term, the project establishes a model for combining extraction technologies (speech recognition and analysis) for audiovisual parliamentary data with text mining and HSS interpretive methods and the portal is designed to serve as a prototype for other similar projects.

  • 36.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Hidden resources - Strategies to acquire and exploit potential spoken language resources in national archives (2016). In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, European Language Resources Association (ELRA), 2016, pp. 4531-4534. Conference paper (Refereed)
    Abstract [en]

    In 2014, the Swedish government tasked a Swedish agency, the Swedish Post and Telecom Authority (PTS), with investigating how best to create and populate an infrastructure for spoken language resources (Ref N2014/2840/ITP). As part of this work, the Department of Speech, Music and Hearing at KTH Royal Institute of Technology has taken an inventory of existing potential spoken language resources, mainly in Swedish national archives and other governmental or public institutions. In this position paper, key priorities, perspectives, and strategies that may be of general, rather than specifically Swedish, interest are presented. We discuss the broad types of potential spoken language resources available; the extent to which these resources are free to use; and, as the main contribution, strategies to ensure the continuous acquisition of spoken language resources in a manner that facilitates speech and speech technology research.

  • 37.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hirschberg, Julia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Pause and gap length in face-to-face interaction (2009). In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, pp. 2779-2782. Conference paper (Refereed)
    Abstract [en]

    It has long been noted that conversational partners tend to exhibit increasingly similar pitch, intensity, and timing behavior over the course of a conversation. However, the metrics developed to measure this similarity to date have generally failed to capture the dynamic temporal aspects of this process. In this paper, we propose new approaches to measuring interlocutor similarity in spoken dialogue. We define similarity in terms of convergence and synchrony and propose approaches to capture these, illustrating our techniques on gap and pause production in Swedish spontaneous dialogues.

  • 38.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    /nailon/: Software for Online Analysis of Prosody (2006). Conference paper (Refereed)
    Abstract [en]

    This paper presents /nailon/ - a software package for online real-time prosodic analysis that captures a number of prosodic features relevant for interaction control in spoken dialogue systems. The current implementation captures silence durations; voicing, intensity, and pitch; pseudo-syllable durations; and intonation patterns. The paper provides detailed information on how this is achieved. As an example application of /nailon/, we demonstrate how it is used to improve the efficiency of identifying relevant places at which a machine can legitimately begin to talk to a human interlocutor, as well as to shorten system response times.

  • 39.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Heldner, Mattias
    Wlodarczak, Marcin
    Catching wind of multiparty conversation (2014). In: LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014. Conference paper (Refereed)
    Abstract [en]

    The paper describes the design of a novel corpus of respiratory activity in spontaneous multiparty face-to-face conversations in Swedish. The corpus is collected with the primary goal of investigating the role of breathing in interaction control. Physiological correlates of breathing are captured by means of respiratory belts, which measure changes in the cross-sectional area of the rib cage and the abdomen. Additionally, auditory and visual cues of breathing are recorded in parallel with the actual conversations. The corpus allows studying the respiratory mechanisms underlying the organisation of spontaneous communication, especially in connection with turn management. As such, it is a valuable resource both for fundamental research and for speech technology applications.

  • 40.
    Ekstedt, Erik
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Wang, Siyang
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Székely, Éva
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Gustafson, Joakim
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Skantze, Gabriel
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis (2023). In: Interspeech 2023, International Speech Communication Association, 2023, pp. 5481-5485. Conference paper (Refereed)
    Abstract [en]

    Turn-taking is a fundamental aspect of human communication in which speakers convey their intention to either hold or yield their turn through prosodic cues. Using the recently proposed Voice Activity Projection model, we propose an automatic evaluation approach to measure these aspects for conversational speech synthesis. We investigate the ability of three commercial and two open-source Text-To-Speech (TTS) systems to generate turn-taking cues over simulated turns. By varying the stimuli, or controlling the prosody, we analyze the models' performance. We show that while commercial TTS systems largely provide appropriate cues, they often produce ambiguous signals, and that further improvements are possible. TTS systems trained on read or spontaneous speech produce strong turn-hold but weak turn-yield cues. We argue that this approach, which focuses on functional aspects of interaction, provides a useful addition to other important speech metrics, such as intelligibility and naturalness.

  • 41.
    Ekström, Axel G.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    A Theory That Never Was: Wrong Way to the “Dawn of Speech” (2024). In: Biolinguistics, ISSN 1450-3417, Vol. 18, article id e14285. Journal article (Refereed)
    Abstract [en]

    Recent literature argues that a purportedly long-standing theory—so-called “laryngeal descent theory”—in speech evolution has been refuted (Boë et al., 2019, https://doi.org/10.1126/sciadv.aaw3916). However, an investigation into the relevant source material reveals that the theory described has never been a prominent line of thinking in speech-centric sciences. The confusion arises from a fundamental misunderstanding: the argument that the descent of the larynx and the accompanying changes in the hominin vocal tract expanded the range of possible speech sounds for human ancestors (a theory that enjoys wide interdisciplinary support) is mistakenly interpreted as a belief that all speech was impossible without such changes—a notion that was never widely endorsed in relevant literature. This work aims not to stir controversy but to highlight important historical context in the study of speech evolution.

  • 42.
    Ekström, Axel G.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Motor constellation theory: A model of infants’ phonological development (2022). In: Frontiers in Psychology, E-ISSN 1664-1078, Vol. 13, article id 996894. Journal article (Refereed)
    Abstract [en]

    Every normally developing human infant solves the difficult problem of mapping their native-language phonology, but the neural mechanisms underpinning this behavior remain poorly understood. Here, motor constellation theory, an integrative neurophonological model, is presented, with the goal of explicating this issue. It is assumed that infants' motor-auditory phonological mapping takes place through infants' orosensory "reaching" for phonological elements observed in the language-specific ambient phonology, via reference to kinesthetic feedback from motor systems (e.g., articulators), and auditory feedback from resulting speech and speech-like sounds. Attempts are regulated by basal ganglia-cerebellar speech neural circuitry, and successful attempts at reproduction are enforced through dopaminergic signaling. Early in life, the pace of anatomical development constrains mapping such that complete language-specific phonological mapping is prohibited by infants' undeveloped supralaryngeal vocal tract and undescended larynx; constraints gradually dissolve with age, enabling adult phonology. Where appropriate, reference is made to findings from animal and clinical models. Some implications for future modeling and simulation efforts, as well as clinical settings, are also discussed.

  • 43.
    Ekström, Axel G.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Viki’s First Words: A Comparative Phonetics Case Study (2023). In: International Journal of Primatology, ISSN 0164-0291, E-ISSN 1573-8604, Vol. 44, no. 2, pp. 249-253. Journal article (Refereed)
  • 44.
    Ekström, Axel G.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    What's next for size-sound symbolism? (2022). In: Frontiers in Language Sciences, E-ISSN 2813-4605, Vol. 1. Journal article (Refereed)
    Abstract [en]

    This text reviews recent research in phonetic size-sound symbolism - non-arbitrary attributions of size properties to speech acoustic properties. Evidence from a wide range of research works is surveyed, and recent findings on the relationships between fundamental frequency, vowel articulation, consonant articulation, phonation type, mora count, and phonemic position are discussed. It is argued that a satisfactory explanatory model of phonetic size-sound symbolism should meet two criteria: it should be able to explain both (1) the relationship between size and speech acoustics (the Association criterion) and (2) the inconsistent findings observed across languages in the relevant literature (the Inconsistency criterion). Five theories are briefly discussed: the frequency code, embodied cognition, sound-meaning bootstrapping, Sapir-Whorf hypotheses, and stochastic drift. It is contended that no currently available explanatory model of size-sound symbolism adequately meets both criteria (1) and (2), but that a combination of perspectives may provide much of the necessary depth. Future directions are also discussed.

  • 45.
    Ekström, Axel G.
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Nirme, Jens
    Lund Univ, Lund Univ Cognit Sci, Dept Philosophy, Lund, Sweden.
    Gärdenfors, Peter
    Lund Univ, Lund Univ Cognit Sci, Dept Philosophy, Lund, Sweden; Univ Johannesburg, Paleores Inst, Johannesburg, South Africa.
    Motion iconicity in prosody (2022). In: Frontiers in Communication, E-ISSN 2297-900X, Vol. 7, article id 994162. Journal article (Refereed)
    Abstract [en]

    Evidence suggests that human non-verbal speech may be rich in iconicity. Here, we report results from two experiments aimed at testing whether perception of increasing and declining f0 can be iconically mapped onto motion events. We presented a sample of mixed-nationality participants (N = 118) with sets of two videos, where one pictured upward movement and the other downward movement. A disyllabic nonsense word prosodically resynthesized as increasing or declining in f0 was presented simultaneously with each video in a pair, and participants were tasked with guessing which of the two videos the word described. Results indicate that prosody is iconically associated with motion, such that motion-prosody congruent pairings were more readily selected than incongruent pairings (p < 0.033). However, the effect observed in our sample was primarily driven by selections of words with declining f0. A follow-up experiment with native Turkish-speaking participants (N = 92) tested for the effect of language-specific metaphor for auditory pitch. Results showed no significant association between prosody and motion. Limitations of the experiment, and some implications for the motor theory of speech perception and "gestural origins" theories of language evolution, are discussed.

  • 46.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Can audio-visual instructions help learners improve their articulation? An ultrasound study of short-term changes (2008). In: INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2008, pp. 2631-2634. Conference paper (Refereed)
    Abstract [en]

    This paper describes how seven French subjects change their pronunciation and articulation when practising Swedish words with a computer-animated virtual teacher. The teacher gives feedback on the user's pronunciation with audiovisual instructions suggesting how the articulation should be changed. A Wizard-of-Oz set-up was used for the training session, in which a human listener chose the adequate pre-generated feedback based on the user's pronunciation. The subjects' changes in articulation were monitored during the practice session with a hand-held ultrasound probe. The perceptual analysis indicates that the subjects improved their pronunciation during the training, and the ultrasound measurements suggest that the improvement was made by following the articulatory instructions given by the computer-animated teacher.

  • 47.
    Engwall, Olov
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Dynamical Aspects of Coarticulation in Swedish Fricatives: A Combined EMA and EPG Study (2000). In: TMH Quarterly Status and Progress Report, pp. 49-73. Journal article (Other academic)
    Abstract [en]

    An electromagnetic articulography (EMA) system and electropalatography (EPG) have been employed to study five Swedish fricatives in different vowel contexts. Articulatory measures at the onset of, the mean value during, and at the offset of the fricative were used to evidence the coarticulation throughout the fricative. The contextual influence on these three different measurements of the fricative is compared and contrasted to evidence how the coarticulation changes. Measures were made for the jaw motion, lip protrusion, and tongue body with EMA, and for linguopalatal contact with EPG. The data from the two sources were further combined and assessed for complementary and conflicting results.

  • 48.
    Engwall, Olov
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Lopes, José
    Herriot-Watt University.
    Cumbal, Ronald
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Berndtson, Gustav
    KTH.
    Lindström, Ruben
    KTH.
    Ekman, Patrik
    KTH.
    Hartmanis, Eric
    KTH.
    Jin, Emelie
    KTH.
    Johnston, Ella
    KTH.
    Tahir, Gara
    KTH.
    Mekonnen, Michael
    KTH.
    Learner and teacher perspectives on robot-led L2 conversation practice. Manuscript (preprint) (Other academic)
    Abstract [en]

    This article focuses on designing and evaluating conversation practice in a second language (L2) with a robot that employs human spoken and non-verbal interaction strategies. Based on an analysis of previous work and semi-structured interviews with L2 learners and teachers, recommendations for robot-led conversation practice for adult learners at intermediate level are first defined, focused on language learning, on the social context, on the conversational structure and on verbal and visual aspects of the robot moderation. Guided by these recommendations, an experiment is set up, in which 12 pairs of L2 learners of Swedish interact with a robot in short social conversations. These robot-learner interactions are evaluated through post-session interviews with the learners, teachers’ ratings of the robot’s behaviour and analyses of the video-recorded conversations, resulting in a set of guidelines for robot-led conversation practice, in particular: 1) Societal and personal topics increase the practice’s meaningfulness for learners. 2) Strategies and methods for providing corrective feedback during conversation practice need to be explored further. 3) Learners should be encouraged to support each other if the robot has difficulties adapting to their linguistic level. 4) The robot should establish a social relationship, by contributing with its own story, remembering the participants’ input, and making use of non-verbal communication signals. 5) Improvements are required regarding naturalness and intelligibility of text-to-speech synthesis, in particular its speed, if it is to be used for conversations with L2 learners.

  • 49.
    Engwall, Olov
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Are real tongue movements easier to speech read than synthesized? (2009). In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, pp. 824-827. Conference paper (Refereed)
    Abstract [en]

    Speech perception studies with augmented reality displays in talking heads have shown that tongue reading abilities are weak initially, but that subjects become able to extract some information from intra-oral visualizations after a short training session. In this study, we investigate how the nature of the tongue movements influences the results, by comparing synthetic rule-based and actual, measured movements. The subjects were significantly better at perceiving sentences accompanied by real movements, indicating that the current coarticulation model developed for facial movements is not optimal for the tongue.

  • 50.
    Ericsdotter, Christine
    et al.
    Stockholm University.
    Ternström, Sten
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Swedish (2012). In: The Use of the International Phonetic Alphabet in the Choral Rehearsal / [ed] Duane R. Karna, Scarecrow Press, 2012, pp. 245-251. Book chapter, part of anthology (Other academic)