Search results 51-100 of 692
  • 51.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Detecting confusable phoneme pairs for Swedish language learners depending on their first language (2011). In: TMH-QPSR, ISSN 1104-5787, Vol. 51, no. 1, pp. 89-92. Article in journal (Other academic)
    Abstract [en]

    This paper proposes a paradigm where commonly made segmental pronunciation errors are modeled as pair-wise confusions between two or more phonemes in the language that is being learnt. The method uses an ensemble of support vector machine classifiers with time varying Mel frequency cepstral features to distinguish between several pairs of phonemes. These classifiers are then applied to classify the phonemes uttered by second language learners. Using this method, an assessment is made regarding the typical pronunciation problems that students learning Swedish would encounter, depending on their first language.
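
    A minimal sketch of the pair-wise confusion approach this abstract describes, using librosa for MFCC extraction and scikit-learn for the SVM ensemble. The file lists, phoneme labels, and the per-utterance averaging of the MFCCs are illustrative assumptions, not the authors' implementation (the paper uses time-varying features).

        import numpy as np
        import librosa
        from sklearn.svm import SVC

        def mfcc_features(wav_path, n_mfcc=13):
            # Mean MFCC vector over the utterance; a simplification of the
            # paper's time-varying features.
            y, sr = librosa.load(wav_path, sr=16000)
            m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
            return m.mean(axis=1)

        def train_pair_classifier(paths_a, paths_b):
            # One binary SVM per confusable phoneme pair, e.g. /y/ vs /u/.
            X = np.array([mfcc_features(p) for p in paths_a + paths_b])
            y = np.array([0] * len(paths_a) + [1] * len(paths_b))
            return SVC(kernel="rbf", probability=True).fit(X, y)

        # ensemble = {("y", "u"): train_pair_classifier(y_files, u_files), ...}
        # A learner's /y/ classified as /u/ with high probability flags that
        # pair as a likely confusion for speakers of the learner's L1.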

  • 52.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Abdou, Sherif
    Faculty of Computers & Information, Cairo University, Egypt.
    Using an Ensemble of Classifiers for Mispronunciation Feedback (2011). In: Proceedings of SLaTE / [ed] Strik, H.; Delmonte, R.; Russel, M., Venice, Italy, 2011. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a paradigm where commonly made segmental pronunciation errors are modeled as pair-wise confusions between two or more phonemes in the language that is being learnt. The method uses an ensemble of support vector machine classifiers with time varying Mel frequency cepstral features to distinguish between several pairs of phonemes. These classifiers are then applied to classify the phonemes uttered by second language learners. Instead of providing feedback at every mispronounced phoneme, the method attempts to provide feedback about typical mispronunciations by a certain student, over an entire session of several utterances. Two case studies that demonstrate how the paradigm is applied to provide suitable feedback to two students are also described in this paper.

  • 53.
    Arango-Alegría, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Cuadernillo sobre América Central (2005, 1st ed.). Book (Refereed)
  • 54.
    Arango-Alegría, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Cuadernillo sobre México (2005). Book (Refereed)
  • 55.
    Argaw, Atelach Alemu
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Data- och systemvetenskap, DSV.
    Asker, Lars
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Data- och systemvetenskap, DSV.
    Amharic-English information retrieval (2006). In: CLEF2006 Working Notes, CEUR-WS, 2006. Conference paper (Refereed)
    Abstract [en]

    We describe Amharic-English cross lingual information retrieval experiments in the ad hoc bilingual tracks of CLEF 2006. The query analysis is supported by morphological analysis and part-of-speech tagging, while we used different machine-readable dictionaries for term lookup in the translation process. Out-of-dictionary terms were handled using fuzzy matching, and Lucene [4] was used for indexing and searching. Four experiments that differed in terms of utilized fields in the topic set, fuzzy matching, and term weighting were conducted. The results obtained are reported and discussed.
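
    A toy illustration of the out-of-dictionary handling described above, with difflib's fuzzy string matching standing in for the paper's fuzzy matcher; the dictionary entries are invented, and the Lucene indexing and search side is not reproduced.

        import difflib

        # Invented Amharic-English lookup table; the paper used several
        # machine-readable dictionaries for term translation.
        dictionary = {"selam": "peace", "timhirt": "education", "hager": "country"}

        def translate_term(term, cutoff=0.8):
            if term in dictionary:
                return dictionary[term]
            # Fuzzy fallback: map an out-of-dictionary term to the closest
            # known source-language entry, if any is close enough.
            close = difflib.get_close_matches(term, dictionary.keys(), n=1, cutoff=cutoff)
            return dictionary[close[0]] if close else None

        print(translate_term("timhrt"))  # -> "education" via fuzzy match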

  • 56.
    Arnela, Marc
    et al.
    GTM–Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins 30, Barcelona, E-08022, Catalonia, Spain.
    Blandin, Rémi
    GIPSA-Lab, Unité Mixte de Recherche au Centre National de la Recherche Scientifique 5216, Grenoble Campus, St. Martin d'Heres, F-38402, France.
    Dabbaghchian, Saeed
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Guasch, Oriol
    GTM–Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins 30, Barcelona, E-08022, Catalonia, Spain.
    Alías, Francesc
    GTM–Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins 30, Barcelona, E-08022, Catalonia, Spain.
    Pelorson, Xavier
    GIPSA-Lab, Unité Mixte de Recherche au Centre National de la Recherche Scientifique 5216, Grenoble Campus, St. Martin d'Heres, F-38402, France.
    Van Hirtum, Annemie
    GIPSA-Lab, Unité Mixte de Recherche au Centre National de la Recherche Scientifique 5216, Grenoble Campus, St. Martin d'Heres, F-38402, France.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Influence of lips on the production of vowels based on finite element simulations and experiments (2016). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 139, no. 5, pp. 2852-2859. Article in journal (Refereed)
    Abstract [en]

    Three-dimensional (3-D) numerical approaches for voice production are currently being investigated and developed. Radiation losses produced when sound waves emanate from the mouth aperture are one of the key aspects to be modeled. When doing so, the lips are usually removed from the vocal tract geometry in order to impose a radiation impedance on a closed cross-section, which speeds up the numerical simulations compared to free-field radiation solutions. However, lips may play a significant role. In this work, the lips' effects on vowel sounds are investigated by using 3-D vocal tract geometries generated from magnetic resonance imaging. To this aim, two configurations for the vocal tract exit are considered: with lips and without lips. The acoustic behavior of each is analyzed and compared by means of time-domain finite element simulations that allow free-field wave propagation and experiments performed using 3-D-printed mechanical replicas. The results show that the lips should be included in order to correctly model vocal tract acoustics not only at high frequencies, as commonly accepted, but also in the low frequency range below 4 kHz, where plane wave propagation occurs.

  • 57. Arnela, Marc
    et al.
    Dabbaghchian, Saeed
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Blandin, Rémi
    Guasch, Oriol
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Pelorson, Xavier
    Van Hirtum, Annemie
    Effects of vocal tract geometry simplifications on the numerical simulation of vowels (2015). In: PAN EUROPEAN VOICE CONFERENCE ABSTRACT BOOK: Proceedings e report 104, Firenze University Press, 2015, p. 177. Conference paper (Other academic)
  • 58.
    Arnela, Marc
    et al.
    GTM Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Barcelona, Spain.
    Dabbaghchian, Saeed
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Guasch, Oriol
    GTM Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Barcelona, Spain.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    A semi-polar grid strategy for the three-dimensional finite element simulation of vowel-vowel sequences (2017). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, The International Speech Communication Association (ISCA), 2017, Vol. 2017, pp. 3477-3481. Conference paper (Refereed)
    Abstract [en]

    Three-dimensional computational acoustic models need very detailed 3D vocal tract geometries to generate high quality sounds. Static geometries can be obtained from Magnetic Resonance Imaging (MRI), but it is not currently possible to capture dynamic MRI-based geometries with sufficient spatial and time resolution. One possible solution consists in interpolating between static geometries, but this is a complex task. We instead propose herein to use a semi-polar grid to extract 2D cross-sections from the static 3D geometries, and then interpolate them to obtain the vocal tract dynamics. Other approaches such as the adaptive grid have also been explored. In this method, cross-sections are defined perpendicular to the vocal tract midline, as typically done in 1D to obtain the vocal tract area functions. However, intersections between adjacent cross-sections may occur during the interpolation process, especially when the vocal tract midline quickly changes its orientation. In contrast, the semi-polar grid prevents these intersections because the plane orientations are fixed over time. Finite element simulations of static vowels are first conducted, showing that 3D acoustic wave propagation is not significantly altered when the semi-polar grid is used instead of the adaptive grid. The vowel-vowel sequence [ɑi] is finally simulated to demonstrate the method.
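
    A minimal numeric sketch of the interpolation idea: with cross-sections defined on a fixed grid, a vowel-vowel transition can be generated by interpolating per-section values between two static geometries. The values and section count are invented, and the paper interpolates full 3D cross-sections on a semi-polar grid rather than 1D areas.

        import numpy as np

        # Invented area functions (cm^2) on a fixed grid of cross-sections,
        # standing in for MRI-derived static geometries of two vowels.
        area_v1 = np.array([4.5, 5.2, 6.0, 3.1, 1.8, 2.4])
        area_v2 = np.array([6.1, 4.0, 2.2, 1.0, 2.9, 4.8])

        def tract_at(t):
            # t in [0, 1]: 0 -> first vowel, 1 -> second. Fixed plane
            # orientations mean interpolated sections cannot intersect.
            return (1.0 - t) * area_v1 + t * area_v2

        for t in (0.0, 0.5, 1.0):
            print(t, tract_at(t))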

  • 59. Artman, H.
    et al.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hultén, M.
    Karlgren, K.
    Ramberg, R.
    The Interactionary as a didactic format in design education (2015). In: Proc. of KTH Scholarship of Teaching and Learning 2015, Stockholm, Sweden, 2015. Conference paper (Refereed)
  • 60.
    Askenfelt, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Between the frog and the tip: Bowing gestures and bow-string interaction in violin playing (invited) (2008). In: Program abstracts for Acoustics'08 Paris, Acoustical Society of America (ASA), 2008, p. 3656. Conference paper (Other academic)
    Abstract [en]

    The motion of the bow gives a natural visualization of a string performance. Watching the player's bowing may augment the communicative power of the music, but not all relevant bow control parameters are easy for a spectator to capture. The string player controls volume of sound and tone quality continuously by coordination of three basic bowing parameters (bow velocity, bow-bridge distance, and bow force), which set the main conditions for the bow-string interaction. At a more detailed level of description, the tilting of the bow, which among other things controls the effective width of the bow hair, enters into the model. On a longer time scale, pre-planned coordination schemes ('bowing gestures'), including the basic bowing parameters and the angles between the path of the bow and the strings, build the performance. Systems for recording bowing parameters will be reviewed and results from old and current studies on bowing gestures presented. The player's choice and coordination of bowing parameters are constrained both in attacks and 'steady-state' according to bow-string interaction models. Recent verifications of these control spaces will be examined. Strategies for starting notes and examples of how players do in practice will be presented and compared with listeners' preferences.

  • 61.
    Askenfelt, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Double Bass (2010). In: The Science of String Instruments / [ed] Rossing, T., Springer-Verlag New York, 2010, pp. 259-277. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    The study of the acoustics of bowed instruments has for several reasons focused on the violin. A substantial amount of knowledge has been accumulated over the last century (see Hutchins 1975, 1976; Hutchins and Benade 1997). The violin is discussed in Chap. 13, while the cello is discussed in Chap. 14. The bow is discussed in Chap. 16.

  • 62.
    Askenfelt, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Hansen, Kjetil Falkenberg
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Granqvist, Svante
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Hellmer, Kahl
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Orlarey, Y.
    Fober, D.
    Perifanos, K.
    Tambouratzis, G.
    Makropoulo, E.
    Chryssafidou, E.
    Arnaikos, L.
    Rattasepp, K.
    Dima, G.
    VEMUS, Virtual European Music School or A young person's interactive guide to making music (2008). In: Proceedings of the 28th ISME World Conference, 2008, p. 218. Conference paper (Refereed)
  • 63. Batliner, A.
    et al.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    D’Arcy, S.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Giuliani, D.
    Gerosa, M.
    Hacker, C.
    Russell, M.
    Steidl, S.
    Wong, M.
    The PF STAR Children’s Speech Corpus (2005). In: 9th European Conference on Speech Communication and Technology, 2005, pp. 3761-3764. Conference paper (Refereed)
    Abstract [en]

    This paper describes the corpus of recordings of children's speech which was collected as part of the EU FP5 PF_STAR project. The corpus contains more than 60 hours of speech, including read and imitated native-language speech in British English, German and Swedish, read and imitated non-native-language English speech from German, Italian and Swedish children, and native-language spontaneous and emotional speech in English and German.

  • 64. Bell, Linda
    et al.
    Boye, Johan
    Gustafson, Joakim
    TeliaSonera.
    Heldner, Mattias
    TeliaSonera.
    Lindström, Anders
    Wirén, Mats
    The Swedish NICE Corpus: Spoken dialogues between children and embodied characters in a computer game scenario (2005). In: 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 2005, pp. 2765-2768. Conference paper (Refereed)
    Abstract [en]

    This article describes the collection and analysis of a Swedish database of spontaneous and unconstrained children-machine dialogues. The Swedish NICE corpus consists of spoken dialogues between children aged 8 to 15 and embodied fairy-tale characters in a computer game scenario. Compared to previously collected corpora of children's computer-directed speech, the Swedish NICE corpus contains extended interactions, including three-party conversation, in which the young users used spoken dialogue as the primary means of progression in the game.

  • 65.
    Bell, Linda
    et al.
    TeliaSonera R and D, Sweden.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Children’s convergence in referring expressions to graphical objects in a speech-enabled computer game (2007). In: 8th Annual Conference of the International Speech Communication Association, Antwerp, Belgium, 2007, pp. 2788-2791. Conference paper (Refereed)
    Abstract [en]

    This paper describes an empirical study of children's spontaneous interactions with an animated character in a speech-enabled computer game. More specifically, it deals with convergence of referring expressions. 49 children were invited to play the game, which was initiated by a collaborative "put-that-there" task. In order to solve this task, the children had to refer to both physical objects and icons in a 3D environment. For physical objects, which were mostly referred to using straightforward noun phrases, lexical convergence took place in 90% of all cases. In the case of the icons, the children were more innovative and spontaneously referred to them in many different ways. Even after being prompted by the system, lexical convergence took place for only 50% of the icons. In the cases where convergence did take place, the effect of the system's prompts was quite local, and the children quickly resorted to their original way of referring when naming new icons in later tasks.

  • 66. Bellec, G.
    et al.
    Elowsson, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Wolff, D.
    Weyde, T.
    A social network integrated game experiment to relate tapping to speed perception and explore rhythm reproduction (2013). In: Proceedings of the Sound and Music Computing Conference 2013, 2013, pp. 19-26. Conference paper (Refereed)
    Abstract [en]

    During recent years, games with a purpose (GWAPs) have become increasingly popular for studying human behaviour [1–4]. However, no standardised method for web-based game experiments has been proposed so far. We present here our approach comprising an extended version of the CaSimIR social game framework [5] for data collection, mini-games for tempo and rhythm tapping, and an initial analysis of the data collected so far. The game presented here is part of the Spot The Odd Song Out game, which is freely available for use on Facebook and on the Web. We present the GWAP method in some detail and a preliminary analysis of data collected. We relate the tapping data to perceptual ratings obtained in previous work. The results suggest that the tapped tempo data collected in a GWAP can be used to predict perceived speed. When averaging the rhythmic performances of a group of 10 players in the second experiment, the tapping frequency shows a pattern that corresponds to the time signature of the music played. Our experience shows that more effort in design and during runtime is required than in a traditional experiment. Our experiment is still running and available online.
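
    A small worked example of the kind of computation behind relating tapping to perceived speed: estimating a tempo from a player's tap timestamps. The tap times are invented, and the paper's analysis of the game data is more involved.

        import numpy as np

        # Tap timestamps (seconds) from one imagined game session.
        taps = np.array([0.00, 0.52, 1.01, 1.49, 2.02, 2.51, 3.00])

        # Median inter-tap interval is robust to an occasional stray tap.
        iti = np.median(np.diff(taps))
        tempo_bpm = 60.0 / iti
        print(round(tempo_bpm, 1))  # ~122 BPM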

  • 67. Bennett, Paul
    et al.
    Gabrilovich, Evgeniy
    Kamps, Jaap
    Karlgren, Jussi
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Report on the Sixth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR '13) (2014). In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 48, no. 1, pp. 13-20. Article in journal (Refereed)
    Abstract [en]

    There is an increasing amount of structure on the web as a result of modern web languages, user tagging and annotation, emerging robust NLP tools, and an ever growing volume of linked data. These meaningful semantic annotations hold the promise to significantly enhance information access, by enhancing the depth of analysis of today's systems. Currently, we have only started exploring the possibilities and are only beginning to understand how these valuable semantic cues can be put to fruitful use.

    ESAIR'13 focuses on two of the most challenging aspects to address in the coming years. First, there is a need to include the currently emerging knowledge resources (such as DBpedia, Freebase) as an underlying semantic model giving access to an unprecedented scope and detail of factual information. Second, there is a need to include annotations beyond the topical dimension (think of sentiment, reading level, prerequisite level, etc.) that contain vital cues for matching the specific needs and profile of the searcher at hand.

    There was a strong feeling that we made substantial progress. Specifically, the discussion contributed to our understanding of the way forward. First, emerging large-scale knowledge bases form a crucial component for semantic search, providing a unified framework with zillions of entities and relations. Second, in addition to low-level factual annotation, non-topical annotation of larger chunks of text can provide powerful cues on the expertise of the searcher and (un)suitability of information. Third, novel user interfaces are key to unleash the powerful structured querying enabled by semantic annotation (the potential of rich document annotations can only be realized if matched by more articulate queries exploiting these powerful retrieval cues), and a more dynamic approach is emerging by exploiting new forms of query autosuggest.

  • 68. Berzak, Yevgeni
    et al.
    Richter, Michal
    Ehrler, Carsten
    Shore, Todd
    Saarland University, Saarbrücken, Germany.
    Information Retrieval and Visualization for the Historical Domain (2011). In: Language Technology for Cultural Heritage: Selected Papers from the LaTeCH Workshop Series / [ed] Sporleder, Caroline; van den Bosch, Antal; Zervanou, Kalliopi, Berlin, Heidelberg: Springer Berlin/Heidelberg, 2011, pp. 197-212. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    Working with large and unstructured collections of historical documents is a challenging task for historians. Despite the recent growth in the volume of digitized historical data, available collections are rarely accompanied by computational tools that significantly facilitate this task. We address this shortage by proposing a visualization method for document collections that focuses on graphical representation of similarities between documents. The strength of the similarities is measured according to the overlap of historically significant information such as named entities, or the overlap of general vocabulary. Similarity strengths are then encoded in the edges of a graph. The graph provides visual structure, revealing interpretable clusters and links between documents that are otherwise difficult to establish. We implement the idea of similarity graphs within an information retrieval system supported by an interactive graphical user interface. The system allows querying the database, visualizing the results and browsing the collection in an effective and intuitive way. Our approach can be easily adapted and extended to collections of documents in other domains.
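
    A minimal sketch of the similarity-graph construction the abstract describes, using entity overlap as edge weight; the documents, entities, and threshold are invented, and the NER pipeline and vocabulary-overlap measure of the actual system are not reproduced.

        import networkx as nx

        # Named entities per document (invented toy data).
        docs = {
            "d1": {"Gustav Vasa", "Stockholm", "1523"},
            "d2": {"Gustav Vasa", "Uppsala"},
            "d3": {"Kalmar", "1397"},
        }

        def jaccard(a, b):
            return len(a & b) / len(a | b)

        # Edge weight = entity overlap; a threshold prunes weak links so the
        # graph reveals clusters rather than a near-complete mesh.
        G = nx.Graph()
        G.add_nodes_from(docs)
        names = list(docs)
        for i, u in enumerate(names):
            for v in names[i + 1:]:
                w = jaccard(docs[u], docs[v])
                if w > 0.2:
                    G.add_edge(u, v, weight=w)

        print(list(G.edges(data=True)))  # d1-d2 linked via the shared entity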

  • 69.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Perception of Gaze Direction in 2D and 3D Facial Projections (2010). In: The ACM / SSPNET 2nd International Symposium on Facial Analysis and Animation, New York, USA: ACM Press, 2010, p. 24. Conference paper (Refereed)
    Abstract [en]

    In human-human communication, eye gaze is a fundamental cue in e.g. turn-taking and interaction control [Kendon 1967]. Accurate control of gaze direction is therefore crucial in many applications of animated avatars striving to simulate human interactional behaviors. One inherent complication when conveying gaze direction through a 2D display, however, is what has been referred to as the Mona Lisa effect; if the avatar is gazing towards the camera, the eyes seem to "follow" the beholder whatever vantage point he or she may assume [Boyarskaya and Hecht 2010]. This becomes especially problematic in applications where multiple persons are interacting with the avatar, and the system needs to use gaze to address a specific person. Introducing 3D structure in the facial display, e.g. projecting the avatar face on a face mask, makes the percept of the avatar's gaze change with the viewing angle, as is indeed the case with real faces. To this end, [Delaunay et al. 2010] evaluated two back-projected displays - a spherical "dome" and a face-shaped mask. However, there may be many factors influencing gaze direction perceived from a 3D facial display, so an accurate calibration procedure for gaze direction is called for.

  • 70.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Alexanderson, Simon
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Kinetic Data for Large-Scale Analysis and Modeling of Face-to-Face Conversation (2011). In: Proceedings of International Conference on Audio-Visual Speech Processing 2011 / [ed] Salvi, G.; Beskow, J.; Engwall, O.; Al Moubayed, S., Stockholm: KTH Royal Institute of Technology, 2011, pp. 103-106. Conference paper (Refereed)
    Abstract [en]

    Spoken face-to-face interaction is a rich and complex form of communication that includes a wide array of phenomena that are not fully explored or understood. While there have been extensive studies on many aspects of face-to-face interaction, these are traditionally of a qualitative nature, relying on hand-annotated corpora, typically rather limited in extent, which is a natural consequence of the labour-intensive task of multimodal data annotation. In this paper we present a corpus of 60 hours of unrestricted Swedish face-to-face conversations recorded with audio, video and optical motion capture, and we describe a new project setting out to exploit primarily the kinetic data in this corpus in order to gain quantitative knowledge on human face-to-face interaction.

  • 71.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Bruce, Gösta
    Lund universitet.
    Enflo, Laura
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Schötz, Susanne
    Lund universitet.
    Human Recognition of Swedish Dialects (2008). In: Proceedings of Fonetik 2008: The XXIst Swedish Phonetics Conference / [ed] Anders Eriksson, Jonas Lindh, Göteborg: Göteborgs universitet, 2008, pp. 61-64. Conference paper (Other academic)
    Abstract [en]

    Our recent work within the research project SIMULEKT (Simulating Intonational Varieties of Swedish) involves a pilot perception test, used for detecting tendencies in human clustering of Swedish dialects. 30 Swedish listeners were asked to identify the geographical origin of 72 Swedish native speakers by clicking on a map of Sweden. Results indicate for example that listeners from the south of Sweden are generally better at recognizing some major Swedish dialects than listeners from the central part of Sweden.

  • 72.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Carlson, Rolf
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Multimodal Interaction Control (2009). In: Computers in the Human Interaction Loop / [ed] Waibel, Alexander; Stiefelhagen, Rainer, Berlin/Heidelberg: Springer Berlin/Heidelberg, 2009, pp. 143-158. Chapter in book, part of anthology (Refereed)
  • 73.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Cerrato, Loredana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Evaluation of the expressivity of a Swedish talking head in the context of human-machine interaction (2008). In: Comunicazione parlata e manifestazione delle emozioni: Atti del I Convegno GSCP, Padova 29 novembre - 1 dicembre 2004 / [ed] Emanuela Magno Caldognetto, Federica Cavicchio e Piero Cosi, 2008. Conference paper (Refereed)
    Abstract [en]

    This paper describes a first attempt at synthesis and evaluation of expressive visual articulation using an MPEG-4 based virtual talking head. The synthesis is data-driven, trained on a corpus of emotional speech recorded using optical motion capture. Each emotion is modelled separately using principal component analysis and a parametric coarticulation model. In order to evaluate the expressivity of the data-driven synthesis, two tests were conducted. Our talking head was used in interactions with a human being in a given realistic usage context. The interactions were presented to external observers who were asked to judge the emotion of the talking head. The participants in the experiment could only hear the voice of the user, which was a pre-recorded female voice, and see and hear the talking head. The results of the evaluation, even if constrained by the results of the implementation, clearly show that the visual expression plays a relevant role in the recognition of emotions.

  • 74.
    Beskow, Jonas
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Cerrato, Loredana
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Cosi, P.
    Costantini, E.
    Nordstrand, Magnus
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Pianesi, F.
    Prete, M.
    Svanfeldt, Gunilla
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Preliminary cross-cultural evaluation of expressiveness in synthetic faces (2004). In: Affective Dialogue Systems, Proceedings / [ed] Andre E, Dybkjaer L, Minker W, Heisterkamp P, Berlin: SPRINGER-VERLAG, 2004, pp. 301-304. Conference paper (Refereed)
    Abstract [en]

    This paper reports the results of a preliminary cross-cultural evaluation experiment run in the framework of the European research project PF-Star, with the double aim of evaluating the possibility of exchanging FAP data between the involved sites and assessing the adequacy of the emotional facial gestures performed by talking heads. The results provide initial insights into the way people belonging to various cultures react to natural and synthetic facial expressions produced in different cultural settings, and into the potentials and limits of FAP data exchange.

  • 75.
    Beskow, Jonas
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Cerrato, Loredana
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Granström, Björn
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    House, David
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Nordstrand, Magnus
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Svanfeldt, Gunilla
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    The Swedish PFs-Star Multimodal Corpora (2004). In: Proceedings of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces, 2004, pp. 34-37. Conference paper (Refereed)
    Abstract [en]

    The aim of this paper is to present the multimodal speech corpora collected at KTH, in the framework of the European project PF-Star, and discuss some of the issues related to the analysis and implementation of human communicative and emotional visual correlates of speech in synthetic conversational agents. Two multimodal speech corpora have been collected by means of an opto-electronic system, which allows capturing the dynamics of emotional facial expressions with very high precision. The data has been evaluated through a classification test and the results show promising identification rates for the different acted emotions. These multimodal speech corpora will truly represent a valuable source to get more knowledge about how speech articulation and communicative gestures are affected by the expression of emotions.

  • 76.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hellmer, Kahl
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strömbergsson, Sofia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Project presentation: Spontal: multimodal database of spontaneous dialog (2009). In: Proceedings of Fonetik 2009: The XXIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 190-193. Conference paper (Other academic)
    Abstract [en]

    We describe the ongoing Swedish speech database project Spontal: Multimodal database of spontaneous speech in dialog (VR 2006-7482). The project takes as its point of departure the fact that both vocal signals and gesture involving the face and body are important in everyday, face-to-face communicative interaction, and that there is a great need for data with which these can be measured more precisely.

  • 77.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Jonsson, Oskar
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Speech technology in the European project MonAMI (2008). In: Proceedings of FONETIK 2008 / [ed] Anders Eriksson, Jonas Lindh, Gothenburg, Sweden: University of Gothenburg, 2008, pp. 33-36. Conference paper (Other academic)
    Abstract [en]

    This paper describes the role of speech and speech technology in the European project MonAMI, which aims at “mainstreaming accessibility in consumer goods and services, using advanced technologies to ensure equal access, independent living and participation for all”. It presents the Reminder, a prototype embodied conversational agent (ECA) which helps users to plan activities and to remember what to do. The prototype merges speech technology with other, existing technologies: Google Calendar and a digital pen and paper. The solution allows users to continue using a paper calendar in the manner they are used to, whilst the ECA provides notifications on what has been written in the calendar. Users may also ask questions such as “When was I supposed to meet Sara?” or “What’s on my schedule today?”

  • 78.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Tobiasson, Helena
    KTH, Skolan för datavetenskap och kommunikation (CSC), Människa-datorinteraktion, MDI (stängd 20111231).
    The MonAMI Reminder: a spoken dialogue system for face-to-face interaction (2009). In: Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, U.K., 2009, pp. 300-303. Conference paper (Refereed)
    Abstract [en]

    We describe the MonAMI Reminder, a multimodal spoken dialogue system which can assist elderly and disabled people in organising and initiating their daily activities. Based on deep interviews with potential users, we have designed a calendar and reminder application which uses an innovative mix of an embodied conversational agent, digital pen and paper, and the web to meet the needs of those users as well as the current constraints of speech technology. We also explore the use of head pose tracking for interaction and attention control in human-computer face-to-face interaction.

  • 79.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Modelling humanlike conversational behaviour (2010). In: SLTC 2010: The Third Swedish Language Technology Conference (SLTC 2010), Proceedings of the Conference, Linköping, Sweden, 2010, pp. 9-10. Conference paper (Other academic)
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is humanlike. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: modelling interactional aspects of spoken face-to-face communication.

  • 80.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hjalmarsson, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Research focus: Interactional aspects of spoken face-to-face communication (2010). In: Proceedings from Fonetik, Lund, June 2-4, 2010 / [ed] Susanne Schötz, Gilbert Ambrazaitis, Lund, Sweden: Lund University, 2010, pp. 7-10. Conference paper (Other academic)
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is human-like. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: interactional aspects of spoken face-to-face communication.

  • 81.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Nordqvist, Peter
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Visualization of speech and audio for hearing-impaired persons (2008). In: Technology and Disability, ISSN 1055-4181, Vol. 20, no. 2, pp. 97-107. Article in journal (Refereed)
    Abstract [en]

    Speech and sounds are important sources of information in our everyday lives for communication with our environment, be it interacting with fellow humans or directing our attention to technical devices with sound signals. For hearing impaired persons this acoustic information must be supplemented or even replaced by cues using other senses. We believe that the most natural modality to use is the visual, since speech is fundamentally audiovisual and these two modalities are complementary. We are hence exploring how different visualization methods for speech and audio signals may support hearing impaired persons. The goal in this line of research is to allow the growing number of hearing impaired persons, children as well as the middle-aged and elderly, equal participation in communication. A number of visualization techniques are proposed and exemplified with applications for hearing impaired persons.

  • 82.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Goda utsikter för teckenspråksteknologi (2010). In: Språkteknologi för ökad tillgänglighet: Rapport från ett nordiskt seminarium / [ed] Domeij, R.; Breivik, T.; Halskov, J.; Kirchmeier-Andersen, S.; Langgård, P.; Moshagen, S., Linköping: Linköping University Electronic Press, 2010, pp. 77-86. Conference paper (Other academic)
    Abstract [sv]

    Today, society shows major gaps in accessibility when it comes to sign language interpretation. New technical advances in computer and animation technology, together with the last decade's research on synthetic sign language interpretation, mean that there are now new opportunities to find technical solutions with the potential to improve accessibility considerably for sign language users, for certain types of services or situations. Sweden today has about 30,000 sign language users. The state of knowledge has developed substantially in recent years, both regarding the understanding and description of sign language and regarding the technical means to analyse, store and generate sign language. In this chapter we describe the various technologies required to develop sign language technology. Over the last decade, research on sign language technology has gathered pace, and several international projects have started. So far only a few applications have become generally available. We give examples of both research projects and early applications, especially from Europe, where development has been very strong. The prospects for starting Swedish development in this area must be considered good: the knowledge base is excellent, with expertise in language technology, multimodal recording and animation at KTH, among others, combined with specialist knowledge of Swedish sign language and sign language use at Stockholm University.

  • 83.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Analysis and synthesis of multimodal verbal and non-verbal interaction for animated interface agents (2007). In: VERBAL AND NONVERBAL COMMUNICATION BEHAVIOURS / [ed] Esposito, A; FaundezZanuy, M; Keller, E; Marinaro, M, BERLIN: SPRINGER-VERLAG BERLIN, 2007, Vol. 4775, pp. 250-263. Conference paper (Refereed)
    Abstract [en]

    The use of animated talking agents is a novel feature of many multimodal spoken dialogue systems. The addition and integration of a virtual talking head has direct implications for the way in which users approach and interact with such systems. However, understanding the interactions between visual expressions, dialogue functions and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is closely related to the speech acoustics, while there are other articulatory movements affecting speech acoustics that are not visible on the outside of the face. Many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. This chapter looks into the communicative function of the animated talking agent, and its effect on intelligibility and the flow of the dialogue.

  • 84.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Focal accent and facial movements in expressive speech (2006). In: Proceedings from Fonetik 2006, Lund, June 7-9, 2006 / [ed] Gilbert Ambrazaitis, Susanne Schötz, Lund: Lund University, 2006, pp. 9-12. Conference paper (Other academic)
    Abstract [en]

    In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterances were recorded in a variety of expressive modes including Certain, Confirming, Questioning, Uncertain, Happy, Angry and Neutral. Results showed that in all expressive modes, words with focal accent are accompanied by a greater variation of the facial parameters than are words in non-focal positions. Moreover, interesting differences between the expressions in terms of different parameters were found.

  • 85.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Experiments with Synthesis of Swedish Dialects (2009). In: Proceedings of Fonetik 2009 / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 28-29. Conference paper (Other academic)
    Abstract [en]

    We describe ongoing work on synthesizing Swedish dialects with an HMM synthesizer. A prototype synthesizer has been trained on a large database of standard Swedish read by a professional male voice talent. We have selected a few untrained speakers from each of the following dialectal regions: Norrland, Dala, Göta, Gotland and the south of Sweden. The plan is to train a multi-dialect average voice, and then use 20-30 minutes of dialectal speech from one speaker to adapt either the standard Swedish voice or the average voice to the dialect of that speaker.

  • 86.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Nordenberg, Mikael
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Data-driven synthesis of expressive visual speech using an MPEG-4 talking head (2005). In: 9th European Conference on Speech Communication and Technology, Lisbon, 2005, pp. 793-796. Conference paper (Refereed)
    Abstract [en]

    This paper describes initial experiments with synthesis of visual speech articulation for different emotions, using a newly developed MPEG-4 compatible talking head. The basic problem with combining speech and emotion in a talking head is to handle the interaction between emotional expression and articulation in the orofacial region. Rather than trying to model speech and emotion as two separate properties, the strategy taken here is to incorporate emotional expression in the articulation from the beginning. We use a data-driven approach, training the system to recreate the expressive articulation produced by an actor while portraying different emotions. Each emotion is modelled separately using principal component analysis and a parametric coarticulation model. The results so far are encouraging but more work is needed to improve naturalness and accuracy of the synthesized speech.
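
    A minimal sketch of the per-emotion modelling step named in the abstract: principal component analysis over facial parameter trajectories. The data shape is invented random data for illustration, and the parametric coarticulation model layered on top is not shown.

        import numpy as np
        from sklearn.decomposition import PCA

        # Stand-in for motion-capture data of one acted emotion:
        # rows are time frames, columns are facial articulation parameters.
        rng = np.random.default_rng(0)
        frames = rng.normal(size=(1000, 30))

        # Low-dimensional articulation space for this emotion.
        pca = PCA(n_components=6)
        codes = pca.fit_transform(frames)             # per-frame weights
        reconstructed = pca.inverse_transform(codes)  # back to parameters
        print(pca.explained_variance_ratio_.sum())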

  • 87.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Salvi, Giampiero
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    SynFace: Verbal and Non-verbal Face Animation from Audio (2009). In: Proceedings of The International Conference on Auditory-Visual Speech Processing AVSP'09 / [ed] Barry-John Theobald, Richard Harvey, Norwich, England, 2009. Conference paper (Refereed)
    Abstract [en]

    We give an overview of SynFace, a speech-driven face animation system originally developed for the needs of hard-of-hearing users of the telephone. For the 2009 LIPS challenge, SynFace includes not only articulatory motion but also non-verbal motion of gaze, eyebrows and head, triggered by detection of acoustic correlates of prominence and cues for interaction control. In perceptual evaluations, both verbal and non-verbal movements have been found to have a positive impact on word recognition scores.

  • 88. Biadsy, F.
    et al.
    Rosenberg, A.
    Carlson, Rolf
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hirschberg, J.
    Strangert, E.
    A Cross-Cultural Comparison of American, Palestinian, and Swedish (2008). In: Speech Prosody 2008, Campinas, Brazil, 2008. Conference paper (Refereed)
    Abstract [en]

    Perception of charisma, the ability to influence others by virtue of one's personal qualities, appears to be influenced to some extent by cultural factors. We compare results of five studies of charisma speech in which American, Palestinian, and Swedish subjects rated Standard American English political speech and Americans and Palestinians rated Palestinian Arabic speech. We identify acoustic-prosodic and lexical features correlated with charisma ratings of both languages for native and non-native speakers and find that 1) some acoustic-prosodic features correlated with charisma ratings appear similar across all five experiments; 2) other acoustic-prosodic and lexical features correlated with charisma appear specific to the language rated, whatever the native language of the rater; and 3) still other acoustic-prosodic cues appear specific to both rater native language and to language rated. We also find that, while the absolute ratings non-native raters assign tend to be lower than those of native speakers, the ratings themselves are strongly correlated.

  • 89. Bisitz, T.
    et al.
    Herzke, T.
    Zokoll, M.
    Öster, Anne-Marie
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Ormel, E.
    Van Son, N.
    Tanke, R.
    Noise Reduction for Media Streams (2009). In: NAG/DAGA'09 International Conference on Acoustics: including the 35th German Annual Conference on Acoustics (DAGA) / [ed] Marinus M. Boone, Red Hook, USA: Curran Associates, Inc., 2009. Conference paper (Refereed)
  • 90. Bissiri, M.P.
    et al.
    Zellers, Margaret
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Perception of glottalization in varying pitch contexts across languages (2013). In: INTERSPEECH-2013, 2013, pp. 253-257. Conference paper (Refereed)
    Abstract [en]

    Glottalization is often associated with low pitch in intonation languages, but evidence from many languages indicates that this is not an obligatory association. We asked speakers of German, English and Swedish to compare glottalized stimuli with several pitch contour alternatives in an AXB listening test. Although the low F0 in the glottalized stimuli tended to be perceived as most similar to falling pitch contours, this was not always the case, indicating that pitch perception in glottalization cannot be predicted by F0 alone. We also found evidence for cross-linguistic differences in the degree of flexibility of pitch judgments in glottalized stretches of speech.
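
    A small sketch of how AXB responses like these are typically scored: X is the glottalized stimulus, A and B are pitch-contour alternatives, and each response records which flank was judged more similar. The trial data here are invented.

        # Each trial: (contour_A, X, contour_B, response), response in {"A", "B"}.
        trials = [
            ("fall", "glottalized", "rise", "A"),
            ("rise", "glottalized", "fall", "B"),
            ("fall", "glottalized", "level", "A"),
        ]

        # Proportion of trials where X was matched to a falling contour.
        matched_fall = sum(
            1 for a, x, b, resp in trials
            if (resp == "A" and a == "fall") or (resp == "B" and b == "fall")
        )
        print(matched_fall / len(trials))  # 1.0 for this toy data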

  • 91. Bissiri, M.P.
    et al.
    Zellers, Margaret
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Ding, H.
    Perception of glottalization in varying pitch contexts in Mandarin Chinese. 2014. In: Proceedings of Speech Prosody 7 / [ed] Campbell, N.; Gibbon, D.; Hirst, D., 2014, pp. 633-637. Conference paper (Refereed)
    Abstract [en]

    Although glottalization has often been associated with low pitch, evidence from a number of sources supports the assertion that this association is not obligatory, and is likely to be language-specific. Following a previous study testing perception of glottalization by German, English, and Swedish listeners, the current research investigates the influence of pitch context on the perception of glottalization by native speakers of a tone language, Mandarin Chinese. Listeners heard AXB sets in which they were asked to match glottalized stimuli with pitch contours. We find that Mandarin listeners tend not to be influenced by the pitch context when judging the pitch of glottalized stretches of speech. These data lend support to the idea that the perception of glottalization varies in relation to language-specific prosodic structure.

  • 92. Björklund, Staffan
    et al.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH. University College of Music Education, Sweden.
    Relationship Between Subglottal Pressure and Sound Pressure Level in Untrained Voices. 2016. In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 30, no. 1, pp. 15-20. Journal article (Refereed)
    Abstract [en]

    Objectives: Subglottal pressure (Ps) is strongly correlated with sound pressure level (SPL) and is easy to measure by means of commonly available equipment. The SPL/Ps ratio is strongly dependent on the efficiency of the phonatory apparatus and should be of great relevance to clinical practice. However, published normative data are still missing. Method: The subjects produced sequences of the syllable [pæ], and Ps was measured as the oral pressure during the [p] occlusion. The Ps to SPL relationship was determined at four pitches produced by 16 female and 15 male healthy voices and analyzed by means of regression analysis. Average correlation between Ps and SPL, average SPL produced with a Ps of 10 cm H2O, and average SPL increase produced by a doubling of Ps were calculated for the female and for the male subjects. The significance of sex and pitch conditions was analyzed by means of analysis of variance (ANOVA). Results: Pitch was found to be an insignificant condition. The average correlation between Ps and SPL was 0.83 and did not differ significantly between the female and male subjects. In female and male subjects, Ps = 10 cm H2O produced 78.1 dB and 80.0 dB SPL at 0.3 m, and a doubling of Ps generated 11.1 dB and 9.3 dB increase of SPL. Both these gender differences were statistically significant. Conclusions: The relationship between Ps and SPL can be reliably established from series of repetitions of the syllable [pæ] produced with a continuously changing degree of vocal loudness. Male subjects produce slightly higher SPL for a given pressure than female subjects but gain less for a doubling of Ps. As these relationships appear to be affected by phonation type, it seems possible that in the future, the method can be used for documenting degree of phonatory hypofunction and hyperfunction.
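
    The reported group averages imply a simple log-linear prediction of SPL from Ps: take the SPL at Ps = 10 cm H2O as intercept and the dB gain per doubling of Ps as slope. A back-of-the-envelope sketch (using only the averages quoted above, not the paper's regression code):

        import math

        # (SPL in dB at Ps = 10 cm H2O, dB gained per doubling of Ps),
        # group averages reported in the abstract
        PARAMS = {"female": (78.1, 11.1), "male": (80.0, 9.3)}

        def predicted_spl(ps_cmh2o: float, group: str) -> float:
            """Predicted SPL at 0.3 m for a given subglottal pressure."""
            spl_at_10, db_per_doubling = PARAMS[group]
            return spl_at_10 + db_per_doubling * math.log2(ps_cmh2o / 10.0)

        print(f"{predicted_spl(20, 'female'):.1f} dB")  # 89.2 dB: one doubling above 10 cm H2O
        print(f"{predicted_spl(20, 'male'):.1f} dB")    # 89.3 dB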

  • 93.
    Björkman, Beyza
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    English as the lingua franca of engineering: the morphosyntax of academic speech events. 2008. In: Nordic Journal of English Studies, ISSN 1654-6970, E-ISSN 1654-6970, Vol. 7, no. 3, pp. 103-122. Journal article (Refereed)
    Abstract [en]

    English today is frequently used as an international means of communication among its non-native speakers from different L1 backgrounds. Research on English as a lingua franca (ELF) has already revealed commonalities and common processes from a variety of settings. It is important that research continues and that lingua franca usage in different environments is described to find ways to optimize communication. This paper will focus on the morphosyntax of spoken ELF, reporting the results of a study that investigates spoken lingua franca English in tertiary education (engineering) in Sweden, where English is increasingly becoming the language of instruction. The morphosyntax of non-native-like usage is investigated in dialogic and monologic speech events. Cases of non-native-like usage are grouped as ‘disturbing’, i.e. causing comprehension problems and ‘non-disturbing’, i.e. causing no comprehension problems. Findings from this corpus-based study show that the most consistent idiosyncrasies in lingua franca usage in this setting are observed in redundant features of the language and that there is very little disturbance, i.e. breakdown in communication. Engineers seem to opt for function and reciprocal intelligibility over redundant features of the language and accuracy when they speak English in academic contexts.

  • 94. Björkman, Beyza
    From code to discourse in spoken ELF. 2009. In: English as a lingua franca: Studies and findings / [ed] Mauranen, A.; Ranta, E., Newcastle upon Tyne: Cambridge Scholars Publishing, 2009, 1, pp. 225-252. Book chapter, part of an anthology (Refereed)
  • 95. Björkman, Beyza
    'So where we are' Spoken lingua franca English at a Swedish technical university. 2008. In: English Today, ISSN 0266-0784, E-ISSN 1474-0567, Vol. 24, no. 2, pp. 35-41. Journal article (Refereed)
    Abstract [en]

    This article discusses the use of English as a lingua franca (ELF) by engineering students and its effectiveness in content courses at a technical university, reporting the preliminary results of part of a study that investigates authentic and high-stakes speech events at a Swedish technical university. The main aim of my research is to find out what kind of divergence from standard morphosyntactic forms of English, if any, leads to disturbance, i.e. breakdown, in ELF speech.

  • 96.
    Björkman, Beyza
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    "So you think you can ELF?": English as a lingua franca as the medium of instruction2010Ingår i: Hermes - Journal of Language and Communication Studies, ISSN 0904-1699, E-ISSN 1903-1785, Vol. 45, s. 77-99Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    This paper reports the findings of a study on spoken English as a lingua franca (ELF) in Swedish higher education. The aim has been to investigate the role pragmatic strategies play in content lectures where English is a lingua franca, i.e. a vehicular language. The findings show that lecturers in ELF settings make less frequent use of pragmatic strategies than students, who deploy these strategies frequently in group-work projects. Earlier stages of the present study showed that despite frequent non-standardness at the morphosyntax level, there is very little overt disturbance in student group-work (Björkman 2008a and b/2009b), most likely owing to a variety of communicative strategies used during interaction and the questions raised (Björkman, 2009a). It seems reasonable to assume that, in the absence of appropriate strategies and questions that serve as real-time signals of disturbance, there is an increased risk of covert disturbance in lectures. This view is consistent with the findings of earlier studies on the importance of such strategies (Mauranen 2006, Airey 2009:79, Hellekjær 2010). The findings imply that the effectiveness of a speaker of English in academic ELF settings is determined primarily by the speaker’s pragmatic ability and less by his/her proficiency. There are important implications of these findings for lecturers who need to operate in ELF settings. First, increasing interactivity by using pragmatic strategies sufficiently frequently appears critical for those involved in English-medium education. It is also important that awareness is raised about target language usage in lecturing in English. Such awareness-raising can be achieved at the macro level by clearly written language policies that include training for teachers and students, who both need to be equipped with the skills needed to cope with the complexities of such settings, and at the micro level, by in-house training and courses that could be administered to both teachers and students.

  • 97. Björkman, Beyza
    'We' and 'you': pronouns and genre competence in oral technical descriptions. 2007. In: Linguistic Diversity and Sustainable Development: Rapport från ASLA:s höstsymposium, Eskilstuna, 9-10 november 2006 / [ed] Lainio, J.; Leppänen, A., Uppsala: Swedish Science Press, 2007, pp. 89-109. Conference paper (Refereed)
  • 98.
    Björkner, Eva
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Alku, P.
    Subglottal pressure and NAQ variation in voice production of classically trained baritone singers. 2005. In: 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 2005, pp. 1057-1060. Conference paper (Refereed)
    Abstract [en]

    The subglottal pressure (Ps) and voice source characteristics of five professional baritone singers were analyzed. Glottal adduction was estimated with amplitude quotient (AQ), defined as the ratio between peak-to-peak pulse amplitude and the negative peak of the differentiated flow glottogram, and with normalized amplitude quotient (NAQ), defined as AQ divided by fundamental period length. Previous studies show that NAQ and its variation with Ps represent an effective parameter in the analysis of voice source characteristics. Therefore, the present study aims at increasing our knowledge of these two parameters further by finding out how they vary with pitch and Ps in operatic baritone singers, singing at high and low pitch. Ten equally spaced Ps values were selected from three takes of the syllable [pæ], repeated with a continuously decreasing vocal loudness and initiated at maximum vocal loudness. The vowel sounds following the selected Ps peaks were inverse filtered. Data on peak-to-peak pulse amplitude, maximum flow declination rate, AQ and NAQ will be presented.
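
    The two quotients are defined explicitly in the abstract, so they translate directly into code. A small sketch (illustrative values, not the study's measurements):

        def amplitude_quotient(u_ptp: float, d_peak_neg: float) -> float:
            """AQ: peak-to-peak flow amplitude over the magnitude of the
            negative peak of the differentiated flow glottogram."""
            return u_ptp / abs(d_peak_neg)

        def normalised_amplitude_quotient(u_ptp: float, d_peak_neg: float, f0: float) -> float:
            """NAQ: AQ divided by the fundamental period length."""
            t0 = 1.0 / f0  # fundamental period in seconds
            return amplitude_quotient(u_ptp, d_peak_neg) / t0

        # e.g. 0.4 l/s peak-to-peak flow, -400 l/s^2 negative peak, F0 = 110 Hz
        print(round(normalised_amplitude_quotient(0.4, -400.0, 110.0), 2))  # 0.11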

  • 99.
    Björkner, Eva
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Sundberg, Johan
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Cleveland, T.
    Stone, R. E.
    Voice source register differences in female musical theatre singers. 2004. In: Proc Baltic-Nordic Acoustics Meeting 2004, BNAM04, Mariehamn, 2004. Conference paper (Refereed)
  • 100.
    Björkner, Eva
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Cleveland, Tom
    Vanderbilt Voice Center, Dept. of Otolaryngology, Vanderbilt University Medical Center, Nashville.
    Stone, R E
    Vanderbilt Voice Center, Dept. of Otolaryngology, Vanderbilt University Medical Center, Nashville.
    Voice source characteristics in different registers in classically trained female musical theatre singers. 2004. In: Proceedings of ICA 2004: the 18th International Congress on Acoustics, Kyoto International Conference Hall, 4-9 April, Kyoto, Japan: acoustical science and technology for quality of life, Kyoto, Japan, 2004, pp. 297-300. Conference paper (Refereed)
    Abstract [en]

    Musical theatre singing requires the use of two vocal registers in the female voice. The voice source and subglottal pressure (Ps) characteristics of these registers are analysed by inverse filtering. The relationships between Ps and closed quotient Qclosed, peak-to-peak pulse amplitude Up-t-p, maximum flow declination rate MFDR, and the normalised amplitude quotient NAQ were examined. Ps was typically slightly higher in chest than in head register. For typical tokens, MFDR and Qclosed were significantly greater while NAQ and Up-t-p were significantly lower in chest than in head.
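
    The closed quotient used here is conventionally the fraction of each glottal cycle during which the glottis is closed; a one-line sketch of that ratio (a hypothetical helper under that assumption, not the authors' analysis code):

        def closed_quotient(closed_phase_s: float, period_s: float) -> float:
            """Qclosed: closed-phase duration as a fraction of the glottal cycle."""
            return closed_phase_s / period_s

        print(closed_quotient(0.004, 0.010))  # 0.4: a 4 ms closed phase at F0 = 100 Hz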
