1 - 50 of 201
  • 1.
    Agelfors, Eva
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Beskow, Jonas
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Dahlquist, Martin
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Granström, Björn
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Lundeberg, Magnus
    Salvi, Giampiero
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Spens, Karl-Erik
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Öhman, Tobias
    A synthetic face as a lip-reading support for hearing impaired telephone users - problems and positive results (1999). In: European audiology in 1999: proceeding of the 4th European Conference in Audiology, Oulu, Finland, June 6-10, 1999, 1999. Conference paper (Refereed)
  • 2.
    Agelfors, Eva
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Beskow, Jonas
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Granström, Björn
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Lundeberg, Magnus
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Salvi, Giampiero
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Spens, Karl-Erik
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Öhman, Tobias
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Synthetic visual speech driven from auditory speech (1999). In: Proceedings of Audio-Visual Speech Processing (AVSP'99), 1999. Conference paper (Refereed)
    Abstract [en]

    We have developed two different methods for using auditory, telephone speech to drive the movements of a synthetic face. In the first method, Hidden Markov Models (HMMs) were trained on a phonetically transcribed telephone speech database. The output of the HMMs was then fed into a rule-based visual speech synthesizer as a string of phonemes together with time labels. In the second method, Artificial Neural Networks (ANNs) were trained on the same database to map acoustic parameters directly to facial control parameters. These target parameter trajectories were generated by using phoneme strings from a database as input to the visual speech synthesis. The two methods were evaluated through audiovisual intelligibility tests with ten hearing impaired persons, and compared to “ideal” articulations (where no recognition was involved), a natural face, and to the intelligibility of the audio alone. It was found that the HMM method performs considerably better than the audio alone condition (54% and 34% keywords correct, respectively), but not as well as the “ideal” articulating artificial face (64%). The intelligibility for the ANN method was 34% keywords correct.
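
    A minimal illustrative sketch of the second method above (a network mapping acoustic features frame by frame to facial control parameters) is given below. It is not the authors' implementation; the data, dimensions and training loop are invented placeholders.

```python
# Hedged sketch: frame-wise regression from acoustic features to facial
# control parameters, in the spirit of the ANN method described above.
# All dimensions and data are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1000 frames of 13 acoustic features and 4 facial control parameters.
X = rng.normal(size=(1000, 13))
true_W = rng.normal(scale=0.3, size=(13, 4))
Y = X @ true_W + 0.05 * rng.normal(size=(1000, 4))

# One hidden layer trained with plain gradient descent on mean squared error.
W1 = rng.normal(scale=0.1, size=(13, 32))
W2 = rng.normal(scale=0.1, size=(32, 4))
lr = 1e-2
for _ in range(2000):
    H = np.tanh(X @ W1)      # hidden activations
    P = H @ W2               # predicted facial parameters
    err = P - Y
    W2 -= lr * H.T @ err / len(X)
    W1 -= lr * X.T @ ((err @ W2.T) * (1 - H ** 2)) / len(X)

print("RMS error:", np.sqrt(np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2)))
```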

  • 3.
    Askenfelt, Anders
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Special issue: Selected papers from the Stockholm Music Acoustics Conference - Introduction (2004). In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, Vol. 33, no. 3, pp. 185-187. Journal article (Other academic)
  • 4.
    Bell, Linda
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel. Telia Research, Sweden.
    Boye, Johan
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Gustafson, Joakim
    KTH, Tidigare Institutioner, Tal, musik och hörsel. Telia Research, Sweden.
    Real-time Handling of Fragmented Utterances (2001). In: Proceedings of the NAACL Workshop on Adaption in Dialogue Systems, 2001. Conference paper (Refereed)
    Abstract [en]

    In this paper, we discuss an adaptive method of handling fragmented user utterances to a speech-based multimodal dialogue system. Inserted silent pauses between fragments present the following problem: Does the current silence indicate that the user has completed her utterance, or is the silence just a pause between two fragments, so that the system should wait for more input? Our system incrementally classifies user utterances as either closing (more input is unlikely to come) or non-closing (more input is likely to come), partly depending on the current dialogue state. Utterances that are categorized as non-closing allow the dialogue system to await additional spoken or graphical input before responding.

  • 5.
    Bell, Linda
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Boye, Johan
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Gustafson, Joakim
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Wirén, Mats
    Modality Convergence in a Multimodal Dialogue System (2000). In: Proceedings of Götalog, 2000, pp. 29-34. Conference paper (Refereed)
    Abstract [en]

    When designing multimodal dialogue systems allowing speech as well as graphical operations, it is important to understand not only how people make use of the different modalities in their utterances, but also how the system might influence a user’s choice of modality by its own behavior. This paper describes an experiment in which subjects interacted with two versions of a simulated multimodal dialogue system. One version used predominantly graphical means when referring to specific objects; the other used predominantly verbal referential expressions. The purpose of the study was to find out what effect, if any, the system’s referential strategy had on the user’s behavior. The results provided limited support for the hypothesis that the system can influence users to adopt another modality for the purpose of referring.

  • 6.
    Bell, Linda
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Eklund, Robert
    Gustafson, Joakim
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    A Comparison of Disfluency Distribution in a Unimodal and a Multimodal Speech Interface (2000). In: Proceedings of ICSLP 00, 2000. Conference paper (Refereed)
    Abstract [en]

    In this paper, we compare the distribution of disfluencies in two human-computer dialogue corpora. One corpus consists of unimodal travel booking dialogues, which were recorded over the telephone. In this unimodal system, all components except the speech recognition were authentic. The other corpus was collected using a semi-simulated multi-modal dialogue system with an animated talking agent and a clickable map. The aim of this paper is to analyze and discuss the effects of modality, task and interface design on the distribution and frequency of disfluencies in these two corpora.

  • 7.
    Bell, Linda
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Gustafson, Joakim
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Positive and Negative User Feedback in a Spoken Dialogue Corpus (2000). In: Proceedings of ICSLP 00, 2000. Conference paper (Refereed)
    Abstract [en]

    This paper examines feedback strategies in a Swedish corpus of multimodal human-computer interaction. The aim of the study is to investigate how users provide positive and negative feedback to a dialogue system and to discuss the function of these utterances in the dialogues. User feedback in the AdApt corpus was labeled and analyzed, and its distribution in the dialogues is discussed. The question of whether it is possible to utilize user feedback in future systems is considered. More specifically, we discuss how error handling in human-computer dialogue might be improved through greater knowledge of user feedback strategies. In the present corpus, almost all subjects used positive or negative feedback at least once during their interaction with the system. Our results indicate that some types of feedback more often occur in certain positions in the dialogue. Another observation is that there appear to be great individual variations in feedback strategies, so that certain subjects give feedback at almost every turn while others rarely or never respond to a spoken dialogue system in this manner. Finally, we discuss how feedback could be used to prevent problems in human-computer dialogue.

  • 8.
    Bell, Linda
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Gustafson, Joakim
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Repetition and its phonetic realizations: investigating a Swedish database of spontaneous computer directed speech (1999). In: Proceedings of the XIVth International Congress of Phonetic Sciences / [ed] Ohala, John, 1999, pp. 1221-. Conference paper (Refereed)
    Abstract [en]

    This paper is an investigation of repetitive utterances in a Swedish database of spontaneous computer-directed speech. A spoken dialogue system was installed in a public location in downtown Stockholm and spontaneous human-computer interactions with adults and children were recorded [1]. Several acoustic and prosodic features such as duration, shifting of focus and hyperarticulation were examined to see whether repetitions could be distinguished from what the users first said to the system. The present study indicates that adults and children use partly different strategies as they attempt to resolve errors by means of repetition. As repetition occurs, duration is increased and words are often hyperarticulated or contrastively focused. These results could have implications for the development of future spoken dialogue systems with robust error handling.

  • 9.
    Bertenstam, Johan
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Blomberg, Mats
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Carlson, Rolf
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Elenius, Kjell
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Granström, Björn
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Gustafson, Joakim
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Hunnicutt, Sheri
    Högberg, Jesper
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Lindell, Roger
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Neovius, Lennart
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Nord, Lennart
    de Serpa-Leitao, Antonio
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Ström, Nikko
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Spoken dialogue data collected in the Waxholm project (1995). In: Quarterly progress and status report: April 15, 1995 / Speech Transmission Laboratory, Stockholm: KTH, 1995, 1, pp. 50-73. Chapter in book, part of anthology (Other academic)
  • 10.
    Beskow, Jonas
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    ANIMATION OF TALKING AGENTS (1997). In: Proceedings of International Conference on Auditory-Visual Speech Processing / [ed] Benoît, C & Campbell, R, Rhodos, Greece, 1997, pp. 149-152. Conference paper (Other academic)
    Abstract [en]

    It is envisioned that autonomous software agents that can communicate using speech and gesture will soon be on everybody’s computer screen. This paper describes an architecture that can be used to design and animate characters capable of lip-synchronised synthetic speech as well as body gestures, for use in for example spoken dialogue systems. A general scheme for computationally efficient parametric deformation of facial surfaces is presented, as well as techniques for generation of bimodal speech, facial expressions and body gestures in a spoken dialogue system. Results indicating that an animated cartoon-like character can be a significant contribution to speech intelligibility are also reported.

  • 11.
    Beskow, Jonas
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    RULE-BASED VISUAL SPEECH SYNTHESIS (1995). In: Proceedings of the 4th European Conference on Speech Communication and Technology, Madrid, Spain, 1995, pp. 299-302. Conference paper (Other academic)
    Abstract [en]

    A system for rule based audiovisual text-to-speech synthesis has been created. The system is based on the KTH text-to-speech system which has been complemented with a three-dimensional parameterized model of a human face. The face can be animated in real time, synchronized with the auditory speech. The facial model is controlled by the same synthesis software as the auditory speech synthesizer. A set of rules that takes coarticulation into account has been developed. The audiovisual text-to-speech system has also been incorporated into a spoken man-machine dialogue system that is being developed at the department.

  • 12.
    Beskow, Jonas
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Talking Heads - Models and Applications for Multimodal Speech Synthesis (2003). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This thesis presents work in the area of computer-animated talking heads. A system for multimodal speech synthesis has been developed, capable of generating audiovisual speech animations from arbitrary text, using parametrically controlled 3D models of the face and head. A speech-specific direct parameterisation of the movement of the visible articulators (lips, tongue and jaw) is suggested, along with a flexible scheme for parameterising facial surface deformations based on well-defined articulatory targets.

    To improve the realism and validity of facial and intra-oral speech movements, measurements from real speakers have been incorporated from several types of static and dynamic data sources. These include ultrasound measurements of tongue surface shape, dynamic optical motion tracking of face points in 3D, as well as electromagnetic articulography (EMA) providing dynamic tongue movement data in 2D. Ultrasound data are used to estimate target configurations for a complex tongue model for a number of sustained articulations. Simultaneous optical and electromagnetic measurements are performed and the data are used to resynthesise facial and intra-oral articulation in the model. A robust resynthesis procedure, capable of animating facial geometries that differ in shape from the measured subject, is described.

    To drive articulation from symbolic (phonetic) input, for example in the context of a text-to-speech system, both rule-based and data-driven articulatory control models have been developed. The rule-based model effectively handles forward and backward coarticulation by target under-specification, while the data-driven model uses ANNs to estimate articulatory parameter trajectories, trained on trajectories resynthesised from optical measurements. The articulatory control models are evaluated and compared against other data-driven models trained on the same data. Experiments with ANNs for driving the articulation of a talking head directly from acoustic speech input are also reported.

    A flexible strategy for generation of non-verbal facial gestures is presented. It is based on a gesture library organised by communicative function, where each function has multiple alternative realisations. The gestures can be used to signal e.g. turn-taking, back-channelling and prominence when the talking head is employed as output channel in a spoken dialogue system. A device independent XML-based formalism for non-verbal and verbal output in multimodal dialogue systems is proposed, and it is described how the output specification is interpreted in the context of a talking head and converted into facial animation using the gesture library.

    Through a series of audiovisual perceptual experiments with noise-degraded audio, it is demonstrated that the animated talking head provides significantly increased intelligibility over the audio-only case, in some cases not significantly below that provided by a natural face.

    Finally, several projects and applications are presented, where the described talking head technology has been successfully employed. Four different multimodal spoken dialogue systems are outlined, and the role of the talking heads in each of the systems is discussed. A telecommunication application where the talking head functions as an aid for hearing-impaired users is also described, as well as a speech training application where talking heads and language technology are used with the purpose of improving speech production in profoundly deaf children.

  • 13.
    Beskow, Jonas
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Trainable articulatory control models for visual speech synthesis (2004). In: International Journal of Speech Technology, ISSN 1381-2416, E-ISSN 1572-8110, Vol. 7, no. 4, pp. 335-349. Journal article (Refereed)
    Abstract [en]

    This paper deals with the problem of modelling the dynamics of articulation for a parameterised talking head based on phonetic input. Four different models are implemented and trained to reproduce the articulatory patterns of a real speaker, based on a corpus of optical measurements. Two of the models (“Cohen-Massaro” and “Öhman”) are based on coarticulation models from speech production theory and two are based on artificial neural networks, one of which is specially intended for streaming real-time applications. The different models are evaluated through comparison between predicted and measured trajectories, which shows that the Cohen-Massaro model produces trajectories that best match the measurements. A perceptual intelligibility experiment is also carried out, where the four data-driven models are compared against a rule-based model as well as an audio-alone condition. Results show that all models give significantly increased speech intelligibility over the audio-alone case, with the rule-based model yielding the highest intelligibility score.
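
    For orientation only, the dominance-function formulation commonly associated with the Cohen-Massaro coarticulation model mentioned above can be sketched as follows; the targets, magnitudes and decay rates are invented, and this is a generic textbook-style rendering rather than the paper's implementation.

```python
# Hedged sketch of a dominance-function coarticulation blend: each segment
# contributes a target value weighted by an exponentially decaying dominance
# around its temporal centre. All numbers are illustrative placeholders.
import numpy as np

segments = [  # (centre time in s, target value, dominance magnitude, decay rate)
    (0.10, 0.2, 1.0, 20.0),
    (0.25, 0.8, 0.6, 15.0),
    (0.40, 0.4, 1.0, 25.0),
]

t = np.linspace(0.0, 0.5, 200)
num = np.zeros_like(t)
den = np.zeros_like(t)
for centre, target, alpha, theta in segments:
    dominance = alpha * np.exp(-theta * np.abs(t - centre))
    num += dominance * target
    den += dominance

trajectory = num / den          # blended articulatory parameter over time
print(trajectory[:5])
```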

  • 14.
    Beskow, Jonas
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Cerrato, Loredana
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Cosi, P.
    Costantini, E.
    Nordstrand, Magnus
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Pianesi, F.
    Prete, M.
    Svanfeldt, Gunilla
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Preliminary cross-cultural evaluation of expressiveness in synthetic faces (2004). In: Affective Dialogue Systems, Proceedings / [ed] Andre E, Dybkjaer L, Minker W, Heisterkamp P, Berlin: SPRINGER-VERLAG, 2004, pp. 301-304. Conference paper (Refereed)
    Abstract [en]

    This paper reports the results of a preliminary cross-evaluation experiment run in the framework of the European research project PF-Star(1), with the double aim of evaluating the possibility of exchanging FAP data between the involved sites and assessing the adequacy of the emotional facial gestures performed by talking heads. The results provide initial insights into the way people belonging to various cultures react to natural and synthetic facial expressions produced in different cultural settings, and into the potentials and limits of FAP data exchange.

  • 15.
    Beskow, Jonas
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Cerrato, Loredana
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Granström, Björn
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    House, David
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Nordenberg, Mikael
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Nordstrand, Magnus
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Svanfeldt, Gunilla
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Expressive animated agents for affective dialogue systems (2004). In: AFFECTIVE DIALOGUE SYSTEMS, PROCEEDINGS / [ed] Andre, E; Dybkjaer, L; Minker, W; Heisterkamp, P, BERLIN: SPRINGER, 2004, Vol. 3068, pp. 240-243. Conference paper (Refereed)
    Abstract [en]

    We present our current state of development regarding animated agents applicable to affective dialogue systems. A new set of tools is under development to support the creation of animated characters compatible with the MPEG-4 facial animation standard. Furthermore, we have collected a multimodal expressive speech database including video, audio and 3D point motion registration. One of the objectives of collecting the database is to examine how emotional expression influences articulatory patterns, to be able to model this in our agents. Analysis of the 3D data shows for example that variation in mouth width due to expression greatly exceeds that due to vowel quality.

  • 16.
    Beskow, Jonas
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Cerrato, Loredana
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Granström, Björn
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    House, David
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Nordstrand, Magnus
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Svanfeldt, Gunilla
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    The Swedish PFs-Star Multimodal Corpora (2004). In: Proceedings of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces, 2004, pp. 34-37. Conference paper (Refereed)
    Abstract [en]

    The aim of this paper is to present the multimodal speech corpora collected at KTH, in the framework of the European project PF-Star, and discuss some of the issues related to the analysis and implementation of human communicative and emotional visual correlates of speech in synthetic conversational agents. Two multimodal speech corpora have been collected by means of an opto-electronic system, which allows capturing the dynamics of emotional facial expressions with very high precision. The data has been evaluated through a classification test and the results show promising identification rates for the different acted emotions. These multimodal speech corpora will truly represent a valuable source to get more knowledge about how speech articulation and communicative gestures are affected by the expression of emotions.

  • 17.
    Beskow, Jonas
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Engwall, Olov
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Granström, Björn
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Resynthesis of Facial and Intraoral Articulation from Simultaneous Measurements (2003). In: Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS'03), Adelaide: Casual Productions, 2003. Conference paper (Other academic)
    Abstract [en]

    Simultaneous measurements of tongue and facial motion, using a combination of electromagnetic articulography (EMA) and optical motion tracking, are analysed to improve the articulation of an animated talking head and to investigate the correlation between facial and vocal tract movement. The recorded material consists of VCV and CVC words and 270 short everyday sentences spoken by one Swedish subject. The recorded articulatory movements are re-synthesised by a parametrically controlled 3D model of the face and tongue, using a procedure involving minimisation of the error between measurement and model. Using linear estimators, tongue data is predicted from the face and vice versa, and the correlation between measurement and prediction is computed.
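
    A hedged sketch of the final step summarised above (a linear estimator predicting one articulatory data stream from another, followed by a correlation between measurement and prediction) could look like this; channel counts and data are synthetic placeholders.

```python
# Illustrative sketch (assumed, not from the paper): predict one set of
# articulatory channels from another with a linear least-squares estimator
# and report the per-channel correlation between measurement and prediction.
import numpy as np

rng = np.random.default_rng(1)
frames = 500
face = rng.normal(size=(frames, 10))                            # e.g. optical face markers
mixing = rng.normal(size=(10, 6))
tongue = face @ mixing + 0.2 * rng.normal(size=(frames, 6))     # e.g. EMA coil coordinates

# Least-squares linear estimator: tongue ~ face @ W (with a bias column).
A = np.hstack([face, np.ones((frames, 1))])
W, *_ = np.linalg.lstsq(A, tongue, rcond=None)
pred = A @ W

corr = [np.corrcoef(tongue[:, i], pred[:, i])[0, 1] for i in range(tongue.shape[1])]
print("correlation per tongue channel:", np.round(corr, 3))
```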

  • 18.
    Beskow, Jonas
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Karlsson, Inger
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Kewley, J
    Salvi, Giampiero
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    SYNFACE - A talking head telephone for the hearing-impaired (2004). In: COMPUTERS HELPING PEOPLE WITH SPECIAL NEEDS: PROCEEDINGS / [ed] Miesenberger, K; Klaus, J; Zagler, W; Burger, D, BERLIN: SPRINGER, 2004, Vol. 3118, pp. 1178-1185. Conference paper (Refereed)
    Abstract [en]

    SYNFACE is a telephone aid for hearing-impaired people that shows the lip movements of the speaker at the other telephone synchronised with the speech. The SYNFACE system consists of a speech recogniser that recognises the incoming speech and a synthetic talking head. The output from the recogniser is used to control the articulatory movements of the synthetic head. SYNFACE prototype systems exist for three languages: Dutch, English and Swedish and the first user trials have just started.

  • 19. Bimbot, F.
    et al.
    Blomberg, Mats
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Boves, L.
    Genoud, D.
    Hutter, H. P.
    Jaboulet, C.
    Koolwaaij, J.
    Lindberg, J.
    Pierrot, J. B.
    An overview of the CAVE project research activities in speaker verification (2000). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 31, no. 2-3, pp. 155-180. Journal article (Refereed)
    Abstract [en]

    This article presents an overview of the research activities carried out in the European CAVE project, which focused on text-dependent speaker verification on the telephone network using whole word Hidden Markov Models. It documents in detail various aspects of the technology and the methodology used within the project. In particular, it addresses the issue of model estimation in the context of limited enrollment data and the problem of a posteriori decision threshold setting. Experiments are carried out on the realistic telephone speech database SESP. State-of-the-art performance levels are obtained, which validates the technical approaches developed and assessed during the project as well as the working infrastructure which facilitated cooperation between the partners.

  • 20. Birch, Peer
    et al.
    Gümoes, Bodil
    Stavad, Hanne
    Prytz, Svend
    Björkner, Eva
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Sundberg, Johan
    Velum Behavior in Professional Classic Operatic Singing (2002). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 16, pp. 61-71. Journal article (Refereed)
  • 21.
    Björkner, Eva
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Sundberg, Johan
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Cleveland, T.
    Stone, R. E.
    Voice source register differences in female musical theatre singers (2004). In: Proc Baltic-Nordic Acoustics Meeting 2004, BNAM04, Mariehamn, 2004. Conference paper (Refereed)
  • 22.
    Blomberg, Mats
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Elenius, Daniel
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Zetterholm, Elisabeth
    Department of Philosophy & Linguistics, Umeå University.
    Speaker verification scores and acoustic analysis of a professional impersonator (2004). In: Proceedings of Fonetik 2004: The XVIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2004, pp. 84-87. Conference paper (Other academic)
  • 23.
    Bresin, R.
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Emotional expression in music performance: synthesis and decoding (1998). In: TMH-QPSR, Vol. 39, no. 4, pp. 085-094. Journal article (Other academic)
  • 24.
    Bresin, R.
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Synthesis and decoding of emotionally expressive music performance (1999). In: Proceedings of the IEEE 1999 Systems, Man and Cybernetics Conference - SMC’99, 1999, Vol. 4, pp. 317-322. Conference paper (Refereed)
    Abstract [en]

    A recently developed application of Director Musices (DM) is presented. The DM is a rule-based software tool for automatic music performance developed at the Speech Music and Hearing Dept. at the Royal Institute of Technology, Stockholm. It is written in Common Lisp and is available both for Windows and Macintosh. It is demonstrated that particular combinations of rules defined in the DM can be used for synthesizing performances that differ in emotional quality. Different performances of two pieces of music were synthesized so as to elicit listeners’ associations to six different emotions (fear, anger, happiness, sadness, tenderness, and solemnity). Performance rules and their parameters were selected so as to match previous findings about emotional aspects of music performance. Variations of the performance variables IOI (Inter-Onset Interval), OOI (Offset-Onset Interval) and L (Sound Level) are presented for each rule-setup. In a forced-choice listening test 20 listeners were asked to classify the performances with respect to emotions. The results showed that the listeners, with very few exceptions, recognized the intended emotions correctly. This shows that a proper selection of rules and rule parameters in DM can indeed produce a wide variety of meaningful, emotional performances, even extending the scope of the original rule definition.

  • 25.
    Bresin, Roberto
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Artificial neural networks based models for automatic performance of musical scores (1998). In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, Vol. 27, no. 3, pp. 239-270. Journal article (Refereed)
    Abstract [en]

    This article briefly summarises the author's research on automatic performance, started at CSC (Centro di Sonologia Computazionale, University of Padua) and continued at TMH-KTH (Speech, Music Hearing Department at the Royal Institute of Technology, Stockholm). The focus is on the evolution of the architecture of an artificial neural networks (ANNs) framework, from the first simple model, able to learn the KTH performance rules, to the final one, that accurately simulates the style of a real pianist performer, including time and loudness deviations. The task was to analyse and synthesise the performance process of a professional pianist, playing on a Disklavier. An automatic analysis extracts all performance parameters of the pianist, starting from the KTH rule system. The system possesses good generalisation properties: applying the same ANN, it is possible to perform different scores in the performing style used for the training of the networks. Brief descriptions of the program Melodia and of the two Java applets Japer and Jalisper are given in the Appendix. In Melodia, developed at the CSC, the user can run either rules or ANNs, and study their different effects. Japer and Jalisper, developed at TMH, implement in real time on the web the performance rules developed at TMH plus new features achieved by using ANNs.

  • 26.
    Bresin, Roberto
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Virtual virtuosity (2000). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This dissertation presents research in the field of automatic music performance with a special focus on piano.

    A system is proposed for automatic music performance, based on artificial neural networks (ANNs). A complex, ecological-predictive ANN was designed that listens to the last played note, predicts the performance of the next note, looks three notes ahead in the score, and plays the current tone. This system was able to learn a professional pianist's performance style at the structural micro-level. In a listening test, performances by the ANN were judged clearly better than deadpan performances and slightly better than performances obtained with generative rules.

    The behavior of an ANN was compared with that of a symbolic rule system with respect to musical punctuation at the micro-level. The rule system mostly gave better results, but some segmentation principles of an expert musician were only generalized by the ANN.

    Measurements of professional pianists' performances revealed interesting properties in the articulation of notes marked staccato and legato in the score. Performances were recorded on a grand piano connected to a computer. Staccato was realized by a micropause of about 60% of the inter-onset-interval (IOI) while legato was realized by keeping two keys depressed simultaneously; the relative key overlap time was dependent on IOI: the larger the IOI, the shorter the relative overlap. The magnitudes of these effects changed with the pianists' coloring of their performances and with the pitch contour. These regularities were modeled in a set of rules for articulation in automatic piano music performance.

    Emotional coloring of performances was realized by means of macro-rules implemented in the Director Musices performance system. These macro-rules are groups of rules that were combined such that they reflected previous observations on musical expression of specific emotions. Six emotions were simulated. A listening test revealed that listeners were able to recognize the intended emotional colorings.

    In addition, some possible future applications are discussed in the fields of automatic music performance, music education, automatic music analysis, virtual reality and sound synthesis.

  • 27.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Battel, Giovanni Umberto
    Articulation strategies in expressive piano performance - Analysis of legato, staccato, and repeated notes in performances of the Andante movement of Mozart's Sonata in G major (K 545) (2000). In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, Vol. 29, no. 3, pp. 211-224. Journal article (Refereed)
    Abstract [en]

    Articulation strategies applied by pianists in expressive performances of the same score are analysed. Measurements of key overlap time and its relation to the inter-onset-interval are collected for notes marked legato and staccato in the first sixteen bars of the Andante movement of W.A. Mozart's Piano Sonata in G major, K 545. Five pianists played the piece nine times. First, they played in a way that they considered "optimal". In the remaining eight performances they were asked to represent different expressive characters, as specified in terms of different adjectives. Legato, staccato, and repeated notes articulation applied by the right hand were examined by means of statistical analysis. Although the results varied considerably between pianists, some trends could be observed. The pianists generally used similar strategies in the rendering intended to represent different expressive characters. Legato was played with a key overlap ratio that depended on the inter-onset-interval (IOI). Staccato tones had an approximate duration of 40% of the IOI. Repeated notes were played with a duration of about 60% of the IOI. The results seem useful as a basis for articulation rules in grammars for automatic piano performance.
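
    Purely as an illustration of the rounded figures reported above (staccato tones lasting roughly 40% of the IOI, repeated notes roughly 60%, legato keys overlapping into the next onset), a toy articulation rule might look like this; the legato overlap ratio is a made-up constant, since the paper reports it as IOI-dependent.

```python
# Toy articulation rule using the rounded percentages quoted in the abstract
# above. This is an illustration, not the paper's model; the legato overlap
# ratio is an assumed placeholder value.

def key_duration(ioi_ms: float, articulation: str) -> float:
    """Return the key-down time in ms for a note with the given inter-onset interval."""
    if articulation == "staccato":
        return 0.40 * ioi_ms
    if articulation == "repeated":
        return 0.60 * ioi_ms
    if articulation == "legato":
        overlap_ratio = 0.10        # assumed constant; the paper reports IOI-dependence
        return ioi_ms * (1.0 + overlap_ratio)
    return ioi_ms                   # default: nominal duration

for art in ("staccato", "repeated", "legato"):
    print(art, key_duration(500.0, art), "ms at IOI = 500 ms")
```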

  • 28.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    A multimedia environment for interactive music performance (1997). In: TMH-QPSR, Vol. 38, no. 2-3, pp. 029-032. Journal article (Other academic)
    Abstract [en]

    We propose a music performance tool based on the Java programming language. This software runs in any Java applet viewer (i.e. a WWW browser) and interacts with the local Midi equipment by means of a multi-task software module for Midi applications (MidiShare). Two main ideas are at the base of our project: one is to realise an easy, intuitive, hardware and software independent tool for performance, and the other is to achieve an easier development of the tool itself. At the moment there are two projects under development: a system based only on a Java applet, called Japer (Java performer), and a hybrid system based on a Java user interface and a Lisp kernel for the development of the performance tools. In this paper, the first of the two projects is presented.

  • 29.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel. KTH, Tidigare Institutioner (före 2005), Talöverföring och musikakustik.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    A multimedia environment for interactive music performance (1997). In: Proceedings of KANSEI - The Technology of Emotion, AIMI International Workshop, 1997, pp. 64-67. Conference paper (Refereed)
  • 30.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Emotional coloring of computer controlled music performance (2000). In: Computer music journal, ISSN 0148-9267, E-ISSN 1531-5169, Vol. 24, no. 4, pp. 44-63. Journal article (Refereed)
  • 31.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Emotional coloring of computer controlled music performance (2000). In: Computer music journal, ISSN 0148-9267, E-ISSN 1531-5169, Vol. 24, no. 4, pp. 44-61. Journal article (Refereed)
  • 32.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Expressive musical icons (2001). In: Proceedings of the International Conference on Auditory Display - ICAD 2001, 2001, pp. 141-143. Conference paper (Refereed)
    Abstract [en]

    Recent research on the analysis and synthesis of music performance has resulted in tools for the control of the expressive content in automatic music performance [1]. These results can be relevant for applications other than performance of music by a computer. In this work we present how the techniques for enhancing the expressive character in music performance can also be used in the design of sound logos, in the control of synthesis algorithms, and for achieving better ringing tones in mobile phones.

  • 33.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Rule-based emotional colouring of music performance (2000). In: Proceedings of the International Computer Music Conference - ICMC 2000 / [ed] Zannos, I., San Francisco: ICMA, 2000, pp. 364-367. Conference paper (Refereed)
  • 34.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Software tools for musical expression (2000). In: Proceedings of the International Computer Music Conference 2000 / [ed] Zannos, Ioannis, San Francisco, USA: Computer Music Association, 2000, pp. 499-502. Conference paper (Refereed)
  • 35.
    Bresin, Roberto
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Dahl, Sofia
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Toward a new model for sound control (2001). In: Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-01), Limerick, Ireland, December 6-8, 2001 / [ed] Fernström, M., Brazil, E., & Marshall, M., 2001, pp. 45-49. Conference paper (Refereed)
    Abstract [en]

    The control of sound synthesis is a well-known problem. This is particularly true if the sounds are generated with physical modeling techniques that typically need specification of numerous control parameters. In the present work outcomes from studies on automatic music performance are used for tackling this problem. 

  • 36.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Sundberg, Johan
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Director musices: The KTH performance rules system (2002). In: Proceedings of SIGMUS-46, Information Processing Society of Japan, 2002, pp. 43-48. Conference paper (Refereed)
    Abstract [en]

    Director Musices is a program that transforms notated scores into musical performances. It implements the performance rules emerging from research projects at the Royal Institute of Technology (KTH). Rules in the program model performance aspects such as phrasing, articulation, and intonation, and they operate on performance variables such as tone, inter-onset duration, amplitude, and pitch. By manipulating rule parameters, the user can act as a metaperformer controlling different features of the performance, leaving the technical execution to the computer. Different interpretations of the same piece can easily be obtained. Features of Director Musices include MIDI file input and output, rule palettes, graphical display of all performance variables (along with the notation), and user-defined performance rules. The program is implemented in Common Lisp and is available free as a stand-alone application both for Macintosh and Windows platforms. Further information, including music examples, publications, and the program itself, is located online at http://www.speech.kth.se/music/performance. This paper is a revised and updated version of a previous paper published in the Computer Music Journal in year 2000 that was mainly written by Anders Friberg (Friberg, Colombo, Frydén and Sundberg, 2000).
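
    As a minimal sketch of the general idea (a performance rule maps score notes to deviations in a performance variable, scaled by a user-controlled rule quantity k), one might write something like the following; this is not Director Musices code, and the rule name and factor are invented.

```python
# Hedged sketch of a performance rule acting on score notes. The rule and its
# factor are invented placeholders, not taken from the KTH rule system.
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int              # MIDI note number
    duration_ms: float
    sound_level_db: float = 0.0

def final_lengthening(notes, k=1.0, factor=0.15):
    """Lengthen the last note of a phrase; 'factor' is a placeholder value."""
    notes[-1].duration_ms *= 1.0 + k * factor
    return notes

phrase = [Note(60, 500), Note(62, 500), Note(64, 500), Note(65, 1000)]
print([n.duration_ms for n in final_lengthening(phrase, k=1.0)])
```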

  • 37.
    Bresin, Roberto
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Widmer, Gerhard
    Production of staccato articulation in Mozart sonatas played on a grand piano: Preliminary results (2000). In: Speech Music and Hearing Quarterly Progress and Status Report, ISSN 1104-5787, Vol. 41, no. 4, pp. 001-006. Journal article (Refereed)
  • 38.
    Burger, Birgitta
    et al.
    University of Cologne, Dept. of Systematic Musicology, Germany.
    Bresin, Roberto
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Displaying expression in musical performance by means of a mobile robot (2007). In: Affective Computing And Intelligent Interaction, Proceedings, 2007, Vol. 4738, pp. 753-754. Conference paper (Refereed)
  • 39. Canazza, S.
    et al.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Rodà, A.
    Zanon, P.
    Expressive Director: a system for the real-time control of music performance synthesis (2003). In: Proc of SMAC 2003, Stockholm Music Acoustics Conference / [ed] R. Bresin, 2003, Vol. 2, pp. 521-524. Conference paper (Refereed)
    Abstract [en]

    The Expressive Director is a system allowing real-time control of music performance synthesis, in particular regarding expressive and emotional aspects. It allows a user to interact in real time, for example changing the emotional intent from happy to sad or from a romantic expressive style to a neutral one while the music is playing. The Expressive Director was designed in order to merge the expressiveness models developed at CSC and at KTH. The control of the synthesis can be obtained using a two-dimensional space (called “Control Space”) in which the mouse pointer can be moved by the user continuously from one expressive intention to another. Depending on the position, the system applies suitable expressive deviation profiles. The Control Space can be made so as to represent the Valence-Arousal space from music psychology research.
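
    One plausible way to realise such a control space, not necessarily the Expressive Director's own method, is to blend deviation profiles of anchor intentions by inverse-distance weighting of the pointer position; the anchor positions and profile values below are invented.

```python
# Hedged sketch: blend expressive deviation profiles according to a pointer
# position in a 2D control space. Anchor coordinates and profile values
# (tempo factor, sound-level offset in dB) are illustrative placeholders.
import numpy as np

anchors = {
    "happy":   (np.array([0.9, 0.95]), np.array([1.08, +2.0])),
    "sad":     (np.array([0.1, 0.20]), np.array([0.85, -3.0])),
    "angry":   (np.array([0.9, 0.15]), np.array([1.12, +4.0])),
    "neutral": (np.array([0.5, 0.50]), np.array([1.00,  0.0])),
}

def blend(pointer_xy, eps=1e-6):
    weights, profiles = [], []
    for pos, profile in anchors.values():
        weights.append(1.0 / (np.linalg.norm(pointer_xy - pos) + eps))
        profiles.append(profile)
    w = np.array(weights) / sum(weights)
    return (w[:, None] * np.array(profiles)).sum(axis=0)

print(blend(np.array([0.7, 0.6])))   # blended tempo factor and level offset
```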

  • 40.
    Carlson, Rolf
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Frydén, Lars
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Granström, Björn
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Sundberg, Johan
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Speech and music performance. Parallels and contrasts (1987). In: STL-QPSR, Vol. 28, no. 4, pp. 007-023. Journal article (Other academic)
  • 41.
    Carlson, Rolf
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Frydén, Lars
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Granström, Björn
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Sundberg, Johan
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Speech and music performance: parallels and contrasts (1989). In: Contemporary Music Review, ISSN 0749-4467, E-ISSN 1477-2256, Vol. 4, pp. 389-402. Journal article (Refereed)
    Abstract [en]

    Speech and music performance are two important systems for interhuman communication by means of acoustic signals. These signals must be adapted to the human perceptual and cognitive systems. Hence a comparative analysis of speech and music performances is likely to shed light on these systems, particularly regarding basic requirements for acoustic communication. Two computer programs are compared, one for text-to-speech conversion and one for note-to-tone conversion. Similarities are found in the need for placing emphasis on unexpected elements, for increasing the dissimilarities between different categories, and for flagging structural constituents. Similarities are also found in the code chosen for conveying this information, e.g. emphasis by lengthening and constituent marking by final lengthening.

  • 42. Cohen, Michael M.
    et al.
    Beskow, Jonas
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Massaro, Dominic W.
    RECENT DEVELOPMENTS IN FACIAL ANIMATION: AN INSIDE VIEW (1998). In: Proceedings of International Conference on Auditory-Visual Speech Processing / [ed] Burnham, D., Robert-Ribes, J. & Vatikiotis-Bateson, E., 1998, pp. 201-206. Conference paper (Other academic)
    Abstract [en]

    We report on our recent facial animation work to improve the realism and accuracy of visual speech synthesis. The general approach is to use both static and dynamic observations of natural speech to guide the facial modeling. One current goal is to model the internal articulators of a highly realistic palate, teeth, and an improved tongue. Because our talking head can be made transparent, we can provide an anatomically valid and pedagogically useful display that can be used in speech training of children with hearing loss [1]. High-resolution models of palate and teeth [2] were reduced to a relatively small number of polygons for real-time animation [3]. For the improved tongue, we are using 3D ultrasound data and electropalatography (EPG) [4] with error minimization algorithms to educate our parametric B-spline based tongue model to simulate realistic speech. In addition, a high-speed algorithm has been developed for detection and correction of collisions, to prevent the tongue from protruding through the palate and teeth, and to enable the real-time display of synthetic EPG patterns.

  • 43.
    Dahl, S.
    et al.
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    What can the body movements reveal about a musician’s emotional intention? (2003). In: Proc of SMAC 03, Stockholm Music Acoustics Conference, 2003, Vol. 2, pp. 599-602. Conference paper (Refereed)
    Abstract [en]

    Music has an intimate relationship with motion in several aspects. Obviously, movements are required to play an instrument, but musicians also move their bodies in a way not directly related to note production. In order to explore to what extent emotional intentions can be conveyed through musicians’ movements only, video recordings of a marimba player performing the same piece with the intentions Happy, Sad, Angry and Fearful were recorded. 20 observers watched the video clips, without sound, and rated both the perceived emotional content as well as movement cues. The videos were presented in four viewing conditions, showing different parts of the player. The observers’ ratings for the intended emotions showed that the intentions Happiness, Sadness and Anger were well communicated, while Fear was not. The identification of the intended emotion was only slightly influenced by the viewing condition, although in some cases the head was important. The movement ratings indicate that there are cues that the observers use to distinguish between intentions, similar to the cues found for audio signals in music performance. Anger was characterized by large, fast, uneven, and jerky movements; Happy by large and somewhat fast movements; Sadness by small, slow, even and smooth movements.

  • 44.
    Dahl, Sofia
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Playing the accent: comparing striking velocity and timing in an ostinato rhythm performed by four drummers (2004). In: Acta Acustica united with Acustica, ISSN 1610-1928, E-ISSN 1861-9959, Vol. 90, no. 4, pp. 762-776. Journal article (Refereed)
    Abstract [en]

    Four percussion players’ strategies for performing an accented stroke were studied by capturing movement trajectories. The players played on a force plate with markers on the drumstick, hand, and lower and upper arm. The rhythmic pattern – an ostinato with interleaved accents every fourth stroke – was performed at different dynamic levels, tempi and on different striking surfaces attached to the force plate. The analysis displayed differences between the movement trajectories for the four players, which were maintained consistently during all playing conditions. The characteristics of the players’ individual movement patterns were observed to correspond well with the striking velocities and timing in performance. The most influential parameter on the movement patterns was the dynamic level, with increasing preparatory heights and striking velocity for increasing dynamic level. The interval beginning with the accented stroke was prolonged, the amount of lengthening decreasing with increasing dynamic level.

  • 45.
    Dahl, Sofia
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    The playing of an accent: Preliminary observations from temporal and kinematic analysis of percussionists (2000). In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, Vol. 29, no. 3, pp. 225-233. Journal article (Refereed)
    Abstract [en]

    The movements and timing when playing an interleaved accent in drumming were studied for three professionals and one amateur. The movement analysis showed that the subjects prepared for the accented stroke by raising the drumstick up to a greater height. The movement strategies used, however, differed widely in appearance.

    The timing analysis showed two basic features, a slow change in tempo over a longer time span ("drift"), and a short term variation between adjacent intervals ("flutter"). Cyclic patterns, with every fourth interval prolonged, could be seen in the flutter. The lengthening of the interval, beginning with the accented stroke, seems to be a common way for the player to give the accent more emphasis. A listening test was performed to investigate if these cyclic patterns conveyed information to a listener about the grouping of the strokes. Listeners identified sequences where the magnitude of the inter-onset interval fluctuations was large during the cyclic patterns.
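
    As an illustration of the two timing components named above, a toy decomposition of an inter-onset-interval sequence into a slow drift (here a linear trend) and residual flutter might look like this; the data and the trend-based method are assumptions, not the paper's analysis.

```python
# Illustrative decomposition (assumed method) of inter-onset intervals into a
# slow "drift" component and a short-term "flutter" component, as those terms
# are used in the abstract above. The IOI data are synthetic.
import numpy as np

rng = np.random.default_rng(2)
n = 64
idx = np.arange(n)
ioi = 250 + 0.3 * idx + rng.normal(scale=4.0, size=n)   # ms, toy data with slow drift

# Drift: least-squares linear trend over the whole sequence.
slope, intercept = np.polyfit(idx, ioi, deg=1)
drift = intercept + slope * idx

# Flutter: residual variation between adjacent intervals after removing drift.
flutter = ioi - drift
print(f"drift: {slope:.2f} ms per stroke, flutter s.d.: {flutter.std():.2f} ms")
```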

  • 46. Dahl, Sofia
    et al.
    Friberg, Anders
    KTH, Tidigare Institutioner (före 2005), Tal, musik och hörsel.
    Expressiveness of a marimba player’s body movements (2004). In: TMH-QPSR, ISSN 1104-5787, Vol. 46, no. 1, pp. 075-086. Journal article (Other academic)
    Abstract [en]

    Musicians often make gestures and move their bodies expressing their musical intention. This visual information provides a separate channel of communication to the listener. In order to explore to what extent emotional intentions can be conveyed through musicians’ movements, video recordings were made of a marimba player performing the same piece with four different intentions, Happy, Sad, Angry and Fearful. Twenty subjects were asked to rate the silent video clips with respect to perceived emotional content and movement qualities. The video clips were presented in different viewing conditions, showing different parts of the player. The results showed that the intentions Happiness, Sadness and Anger were well communicated, while Fear was not. The identification of the intended emotion was only slightly influenced by viewing condition. The movement ratings indicated that there were cues that the observers used to distinguish between intentions, similar to cues found for audio signals in music performance.

  • 47.
    Edlund, Jens
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Skantze, Gabriel
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Carlson, Rolf
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Higgins: a spoken dialogue system for investigating error handling techniques (2004). In: Proceedings of the International Conference on Spoken Language Processing, ICSLP 04, 2004, pp. 229-231. Conference paper (Refereed)
    Abstract [en]

    In this paper, an overview of the Higgins project and the research within the project is presented. The project incorporates studies of error handling for spoken dialogue systems on several levels, from processing to dialogue level. A domain in which a range of different error types can be studied has been chosen: pedestrian navigation and guiding. Several data collections within Higgins have been analysed along with data from Higgins' predecessor, the AdApt system. The error handling research issues in the project are presented in light of these analyses.

  • 48.
    Engwall, Olov
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Are Static MRI Data Representative of Dynamic Speech?: Results from a Comparative Study Using MRI, EMA, and EPG (2000). In: Proceedings of the 6th ICSLP, 2000, pp. 17-20. Conference paper (Other academic)
  • 49.
    Engwall, Olov
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Combining MRI, EMA and EPG measurements in a three-dimensional tongue model (2003). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 41, no. 2-3, pp. 303-329. Journal article (Refereed)
    Abstract [en]

    A three-dimensional (3D) tongue model has been developed using MR images of a reference subject producing 44 artificially sustained Swedish articulations. Based on the difference in tongue shape between the articulations and a reference, the six linear parameters jaw height, tongue body, tongue dorsum, tongue tip, tongue advance and tongue width were determined using an ordered linear factor analysis controlled by articulatory measures. The first five factors explained 88% of the tongue data variance in the midsagittal plane and 78% in the 3D analysis. The six-parameter model is able to reconstruct the modelled articulations with an overall mean reconstruction error of 0.13 cm, and it specifically handles lateral differences and asymmetries in tongue shape. In order to correct articulations that were hyperarticulated due to the artificial sustaining in the magnetic resonance imaging (MRI) acquisition, the parameter values in the tongue model were readjusted based on a comparison of virtual and natural linguopalatal contact patterns, collected with electropalatography (EPG). Electromagnetic articulography (EMA) data was collected to control the kinematics of the tongue model for vowel-fricative sequences and an algorithm to handle surface contacts has been implemented, preventing the tongue from protruding through the palate and teeth.

  • 50.
    Engwall, Olov
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Concatenative Articulatory Synthesis. Manuscript (preprint) (Other academic)