Results 51-100 of 1064
  • 51.
    Alexanderson, Simon
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    O'Sullivan, Carol
    Neff, Michael
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Mimebot—Investigating the Expressibility of Non-Verbal Communication Across Agent Embodiments (2017). In: ACM Transactions on Applied Perception, ISSN 1544-3558, E-ISSN 1544-3965, Vol. 14, no. 4, article id 24. Article in journal (Refereed)
    Abstract [en]

    Unlike their human counterparts, artificial agents such as robots and game characters may be deployed with a large variety of face and body configurations. Some have articulated bodies but lack facial features, and others may be talking heads ending at the neck. Generally, they have many fewer degrees of freedom than humans through which they must express themselves, and there will inevitably be a filtering effect when mapping human motion onto the agent. In this article, we investigate filtering effects on three types of embodiments: (a) an agent with a body but no facial features, (b) an agent with a head only, and (c) an agent with a body and a face. We performed a full performance capture of a mime actor enacting short interactions varying the non-verbal expression along five dimensions (e.g., level of frustration and level of certainty) for each of the three embodiments. We performed a crowd-sourced evaluation experiment comparing the video of the actor to the video of an animated robot for the different embodiments and dimensions. Our findings suggest that the face is especially important to pinpoint emotional reactions but is also most volatile to filtering effects. The body motion, on the other hand, had more diverse interpretations but tended to preserve the interpretation after mapping and thus proved to be more resilient to filtering.

  • 52. Alku, Paavo
    et al.
    Airas, Matti
    Björkner, Eva
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    An amplitude quotient based method to analyze changes in the shape of the glottal pulse in the regulation of vocal intensity (2006). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 120, no. 2, pp. 1052-1062. Article in journal (Refereed)
    Abstract [en]

    This study presents an approach to visualizing intensity regulation in speech. The method expresses a voice sample in a two-dimensional space using amplitude-domain values extracted from the glottal flow estimated by inverse filtering. The two-dimensional presentation is obtained by expressing a time-domain measure of the glottal pulse, the amplitude quotient (AQ), as a function of the negative peak amplitude of the flow derivative (d(peak)). The regulation of vocal intensity was analyzed with the proposed method from voices varying from extremely soft to very loud, with an SPL range of approximately 55 dB. When vocal intensity was increased, the speech samples first showed a rapidly decreasing trend as expressed on the proposed AQ-d(peak) graph. When intensity was further raised, the location of the samples converged toward a horizontal line, the asymptote of a hypothetical hyperbola. This behavior of the AQ-d(peak) graph indicates that the intensity regulation strategy changes from laryngeal to respiratory mechanisms, and the method chosen makes it possible to quantify how the control mechanisms underlying the regulation of vocal intensity change gradually between the two means. The proposed presentation constitutes an easy-to-implement method to visualize the function of voice production in intensity regulation, because the only information needed is the glottal flow waveform estimated by inverse filtering the acoustic speech pressure signal.

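The AQ-d(peak) plane described in the abstract above is straightforward to reproduce once a glottal flow estimate is available. The following is a minimal sketch, not the authors' implementation; the function name and signal conventions are assumptions:

```python
import numpy as np

def aq_dpeak(glottal_flow: np.ndarray, fs: float):
    """Amplitude quotient (AQ) and negative peak of the flow derivative
    (d_peak) for one glottal cycle, the two coordinates of the AQ-d_peak
    plane described in the abstract.

    Assumes `glottal_flow` holds a single cycle of the glottal flow
    already estimated by inverse filtering; `fs` is the sample rate in Hz.
    """
    d_flow = np.gradient(glottal_flow) * fs          # flow derivative (1/s)
    f_ac = glottal_flow.max() - glottal_flow.min()   # AC flow amplitude
    d_peak = -d_flow.min()                           # |negative peak| of derivative
    aq = f_ac / d_peak                               # amplitude quotient (seconds)
    return aq, d_peak
```

Plotting aq against d_peak for tokens ranging from very soft to very loud phonation reproduces the two-dimensional presentation the study analyzes.
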
  • 53. Allwood, Jens
    et al.
    Cerrato, Loredana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Jokinen, Kristiina
    Navarretta, Costanza
    Paggio, Patrizia
    The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena (2007). In: Language Resources and Evaluation, ISSN 1574-020X, E-ISSN 1574-0218, Vol. 41, no. 3-4, pp. 273-287. Article in journal (Refereed)
    Abstract [en]

    This paper deals with a multimodal annotation scheme dedicated to the study of gestures in interpersonal communication, with particular regard to the role played by multimodal expressions for feedback, turn management and sequencing. The scheme has been developed under the framework of the MUMIN network and tested on the analysis of multimodal behaviour in short video clips in Swedish, Finnish and Danish. The preliminary results obtained in these studies show that the reliability of the categories defined in the scheme is acceptable, and that the scheme as a whole constitutes a versatile analysis tool for the study of multimodal communication behaviour.

  • 54. Altmann, U.
    et al.
    Oertel, Catharine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Campbell, N.
    Conversational Involvement and Synchronous Nonverbal Behaviour (2012). In: Cognitive Behavioural Systems: COST 2102 International Training School, Dresden, Germany, February 21-26, 2011, Revised Selected Papers / [ed] Anna Esposito, Antonietta M. Esposito, Alessandro Vinciarelli, Rüdiger Hoffmann, Vincent C. Müller, Springer Berlin/Heidelberg, 2012, pp. 343-352. Conference paper (Refereed)
    Abstract [en]

    Measuring the quality of an interaction by means of low-level cues has been the topic of many studies in the last couple of years. In this study we propose a novel method for conversation quality assessment. We first test whether manual ratings of conversational involvement and automatic estimation of synchronisation of facial activity are correlated. We hypothesise that the higher the synchrony, the higher the involvement. We compare two different synchronisation measures. The first measure is defined as the similarity of facial activity at a given point in time. The second is based on dependence analyses between the facial activity time series of two interlocutors. We found that the dependence measure correlates more strongly with conversational involvement than the similarity measure.

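The two kinds of synchrony measure contrasted in the abstract above can be illustrated with a small sketch. This is a generic reconstruction, not the paper's implementation; the window and lag sizes are arbitrary assumptions, and the inputs are two equally sampled facial-activity time series:

```python
import numpy as np

def similarity(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Instantaneous similarity of two facial-activity series:
    highest (closest to zero) when activity levels coincide in time."""
    return -np.abs(x - y)

def windowed_dependence(x: np.ndarray, y: np.ndarray, win=100, max_lag=25):
    """Dependence-style measure: maximum absolute lagged correlation
    between the two series inside non-overlapping windows."""
    scores = []
    for s in range(0, len(x) - win + 1, win):
        xs = x[s:s + win]
        ys = y[s:s + win]
        xs = (xs - xs.mean()) / (xs.std() + 1e-9)   # z-score per window
        ys = (ys - ys.mean()) / (ys.std() + 1e-9)
        corr = np.correlate(xs, ys, mode="full") / win
        mid = win - 1                               # index of zero lag
        scores.append(np.abs(corr[mid - max_lag:mid + max_lag + 1]).max())
    return np.array(scores)
```
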
  • 55. Ambrazaitis, G.
    et al.
    Svensson Lundmark, M.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Head beats and eyebrow movements as a function of phonological prominence levels and word accents in Stockholm Swedish news broadcasts (2015). In: The 3rd European Symposium on Multimodal Communication, Dublin, Ireland, 2015. Conference paper (Refereed)
  • 56. Ambrazaitis, G.
    et al.
    Svensson Lundmark, M.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Head Movements, Eyebrows, and Phonological Prosodic Prominence Levels in Stockholm (2015). In: 13th International Conference on Auditory-Visual Speech Processing (AVSP 2015), Vienna, Austria, 2015, p. 42. Conference paper (Refereed)
  • 57. Ambrazaitis, G.
    et al.
    Svensson Lundmark, M.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Multimodal levels of prominence: a preliminary analysis of head and eyebrow movements in Swedish news broadcasts (2015). In: Proceedings of Fonetik 2015 / [ed] Lundmark Svensson, M.; Ambrazaitis, G.; van de Weijer, J., Lund, 2015, pp. 11-16. Conference paper (Other academic)
  • 58.
    Ananthakrishnan, Gopal
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    From Acoustics to Articulation: Study of the acoustic-articulatory relationship along with methods to normalize and adapt to variations in production across different speakers (2011). Doctoral thesis, monograph (Other academic)
    Abstract [en]

    The focus of this thesis is the relationship between the articulation of speech and the acoustics of produced speech. There are several problems that are encountered in understanding this relationship, given the non-linearity, variance and non-uniqueness in the mapping, as well as the differences that exist in the size and shape of the articulators, and consequently the acoustics, for different speakers. The thesis covers mainly four topics pertaining to the articulation and acoustics of speech.

    The first part of the thesis deals with variations among different speakers in the articulation of phonemes. While the speakers differ physically in the shape of their articulators and vocal tracts, the study tries to extract articulation strategies that are common to different speakers. Using multi-way linear analysis methods, the study extracts articulatory parameters which can be used to estimate unknown articulations of phonemes made by one speaker, knowing other articulations made by the same speaker and those unknown articulations made by other speakers of the language. At the same time, a novel method to select the number of articulatory model parameters, as well as the articulations that are representative of a speaker's articulatory repertoire, is suggested.

    The second part is devoted to the study of uncertainty in the acoustic-to-articulatory mapping, specifically non-uniqueness in the mapping. Several studies in the past have shown that human beings are capable of producing a given phoneme using non-unique articulatory configurations, when the articulators are constrained. This was also demonstrated by synthesizing sounds using theoretical articulatory models. The studies in this part of the thesis investigate the existence of non-uniqueness in unconstrained read speech. This is carried out using a database of acoustic signals recorded synchronously along with the positions of electromagnetic coils placed on selected points on the lips, jaws, tongue and velum. This part, thus, largely devotes itself to describing techniques that can be used to study non-uniqueness in the statistical sense, using such a database. The results indicate that the acoustic vectors corresponding to some frames in all the phonemes in the database can be mapped onto non-unique articulatory distributions. The predictability of these non-unique frames is investigated, along with verifying whether applying continuity constraints can resolve this non-uniqueness.

    The third part proposes several novel methods of looking at acoustic-articulatory relationships in the context of acoustic-to-articulatory inversion. The proposed methods include explicit modeling of non-uniqueness using cross-modal Gaussian mixture modeling, as well as modeling the mapping as local regressions. Another innovative approach towards the mapping problem has also been described in the form of relating articulatory and acoustic gestures. Definitions and methods to obtain such gestures are presented along with an analysis of the gestures for different phoneme types. The relationship between the acoustic and articulatory gestures is also outlined. A method to conduct acoustic-to-articulatory inverse mapping is also suggested, along with a method to evaluate it. An application of acoustic-to-articulatory inversion to improve speech recognition is also described in this part of the thesis.

    The final part of the thesis deals with problems related to modeling infants acquiring the ability to speak, the model utilizing an articulatory synthesizer adapted to infant vocal tract sizes. The main problem addressed is related to modeling how infants acquire acoustic correlates that are normalized between infants and adults. A second problem of how infants decipher the number of degrees of articulatory freedom is also partially addressed. The main contribution is a realistic model which shows how an infant can learn the mapping between the acoustics produced during the babbling phase and the acoustics heard from the adults. The knowledge required to map corresponding adult-infant speech sounds is shown to be learnt without the total number of categories or one-one correspondences being specified explicitly. Instead, the model learns these features indirectly based on an overall approval rating, provided by a simulation of adult perception, on the basis of the imitation of adult utterances by the infant model.

    Thus, the thesis tries to cover different aspects of the relationship between articulation and acoustics of speech in the context of variations for different speakers and ages. Although not providing complete solutions, the thesis proposes novel directions for approaching the problem, with pointers to solutions in some contexts.

  • 59.
    Ananthakrishnan, Gopal
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Imitating Adult Speech: An Infant's Motivation (2011). In: 9th International Seminar on Speech Production, 2011, pp. 361-368. Conference paper (Refereed)
    Abstract [en]

    This paper tries to detail two aspects of speech acquisition by infants which are often assumed to be intrinsic or innate knowledge, namely the number of degrees of freedom in the articulatory parameters and the acoustic correlates that establish the correspondence between adult speech and the speech produced by the infant. The paper shows that being able to distinguish the different vowels in the vowel space of a given language is a strong motivation for choosing both a certain number of independent articulatory parameters and a certain scheme of acoustic normalization between adult and child speech.

  • 60.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Badin, P.
    GIPSA-Lab, Grenoble University.
    Vargas, J. A. V.
    GIPSA-Lab, Grenoble University.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Predicting Unseen Articulations from Multi-speaker Articulatory Models (2010). In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Japan, 2010, pp. 1588-1591. Conference paper (Refereed)
    Abstract [en]

    In order to study inter-speaker variability, this work aims to assess the generalization capabilities of data-based multi-speaker articulatory models. We use various three-mode factor analysis techniques to model the variations of midsagittal vocal tract contours obtained from MRI images for three French speakers articulating 73 vowels and consonants. Articulations of a given speaker for phonemes not present in the training set are then predicted by inversion of the models from measurements of these phonemes articulated by the other subjects. On the average, the prediction RMSE was 5.25 mm for tongue contours, and 3.3 mm for 2D midsagittal vocal tract distances. Besides, this study has established a methodology to determine the optimal number of factors for such models.

  • 61.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Eklund, Robert
    Voice Provider, Stockholm.
    Peters, Gustav
    Forschungsinstitut Alexander Koenig, Bonn, Germany.
    Mabiza, Evans
    Antelope Park, Gweru, Zimbabwe.
    An acoustic analysis of lion roars. II: Vocal tract characteristics (2011). In: Proceedings from Fonetik 2011: Speech, Music and Hearing Quarterly Progress and Status Report, TMH-QPSR, Volume 51, Stockholm: KTH Royal Institute of Technology, 2011, Vol. 51, no. 1, pp. 5-8. Conference paper (Other academic)
    Abstract [en]

    This paper makes the first attempt to perform an acoustic-to-articulatory inversion of a lion (Panthera leo) roar. The main problem that one encounters in attempting this is that little is known about the dimensions of the vocal tract, other than a general range of vocal tract lengths. Precious little is also known about the articulation strategies that are adopted by the lion while roaring. The approach used here is to iterate between possible values of vocal tract lengths and vocal tract configurations. Since there seem to be distinct articulatory changes during the course of a roar, we find a smooth path that minimizes the error function between a recorded roar and the simulated roar using a variable-length articulatory model.

  • 62.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Important regions in the articulator trajectory (2008). In: Proceedings of International Seminar on Speech Production / [ed] Rudolph Sock, Susanne Fuchs, Yves Laprie, Strasbourg, France: INRIA, 2008, pp. 305-308. Conference paper (Refereed)
    Abstract [en]

    This paper deals with identifying important regions in the articulatory trajectory based on the physical properties of the trajectory. A method to locate critical time instants as well as the key articulator positions is suggested. Acoustic-to-articulatory inversion using linear and non-linear regression is performed using only these critical points. The accuracy of inversion is found to be almost the same as using all the data points.

  • 63.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Mapping between acoustic and articulatory gestures (2011). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 53, no. 4, pp. 567-589. Article in journal (Refereed)
    Abstract [en]

    This paper proposes a definition for articulatory as well as acoustic gestures, along with a method to segment the measured articulatory trajectories and acoustic waveforms into gestures. Using a simultaneously recorded acoustic-articulatory database, the gestures are detected based on finding critical points in the utterance, both in the acoustic and articulatory representations. The acoustic gestures are parameterized using 2-D cepstral coefficients. The articulatory trajectories are essentially the horizontal and vertical movements of Electromagnetic Articulography (EMA) coils placed on the tongue, jaw and lips along the midsagittal plane. The articulatory movements are parameterized using a 2D-DCT, the same transformation that is applied to the acoustics. The relationship between the detected acoustic and articulatory gestures in terms of timing as well as shape is studied. In order to study this relationship further, acoustic-to-articulatory inversion is performed using GMM-based regression. The accuracy of predicting the articulatory trajectories from the acoustic waveforms is on par with state-of-the-art frame-based methods with dynamical constraints (with an average error of 1.45-1.55 mm for the two speakers in the database). In order to evaluate the acoustic-to-articulatory inversion in a more intuitive manner, a method based on the error in estimated critical points is suggested. Using this method, it was noted that the estimated articulatory trajectories from the acoustic-to-articulatory inversion methods were still not accurate enough to be within the perceptual tolerance of audio-visual asynchrony.

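The shared 2D-DCT parameterization described in the abstract above lends itself to a compact sketch. Below is one possible reading of that step, under the assumption that a gesture segment is a channels-by-frames array (EMA coil coordinates, or a cepstrogram slice); the truncation order is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.fft import dct

def dct2_params(segment: np.ndarray, keep=(4, 6)) -> np.ndarray:
    """Parameterize one gesture segment (channels x frames) with a
    truncated 2-D DCT, as a stand-in for the paper's 2D-DCT features.

    The same orthonormal transform can be applied to an articulatory
    segment (EMA trajectories) or an acoustic one (cepstrogram slice).
    """
    coeffs = dct(dct(segment, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:keep[0], :keep[1]].ravel()   # keep low-order coefficients
```

Because the same transform is applied to both modalities, coefficient vectors from the two domains can be compared or regressed against each other directly.
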
  • 64.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Resolving Non-uniqueness in the Acoustic-to-Articulatory Mapping (2011). In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Prague, Czech Republic, 2011, pp. 4628-4631. Conference paper (Refereed)
    Abstract [en]

    This paper studies the role of non-uniqueness in acoustic-to-articulatory inversion. It is generally believed that applying continuity constraints to the estimates of the articulatory parameters can resolve the problem of non-uniqueness. This paper tries to find out whether all instances of non-uniqueness can be resolved using continuity constraints. The investigation reveals that applying continuity constraints provides the best estimate in roughly 50 to 53% of the non-unique mappings. Roughly 8 to 13% of the non-unique mappings are best estimated by choosing discontinuous paths along the hypothetical high-probability estimates of articulatory trajectories.

  • 65.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Neiberg, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Exploring the Predictability of Non-Unique Acoustic-to-Articulatory Mappings (2012). In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 20, no. 10, pp. 2672-2682. Article in journal (Refereed)
    Abstract [en]

    This paper explores statistical tools that help analyze the predictability in the acoustic-to-articulatory inversion of speech, using an Electromagnetic Articulography database of simultaneously recorded acoustic and articulatory data. Since it has been shown that speech acoustics can be mapped to non-unique articulatory modes, the variance of the articulatory parameters is not sufficient to understand the predictability of the inverse mapping. We, therefore, estimate an upper bound to the conditional entropy of the articulatory distribution. This provides a probabilistic estimate of the range of articulatory values (either over a continuum or over discrete non-unique regions) for a given acoustic vector in the database. The analysis is performed for different British/Scottish English consonants with respect to which articulators (lips, jaws or the tongue) are important for producing the phoneme. The paper shows that acoustic-articulatory mappings for the important articulators have a low upper bound on the entropy, but can still have discrete non-unique configurations.

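The entropy bound used in the abstract above can be approximated with a joint Gaussian mixture model. The sketch below conditions a GMM fit on joint [acoustic, articulatory] vectors and bounds the conditional entropy by the entropy of a moment-matched Gaussian, which is maximal for a given covariance; this is an illustrative reconstruction, not the paper's exact estimator:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def conditional_entropy_bound(gmm: GaussianMixture, x: np.ndarray, da: int) -> float:
    """Upper bound on the entropy of p(articulation | acoustics = x).

    `gmm` is assumed to be fit (covariance_type="full") on joint vectors
    [acoustics, articulation], with the first `da` dimensions acoustic.
    """
    logp, mus, sigmas = [], [], []
    for m, S, w in zip(gmm.means_, gmm.covariances_, gmm.weights_):
        Saa, Sab = S[:da, :da], S[:da, da:]
        Sba, Sbb = S[da:, :da], S[da:, da:]
        K = Sba @ np.linalg.inv(Saa)
        logp.append(np.log(w) + multivariate_normal(m[:da], Saa).logpdf(x))
        mus.append(m[da:] + K @ (x - m[:da]))      # conditional component mean
        sigmas.append(Sbb - K @ Sab)               # conditional component covariance
    logp = np.asarray(logp)
    r = np.exp(logp - np.logaddexp.reduce(logp))   # component responsibilities
    mu = sum(rk * mk for rk, mk in zip(r, mus))
    cov = sum(rk * (Sk + np.outer(mk - mu, mk - mu))
              for rk, mk, Sk in zip(r, mus, sigmas))
    # Entropy of the moment-matched Gaussian (>= entropy of the mixture)
    return 0.5 * np.log(np.linalg.det(2 * np.pi * np.e * cov))
```
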
  • 66.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Neiberg, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Cross-modal Clustering in the Acoustic-Articulatory Space (2009). In: Proceedings Fonetik 2009: The XXIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 202-207. Conference paper (Other academic)
    Abstract [en]

    This paper explores cross-modal clustering in the acoustic-articulatory space. A method to improve clustering using information from more than one modality is presented. Formants and the Electromagnetic Articulography measurements are used to study corresponding clusters formed in the two modalities. A measure for estimating the uncertainty in correspondences between one cluster in the acoustic space and several clusters in the articulatory space is suggested.

  • 67.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Neiberg, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    In search of Non-uniqueness in the Acoustic-to-Articulatory Mapping (2009). In: INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association 2009, Baixas: ISCA, 2009, pp. 2799-2802. Conference paper (Refereed)
    Abstract [en]

    This paper explores the possibility and extent of non-uniqueness in the acoustic-to-articulatory inversion of speech, from a statistical point of view. It proposes a technique to estimate the non-uniqueness, based on finding peaks in the conditional probability function of the articulatory space. The paper corroborates the existence of non-uniqueness in a statistical sense, especially in stop consonants, nasals and fricatives. The relationship between the importance of the articulator position and non-uniqueness at each instance is also explored.

  • 68.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Salvi, Giampiero
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Using Imitation to learn Infant-Adult Acoustic Mappings (2011). In: 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5, ISCA, 2011, pp. 772-775. Conference paper (Refereed)
    Abstract [en]

    This paper discusses a model which conceptually demonstrates how infants could learn the normalization between infant-adult acoustics. The model proposes that the mapping can be inferred from the topological correspondences between the adult and infant acoustic spaces, which are clustered separately in an unsupervised manner. The model requires feedback from the adult in order to select the right topology for clustering, which is a crucial aspect of the model. The feedback is in terms of an overall rating of the imitation effort by the infant, rather than a frame-by-frame correspondence. Using synthetic, but continuous, speech data, we demonstrate that clusters which have a good topological correspondence are perceived to be similar by a phonetically trained listener.

  • 69.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Detecting confusable phoneme pairs for Swedish language learners depending on their first language (2011). In: TMH-QPSR, ISSN 1104-5787, Vol. 51, no. 1, pp. 89-92. Article in journal (Other academic)
    Abstract [en]

    This paper proposes a paradigm where commonly made segmental pronunciation errors are modeled as pair-wise confusions between two or more phonemes in the language that is being learnt. The method uses an ensemble of support vector machine classifiers with time varying Mel frequency cepstral features to distinguish between several pairs of phonemes. These classifiers are then applied to classify the phonemes uttered by second language learners. Using this method, an assessment is made regarding the typical pronunciation problems that students learning Swedish would encounter, depending on their first language.

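The pair-wise confusion paradigm described in the abstract above maps naturally onto one binary classifier per phoneme pair. A hypothetical sketch follows; the feature extraction and data layout are assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.svm import SVC

def train_pair_classifiers(features, labels, confusable_pairs):
    """One binary SVM per confusable phoneme pair, e.g. [("e:", "ä:")].
    Each row of `features` is assumed to be a fixed-length vector of
    time-varying MFCCs for one phoneme segment."""
    classifiers = {}
    for a, b in confusable_pairs:
        mask = np.isin(labels, [a, b])       # keep only this pair's segments
        clf = SVC(kernel="rbf")
        clf.fit(features[mask], labels[mask])
        classifiers[(a, b)] = clf
    return classifiers

def session_confusions(classifiers, segments):
    """Tally, over a learner's whole session, how often each pair's
    classifier disagrees with the intended phoneme."""
    counts = {pair: 0 for pair in classifiers}
    for (a, b), clf in classifiers.items():
        for intended, feats in segments:
            if intended in (a, b) and clf.predict([feats])[0] != intended:
                counts[(a, b)] += 1
    return counts
```
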
  • 70.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Abdou, Sherif
    Faculty of Computers & Information, Cairo University, Egypt.
    Using an Ensemble of Classifiers for Mispronunciation Feedback (2011). In: Proceedings of SLaTE / [ed] Strik, H.; Delmonte, R.; Russel, M., Venice, Italy, 2011. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a paradigm where commonly made segmental pronunciation errors are modeled as pair-wise confusions between two or more phonemes in the language that is being learnt. The method uses an ensemble of support vector machine classifiers with time-varying Mel frequency cepstral features to distinguish between several pairs of phonemes. These classifiers are then applied to classify the phonemes uttered by second language learners. Instead of providing feedback at every mispronounced phoneme, the method attempts to provide feedback about typical mispronunciations by a certain student over an entire session of several utterances. Two case studies that demonstrate how the paradigm is applied to provide suitable feedback to two students are also described in this paper.

  • 71.
    Arango-Alegría, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Cuadernillo sobre América Central (2005, 1st ed.). Book (Refereed)
  • 72.
    Arango-Alegría, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Cuadernillo sobre México (2005). Book (Refereed)
  • 73. Arlinger, S.
    et al.
    Uhlén, I.
    Hagerman, B.
    Kähäri, K.
    Rosenhall, U.
    Spens, Karl Erik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Holgers, K. -M
    Höga ljudnivåer på konserter kan ge hörselskador för livet: Musikbranschen tar inte sitt ansvar [High sound levels at concerts can cause lifelong hearing damage: the music industry is not taking its responsibility] (2007). In: Läkartidningen, ISSN 0023-7205, E-ISSN 1652-7518, Vol. 104, no. 41, pp. 2978-2979. Article in journal (Refereed)
  • 74. Arlinger, Stig
    et al.
    Nordqvist, Peter
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Öberg, Marie
    International Outcome Inventory for Hearing Aids: Data From a Large Swedish Quality Register Database (2017). In: American Journal of Audiology, ISSN 1059-0889, E-ISSN 1558-9137, Vol. 26, no. 3, pp. 443-450. Article in journal (Refereed)
    Abstract [en]

    Purpose: The purpose of this study was to analyze a database of completed International Outcome Inventory for Hearing Aids (IOI-HA) questionnaires obtained from over 100,000 clients fitted with new hearing aids in Sweden during the period of 2012-2016. Mean IOI-HA total scores were correlated with degree of hearing loss, unilateral versus bilateral fitting, first-time versus return clients, gender, and variation among dispensing clinics. The correlations with expectations, service quality, and technical functioning of the hearing aids were also analyzed. Method: Questionnaires containing the 7 IOI-HA items as well as questions concerning some additional issues were mailed to clients 3-6 months after fitting of new hearing aids. The questionnaires were returned to and analyzed by an independent research institute. Results: More than 100 dispensing clinics nationwide take part in this project. A response rate of 52.6% resulted in 106,631 data sets after excluding incomplete questionnaires. Forty-six percent of the responders were women, and 54% were men. The largest difference in mean score (0.66) was found for the IOI-HA item "use" between return clients and first-time users. Women reported significantly higher (better) scores for the item "impact on others" compared with men. The bilaterally fitted subgroup reported significantly higher scores for all 7 items compared with the unilaterally fitted subgroup. Experienced users produced higher scores on benefit and satisfaction items, whereas first-time users gave higher scores for residual problems. No correlation was found between mean IOI-HA total score and average hearing threshold level (pure-tone average [PTA]). Mean IOI-HA total scores were found to correlate significantly with perceived service quality of the dispensing center and with the technical functionality of the hearing aids. Conclusions: When comparing mean IOI-HA total scores from different studies or between groups, differences with regard to hearing aid experience, gender, and unilateral versus bilateral fitting have to be considered. No correlation was found between mean IOI-HA total score and degree of hearing loss in terms of PTA. Thus, PTA is not a reliable predictor of benefit and satisfaction of hearing aid provision as represented by the IOI-HA items. Identification of a specific lower fence in PTA for hearing aid candidacy is therefore to be avoided. Large differences were found in mean IOI-HA total scores related to different dispensing centers.

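Analyses of the kind reported above reduce to grouped means and correlations over the questionnaire table. A pandas sketch with hypothetical column names (the register's actual schema is not given in the abstract):

```python
import pandas as pd

# Hypothetical schema: one row per returned questionnaire; item1..item7
# are the seven IOI-HA items (scored 1-5), item1 being daily use.
df = pd.read_csv("ioi_ha_register.csv")
items = [f"item{i}" for i in range(1, 8)]
df["total"] = df[items].sum(axis=1)

# Subgroup comparisons of the kind reported in the study
print(df.groupby("bilateral_fitting")["total"].mean())
print(df.groupby("first_time_user")["item1"].mean())   # the "use" item

# Outcome vs degree of hearing loss (PTA): the study found no correlation
print(df["total"].corr(df["pta"]))
```
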
  • 75.
    Arnela, Marc
    et al.
    GTM–Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins 30, Barcelona, E-08022, Catalonia, Spain.
    Blandin, Rémi
    GIPSA-Lab, Unité Mixte de Recherche au Centre National de la Recherche Scientifique 5216, Grenoble Campus, St. Martin d'Heres, F-38402, France.
    Dabbaghchian, Saeed
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Guasch, Oriol
    GTM–Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins 30, Barcelona, E-08022, Catalonia, Spain.
    Alías, Francesc
    GTM–Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins 30, Barcelona, E-08022, Catalonia, Spain.
    Pelorson, Xavier
    GIPSA-Lab, Unité Mixte de Recherche au Centre National de la Recherche Scientifique 5216, Grenoble Campus, St. Martin d'Heres, F-38402, France.
    Van Hirtum, Annemie
    GIPSA-Lab, Unité Mixte de Recherche au Centre National de la Recherche Scientifique 5216, Grenoble Campus, St. Martin d'Heres, F-38402, France.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Influence of lips on the production of vowels based on finite element simulations and experiments (2016). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 139, no. 5, pp. 2852-2859. Article in journal (Refereed)
    Abstract [en]

    Three-dimensional (3-D) numerical approaches for voice production are currently being investigated and developed. Radiation losses produced when sound waves emanate from the mouth aperture are one of the key aspects to be modeled. When doing so, the lips are usually removed from the vocal tract geometry in order to impose a radiation impedance on a closed cross-section, which speeds up the numerical simulations compared to free-field radiation solutions. However, lips may play a significant role. In this work, the lips' effects on vowel sounds are investigated by using 3-D vocal tract geometries generated from magnetic resonance imaging. To this aim, two configurations for the vocal tract exit are considered: with lips and without lips. The acoustic behavior of each is analyzed and compared by means of time-domain finite element simulations that allow free-field wave propagation and experiments performed using 3-D-printed mechanical replicas. The results show that the lips should be included in order to correctly model vocal tract acoustics not only at high frequencies, as commonly accepted, but also in the low frequency range below 4 kHz, where plane wave propagation occurs.

  • 76. Arnela, Marc
    et al.
    Dabbaghchian, Saeed
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Blandin, Rémi
    Guasch, Oriol
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Hirtum, Annemie Van
    Pelorson, Xavier
    Influence of vocal tract geometry simplifications on the numerical simulation of vowel sounds (2016). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 140, no. 3, pp. 1707-1718. Article in journal (Refereed)
    Abstract [en]

    For many years, the vocal tract shape has been approximated by one-dimensional (1D) area functions to study the production of voice. More recently, 3D approaches allow one to deal with the complex 3D vocal tract, although area-based 3D geometries of circular cross-section are still in use. However, little is known about the influence of performing such a simplification, and some alternatives may exist between these two extreme options. To this aim, several vocal tract geometry simplifications for vowels [ɑ], [i], and [u] are investigated in this work. Six cases are considered, consisting of realistic, elliptical, and circular cross-sections interpolated through a bent or straight midline. For frequencies below 4–5 kHz, the influence of bending and cross-sectional shape has been found weak, while above these values simplified bent vocal tracts with realistic cross-sections are necessary to correctly emulate higher-order mode propagation. To perform this study, the finite element method (FEM) has been used. FEM results have also been compared to a 3D multimodal method and to a classical 1D frequency domain model.

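The "classical 1D frequency domain model" that the abstract above compares against treats the area function as a chain of cylindrical tube sections. A minimal sketch of that lossless baseline (illustrative only; production models add wall, viscous, and radiation losses):

```python
import numpy as np

RHO, C = 1.2, 350.0  # air density (kg/m^3) and speed of sound (m/s)

def transfer_function(areas, lengths, freqs):
    """Volume-velocity transfer function U_lips / U_glottis of a lossless
    chain of cylindrical tube sections, ordered glottis to lips."""
    H = []
    for f in freqs:
        k = 2 * np.pi * f / C                        # wavenumber
        M = np.eye(2, dtype=complex)
        for A, L in zip(areas, lengths):
            Z = RHO * C / A                          # characteristic impedance
            M = M @ np.array([[np.cos(k * L), 1j * Z * np.sin(k * L)],
                              [1j * np.sin(k * L) / Z, np.cos(k * L)]])
        H.append(1.0 / M[1, 1])                      # ideal open end (p = 0 at lips)
    return np.array(H)

# Peaks of np.abs(H) over a frequency sweep approximate the formants.
```

Sweeping frequencies and locating the peaks of |H| gives the formants of the area function, which is the 1D reference that the 3D FEM results in these studies are compared against.
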
  • 77. Arnela, Marc
    et al.
    Dabbaghchian, Saeed
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Blandin, Rémi
    Guasch, Oriol
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Pelorson, Xavier
    Van Hirtum, Annemie
    Effects of vocal tract geometry simplifications on the numerical simulation of vowels (2015). In: Pan European Voice Conference Abstract Book: Proceedings e report 104, Firenze University Press, 2015, p. 177. Conference paper (Other academic)
  • 78.
    Arnela, Marc
    et al.
    GTM Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Barcelona, Spain.
    Dabbaghchian, Saeed
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Guasch, Oriol
    GTM Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Barcelona, Spain.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    A semi-polar grid strategy for the three-dimensional finite element simulation of vowel-vowel sequences (2017). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, The International Speech Communication Association (ISCA), 2017, Vol. 2017, pp. 3477-3481. Conference paper (Refereed)
    Abstract [en]

    Three-dimensional computational acoustic models need very detailed 3D vocal tract geometries to generate high quality sounds. Static geometries can be obtained from Magnetic Resonance Imaging (MRI), but it is not currently possible to capture dynamic MRI-based geometries with sufficient spatial and time resolution. One possible solution consists in interpolating between static geometries, but this is a complex task. We instead propose herein to use a semi-polar grid to extract 2D cross-sections from the static 3D geometries, and then interpolate them to obtain the vocal tract dynamics. Other approaches such as the adaptive grid have also been explored. In this method, cross-sections are defined perpendicular to the vocal tract midline, as typically done in 1D to obtain the vocal tract area functions. However, intersections between adjacent cross-sections may occur during the interpolation process, especially when the vocal tract midline quickly changes its orientation. In contrast, the semi-polar grid prevents these intersections because the plane orientations are fixed over time. Finite element simulations of static vowels are first conducted, showing that 3D acoustic wave propagation is not significantly altered when the semi-polar grid is used instead of the adaptive grid. The vowel-vowel sequence [ɑi] is finally simulated to demonstrate the method.

  • 79. Arnela, Marc
    et al.
    Dabbaghchian, Saeed
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Guasch, Oriol
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    MRI-based vocal tract representations for the three-dimensional finite element synthesis of diphthongs. In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924. Article in journal (Refereed)
  • 80. Arnela, Marc
    et al.
    Guasch, Oriol
    Dabbaghchian, Saeed
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Finite element generation of vowel sounds using dynamic complex three-dimensional vocal tracts (2016). In: Proceedings of the 23rd International Congress on Sound and Vibration: From Ancient to Modern Acoustics, Int Inst Acoustics & Vibration, 2016. Conference paper (Refereed)
    Abstract [en]

    Three-dimensional (3D) numerical simulations of the vocal tract acoustics require very detailed vocal tract geometries in order to generate good quality vowel sounds. These geometries are typically obtained from Magnetic Resonance Imaging (MRI), from which a volumetric representation of the complex vocal tract shape is obtained. Static vowel sounds can then be generated using a finite element code, which simulates the propagation of acoustic waves through the vocal tract when a given train of glottal pulses is introduced at the glottal cross-section. A more challenging problem to solve is that of generating dynamic vowel sounds. On the one hand, the acoustic wave equation has to be solved in a computational domain with moving boundaries, which entails some numerical difficulties. On the other hand, the finite element meshes where acoustic wave propagation is computed have to move according to the dynamics of these very complex vocal tract shapes. In this work this problem is addressed. First, the acoustic wave equation in mixed form is expressed in an Arbitrary Lagrangian-Eulerian (ALE) framework to account for the vocal tract wall motion. This equation is numerically solved using a stabilized finite element approach. Second, the dynamic 3D vocal tract geometry is approximated by a finite set of cross-sections with complex shape. The time-evolution of these cross-sections is used to move the boundary nodes of the finite element meshes, while inner nodes are computed through diffusion. Some dynamic vowel sounds are presented as numerical examples.

  • 81. Aronsson, Carina
    et al.
    Bohman, Mikael
    Ternström, Sten
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Södersten, Maria
    Loud voice during environmental noise exposure in patients with vocal nodules (2007). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 32, no. 2, pp. 60-70. Article in journal (Refereed)
    Abstract [en]

    The aim was to investigate how female patients with vocal nodules use their voices when trying to make themselves heard over background noise. Ten patients with bilateral vocal fold nodules and 23 female controls were recorded reading a text in four conditions, one without noise and three with noise from cafes/pubs, played over loudspeakers at 69, 77 and 85 dBA. The noise was separated from the voice signal using a high-resolution channel estimation technique. Both patients and controls increased voice sound pressure level (SPL), fundamental frequency (F0), subglottal pressure (Ps) and their subjective ratings of strain significantly as a main effect of the increased background noise. The patients used significantly higher Ps in all four conditions. Despite this they did not differ significantly from the controls in voice SPL, F0 or perceived strain. It was concluded that speaking in background noise is a risk factor for vocal loading. Vocal loading tests in clinical settings are important and further development of assessment methods is needed.

  • 82. Artman, H.
    et al.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hultén, M.
    Karlgren, K.
    Ramberg, R.
    The Interactionary as a didactic format in design education (2015). In: Proc. of KTH Scholarship of Teaching and Learning 2015, Stockholm, Sweden, 2015. Conference paper (Refereed)
  • 83.
    Artman, Henrik
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Hulten, Magnus
    Linköpings universitet.
    Designed by Engineers: An analysis of interactionaries with engineering students (2015). In: Designs for Learning, ISSN 1654-7608, Vol. 7, no. 2, pp. 28-56, article id 10.2478/dfl-2014-0062. Article in journal (Refereed)
    Abstract [en]

    The aim of this study is to describe and analyze learning taking place in a collaborative design exercise involving engineering students. The students perform a time-constrained, open-ended, complex interaction design task, an “interactionary”. A multimodal learning perspective is used. We have performed detailed analyses of video recordings of the engineering students, including classifying aspects of interaction. Our results show that the engineering students carry out and articulate their design work using a technology-centred approach and focus more on the function of their designs than on aspects of interaction. The engineering students mainly make use of ephemeral communication strategies (gestures and speech) rather than sketching in physical materials. We conclude that the interactionary may be an educational format that can help engineering students learn the messiness of design work. We further identify several constraints to the engineering students’ design learning and propose useful interventions that a teacher could make during an interactionary. We especially emphasize interventions that help engineering students retain aspects of human-centered design throughout the design process. This study partially replicates a previous study which involved interaction design students.

  • 84.
    Artman, Henrik
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hultén, Magnus
    Linköpings universitet.
    Design Learning Opportunities in Engineering Education: A case study of students solving an interaction-design task (2014). In: Proc. 4th International Designs for Learning Conference, 2014. Conference paper (Refereed)
    Abstract [en]

    How do engineering students embrace interaction design? We presented two groups of chemical engineering students with an interaction design brief with the task of producing a concept prototype of an interactive artefact. Through interaction analysis of video material, we analyse how the students gesture and use concepts adhering to interaction. The students frequently use gestures to enhance idea generation. Sketches are used sparingly, and other design materials were hardly used at all.

  • 85.
    Askenfelt, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Between the frog and the tip: Bowing gestures and bow-string interaction in violin playing (invited) (2008). In: Program abstracts for Acoustics'08 Paris, Acoustical Society of America (ASA), 2008, p. 3656. Conference paper (Other academic)
    Abstract [en]

    The motion of the bow gives a natural visualization of a string performance. Watching the player's bowing may augment the communicative power of the music, but not all relevant bow control parameters are easy for a spectator to capture. The string player controls volume of sound and tone quality continuously by coordination of three basic bowing parameters (bow velocity, bow-bridge distance, and bow force), which set the main conditions for the bow-string interaction. At a more detailed level of description, the tilting of the bow, which among other things controls the effective width of the bow hair, enters into the model. On a longer time scale, pre-planned coordination schemes ('bowing gestures'), including the basic bowing parameters and the angles between the path of the bow and the strings, build the performance. Systems for recording bowing parameters will be reviewed, and results from old and current studies on bowing gestures presented. The player's choice and coordination of bowing parameters are constrained both in attacks and in 'steady state' according to bow-string interaction models. Recent verifications of these control spaces will be examined. Strategies for starting notes, and examples of how players do in practice, will be presented and compared with listeners' preferences.

  • 86.
    Askenfelt, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Double Bass (2010). In: The Science of String Instruments / [ed] Rossing, T., Springer-Verlag New York, 2010, pp. 259-277. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    The study of the acoustics of bowed instruments has for several reasons focused on the violin. A substantial amount of knowledge has been accumulated over the last century (see Hutchins 1975, 1976; Hutchins and Benade 1997). The violin is discussed in Chap. 13, while the cello is discussed in Chap. 14. The bow is discussed in Chap. 16.

  • 87.
    Askenfelt, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Guettler, K.
    Stage floor vibrations and bass sound in concert halls (2013). In: Proceedings of Meetings on Acoustics: Volume 19, Issue 1, June 2013, Acoustical Society of America (ASA), 2013, p. 015028. Conference paper (Refereed)
    Abstract [en]

    The double bass and cello sections in the orchestra transmit vibrations to the stage floor through the end pins. Whether or not these vibrations may contribute to the perceived sound in the hall has been investigated since the 1930s. In this study the conditions for an efficient transfer of instrument vibrations to the floor, as well as the radiation from the floor to the audience area, are investigated. The study includes measurements of the impedance matching between bass and stage floor, the vibration velocity transfer to the floor via the endpin, and radiation from point-driven bending waves in the stage floor well below the coincidence frequency. The impedance conditions and radiation properties of the stage floors of five concert halls were investigated. In the two most promising halls, full-scale experiments were run with an artificially excited double bass supported via the end pin on the stage floor and on a concrete support below, respectively. The contribution from the stage floor radiation to the sound level in the audience area was 5 dB or more between 30 and 60 Hz. This range covers the fundamental frequencies over one octave starting from the lowest note (B0) of a five-string bass.

  • 88.
    Askenfelt, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Hansen, Kjetil Falkenberg
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Granqvist, Svante
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Hellmer, Kahl
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Orlarey, Y.
    Fober, D.
    Perifanos, K.
    Tambouratzis, G.
    Makropoulo, E.
    Chryssafidou, E.
    Arnaikos, L.
    Rattasepp, K.
    Dima, G.
    VEMUS, Virtual European Music School or A young person's interactive guide to making music (2008). In: Proceedings of the 28th ISME World Conference, 2008, p. 218. Conference paper (Refereed)
  • 89.
    Avramova, Vanya
    et al.
    KTH.
    Yang, Fangkai
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Li, Chengjie
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Peters, Christopher
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    A virtual poster presenter using mixed reality (2017). In: 17th International Conference on Intelligent Virtual Agents, IVA 2017, Springer, 2017, Vol. 10498, pp. 25-28. Conference paper (Refereed)
    Abstract [en]

    In this demo, we will showcase a platform we are currently developing for experimenting with situated interaction using mixed reality. The user will wear a Microsoft HoloLens and be able to interact with a virtual character presenting a poster. We argue that a poster presentation scenario is a good test bed for studying phenomena such as multi-party interaction, speaker role, engagement and disengagement, information delivery, and user attention monitoring.

  • 90. Baptista La, Filipa Martins
    et al.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Pregnancy and the Singing Voice: Reports From a Case Study (2012). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 26, no. 4, pp. 431-439. Article in journal (Refereed)
    Abstract [en]

    Objectives. Significant changes in body tissues occur during pregnancy; however, literature concerning the effects of pregnancy on the voice is sparse, especially concerning the professional classically trained voice. Hypotheses. Hormonal variations and associated bodily changes during pregnancy affect phonatory conditions, such as vocal fold motility and glottal adduction. Design. Longitudinal case study with a semiprofessional classically trained singer. Methods. Audio, electrolaryngograph, oral pressure, and airflow signals were recorded once a week during the last 12 weeks of pregnancy, 48 hours after birth, and during the following 11 consecutive weeks. Vocal tasks included diminuendo sequences of the syllable /pae/ sung at various pitches, and performing a Lied. Phonation threshold pressures (PTPs) and collision threshold pressures (CTPs), normalized amplitude quotient (NAQ), alpha ratio, and the dominance of the voice source fundamental were determined. Concentrations of female sex steroid hormones were measured on three occasions. A listening test of timbral brightness and vocal fatigue was carried out. Results. Results demonstrated significantly elevated concentrations of estrogen and progesterone during pregnancy, which were considerably reduced after birth. During pregnancy, CTPs and PTPs were high; and NAQ, alpha ratio, and dominance of the voice source fundamental suggested elevated glottal adduction. In addition, a perceptible decrease of vocal brightness was noted. Conclusions. The elevated CTPs and PTPs during pregnancy suggest reduced vocal fold motility and increased glottal adduction. These changes are compatible with expected effects of elevated concentrations of estrogen and progesterone on tissue viscosity and water retention.
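
    For reference, the normalized amplitude quotient (NAQ) used above is the peak-to-peak amplitude of the glottal flow pulse divided by the product of the negative peak of the flow derivative and the fundamental period. A minimal sketch, assuming one cycle of inverse-filtered glottal flow is already available as an array; the half-sine pulse below is synthetic and only for illustration:

        import numpy as np

        def naq(flow, fs, f0):
            """Normalized amplitude quotient for one glottal cycle.

            NAQ = f_ac / (d_peak * T0): f_ac is the peak-to-peak flow
            amplitude, d_peak the magnitude of the most negative sample
            of the flow derivative, and T0 = 1/f0 the period.
            """
            d = np.diff(flow) * fs          # flow derivative
            f_ac = flow.max() - flow.min()  # AC flow amplitude
            d_peak = -d.min()               # negative peak of the derivative
            return f_ac * f0 / d_peak       # == f_ac / (d_peak * T0)

        # Synthetic half-sine pulse at 220 Hz, sampled at 16 kHz.
        fs, f0 = 16000, 220.0
        t = np.arange(int(fs / f0)) / fs
        flow = np.maximum(0.0, np.sin(2 * np.pi * f0 * t))
        print(f"NAQ = {naq(flow, fs, f0):.3f}")  # ~0.16 for this pulse shape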

  • 91. Batliner, A.
    et al.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    D’Arcy, S.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Giuliani, D.
    Gerosa, M.
    Hacker, C.
    Russell, M.
    Steidl, S.
    Wong, M.
    The PF_STAR Children’s Speech Corpus (2005). In: 9th European Conference on Speech Communication and Technology, 2005, pp. 3761-3764. Conference paper (Refereed)
    Abstract [en]

    This paper describes the corpus of recordings of children's speech which was collected as part of the EU FP5 PF_STAR project. The corpus contains more than 60 hours of speech, including read and imitated native-language speech in British English, German and Swedish, read and imitated non-native-language English speech from German, Italian and Swedish children, and native-language spontaneous and emotional speech in English and German.

  • 92.
    Bell, Linda
    et al.
    TeliaSonera R&D, Sweden.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Children’s convergence in referring expressions to graphical objects in a speech-enabled computer game (2007). In: 8th Annual Conference of the International Speech Communication Association, Antwerp, Belgium, 2007, pp. 2788-2791. Conference paper (Refereed)
    Abstract [en]

    This paper describes an empirical study of children's spontaneous interactions with an animated character in a speech-enabled computer game. More specifically, it deals with convergence of referring expressions. 49 children were invited to play the game, which was initiated by a collaborative "put-that-there" task. In order to solve this task, the children had to refer to both physical objects and icons in a 3D environment. For physical objects, which were mostly referred to using straightforward noun phrases, lexical convergence took place in 90% of all cases. In the case of the icons, the children were more innovative and spontaneously referred to them in many different ways. Even after being prompted by the system, lexical convergence took place for only 50% of the icons. In the cases where convergence did take place, the effect of the system's prompts was quite local, and the children quickly reverted to their original way of referring when naming new icons in later tasks.
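
    The convergence rates reported here can be thought of as the fraction of a child's post-prompt referring expressions that adopt the system's term. A minimal sketch of that idea (illustrative data, not from the study, and not the authors' actual annotation pipeline):

        def convergence_rate(system_term, references_after_prompt):
            """Fraction of referring expressions matching the system's term."""
            matches = sum(1 for ref in references_after_prompt if ref == system_term)
            return matches / len(references_after_prompt)

        # A child referring to the same icon three times after the system says "button":
        print(convergence_rate("button", ["button", "the round one", "button"]))  # ≈0.67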

  • 93. Bellec, G.
    et al.
    Elowsson, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Wolff, D.
    Weyde, T.
    A social network integrated game experiment to relate tapping to speed perception and explore rhythm reproduction (2013). In: Proceedings of the Sound and Music Computing Conference 2013, 2013, pp. 19-26. Conference paper (Refereed)
    Abstract [en]

    During recent years, games with a purpose (GWAPs) have become increasingly popular for studying human behaviour [1–4]. However, no standardised method for web-based game experiments has been proposed so far. We present here our approach, comprising an extended version of the CaSimIR social game framework [5] for data collection, mini-games for tempo and rhythm tapping, and an initial analysis of the data collected so far. The game presented here is part of the Spot The Odd Song Out game, which is freely available for use on Facebook and on the Web. We present the GWAP method in some detail and a preliminary analysis of the data collected. We relate the tapping data to perceptual ratings obtained in previous work. The results suggest that the tapped tempo data collected in a GWAP can be used to predict perceived speed. When averaging the rhythmic performances of a group of 10 players in the second experiment, the tapping frequency shows a pattern that corresponds to the time signature of the music played. Our experience shows that more effort in design and during runtime is required than in a traditional experiment. Our experiment is still running and available online.
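
    The core step from raw tap timestamps to a tempo estimate that can be related to perceived speed is simple: convert the median inter-tap interval to beats per minute. A minimal sketch, assuming tap times in seconds (an illustration of the general idea, not the CaSimIR implementation):

        import statistics

        def tapped_tempo_bpm(tap_times):
            """Estimate tempo (BPM) from a list of tap timestamps in seconds.

            The median inter-tap interval is robust to the occasional missed
            or doubled tap that is common in web-collected data.
            """
            intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
            return 60.0 / statistics.median(intervals)

        # Taps at roughly 120 BPM with some timing jitter.
        taps = [0.00, 0.49, 1.01, 1.50, 2.02, 2.51, 3.00]
        print(f"estimated tempo ~ {tapped_tempo_bpm(taps):.1f} BPM")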

  • 94. Bertenstam, J
    et al.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K.O.E
    Granström, B
    Gustafson, J
    Hunnicutt, S
    Högberg, J
    Lindell, R
    Neovius, L
    Nord, L
    De Serpa-Leitao, A
    Ström, N
    THE WAXHOLM APPLICATION DATABASE (1995). Conference paper (Refereed)
    Abstract [en]

    This paper describes an application database collected in Wizard-of-Oz experiments with a spoken dialogue system, WAXHOLM. The system provides information on boat traffic in the Stockholm archipelago. The database consists of utterance-length speech files, their corresponding transcriptions, and log files of the dialogue sessions. In addition to the spontaneous dialogue speech, the material also comprises recordings of phonetically balanced reference sentences uttered by all 66 subjects. The paper describes the recording procedure as well as some characteristics of the speech data and the dialogue.

  • 95.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Perception of Gaze Direction in 2D and 3D Facial Projections (2010). In: The ACM / SSPNET 2nd International Symposium on Facial Analysis and Animation, New York, USA: ACM Press, 2010, p. 24. Conference paper (Refereed)
    Abstract [en]

    In human-human communication, eye gaze is a fundamental cue in e.g. turn-taking and interaction control [Kendon 1967]. Accurate control of gaze direction is therefore crucial in many applications of animated avatars striving to simulate human interactional behaviors. One inherent complication when conveying gaze direction through a 2D display, however, is what has been referred to as the Mona Lisa effect: if the avatar is gazing towards the camera, the eyes seem to "follow" the beholder whatever vantage point he or she may assume [Boyarskaya and Hecht 2010]. This becomes especially problematic in applications where multiple persons are interacting with the avatar and the system needs to use gaze to address a specific person. Introducing 3D structure in the facial display, e.g. projecting the avatar face on a face mask, makes the percept of the avatar's gaze change with the viewing angle, as is indeed the case with real faces. To this end, [Delaunay et al. 2010] evaluated two back-projected displays: a spherical "dome" and a face-shaped mask. However, there may be many factors influencing the gaze direction perceived from a 3D facial display, so an accurate calibration procedure for gaze direction is called for.
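
    The Mona Lisa effect described above can be captured in a toy model: on a flat display the rendered gaze co-rotates with the observer, while on a physical 3D face it stays fixed in world coordinates. A deliberately simplified sketch of this geometric intuition (not the calibration procedure the authors call for):

        def apparent_gaze_offset(flat_display, gaze_world_deg, observer_deg):
            """Angle (deg) between the avatar's apparent gaze and the observer.

            Flat 2D display: the percept is independent of vantage point, so an
            avatar gazing into the camera appears to look at every observer.
            3D face (e.g. a projected mask): the gaze is fixed in world
            coordinates, so the offset changes with the observer's position.
            """
            if flat_display:
                return gaze_world_deg
            return gaze_world_deg - observer_deg

        # Avatar gazing into the camera (0 deg), observers at -30, 0 and +30 deg:
        for obs in (-30, 0, 30):
            print(obs, apparent_gaze_offset(True, 0, obs), apparent_gaze_offset(False, 0, obs))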

  • 96.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Alexanderson, Simon
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Kinetic Data for Large-Scale Analysis and Modeling of Face-to-Face Conversation (2011). In: Proceedings of International Conference on Audio-Visual Speech Processing 2011 / [ed] Salvi, G.; Beskow, J.; Engwall, O.; Al Moubayed, S., Stockholm: KTH Royal Institute of Technology, 2011, pp. 103-106. Conference paper (Refereed)
    Abstract [en]

    Spoken face-to-face interaction is a rich and complex form of communication that includes a wide array of phenomena that are not fully explored or understood. While there have been extensive studies of many aspects of face-to-face interaction, these are traditionally of a qualitative nature, relying on hand-annotated corpora that are typically rather limited in extent, which is a natural consequence of the labour-intensive task of multimodal data annotation. In this paper we present a corpus of 60 hours of unrestricted Swedish face-to-face conversations recorded with audio, video and optical motion capture, and we describe a new project setting out to exploit primarily the kinetic data in this corpus in order to gain quantitative knowledge on human face-to-face interaction.

  • 97.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Alexanderson, Simon
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Stefanov, Kalin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Claesson, Britt
    Derbring, Sandra
    Fredriksson, Morgan
    The Tivoli System - A Sign-driven Game for Children with Communicative Disorders (2013). Conference paper (Refereed)
  • 98.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Alexanderson, Simon
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Stefanov, Kalin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Claesson, Britt
    Derbring, Sandra
    Fredriksson, Morgan
    Starck, J.
    Axelsson, E.
    Tivoli - Learning Signs Through Games and Interaction for Children with Communicative Disorders (2014). Conference paper (Refereed)
  • 99.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Bruce, Gösta
    Lunds universitet.
    Enflo, Laura
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Schötz, Susanne
    Lunds universitet.
    Human Recognition of Swedish Dialects (2008). In: Proceedings of Fonetik 2008: The XXIst Swedish Phonetics Conference / [ed] Anders Eriksson, Jonas Lindh, Göteborg: Göteborgs universitet, 2008, pp. 61-64. Conference paper (Other academic)
    Abstract [en]

    Our recent work within the research project SIMULEKT (Simulating Intonational Varieties of Swedish) involves a pilot perception test, used for detecting tendencies in human clustering of Swedish dialects. 30 Swedish listeners were asked to identify the geographical origin of 72 Swedish native speakers by clicking on a map of Sweden. Results indicate for example that listeners from the south of Sweden are generally better at recognizing some major Swedish dialects than listeners from the central part of Sweden.

  • 100.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Bruce, Gösta
    Lunds universitet.
    Enflo, Laura
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Schötz, Susanne
    Lunds universitet.
    Recognizing and Modelling Regional Varieties of Swedish (2008). In: INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association, 2008, pp. 512-515. Conference paper (Refereed)
    Abstract [en]

    Our recent work within the research project SIMULEKT (Simulating Intonational Varieties of Swedish) includes two approaches. The first involves a pilot perception test, used for detecting tendencies in human clustering of Swedish dialects. 30 Swedish listeners were asked to identify the geographical origin of Swedish native speakers by clicking on a map of Sweden. Results indicate for example that listeners from the south of Sweden are better at recognizing some major Swedish dialects than listeners from the central part of Sweden, which includes the capital area. The second approach concerns a method for modelling intonation using the newly developed SWING (Swedish INtonation Generator) tool, in which annotated speech samples are resynthesized with rule-based intonation and analysed audiovisually with regard to the major intonational varieties of Swedish. We consider both approaches important in our aim to test and further develop the Swedish prosody model.
