Results 201-250 of 692
RefereraExporteraLänk till träfflistan
Permanent länk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Träffar per sida
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sortering
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
Markera
Maxantalet träffar du kan exportera från sökgränssnittet är 250. Vid större uttag använd dig av utsökningar.
  • 201.
    Elenius, Daniel
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Blomberg, Mats
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Dynamic vocal tract length normalization in speech recognition, 2010. In: Proceedings from Fonetik 2010: Working Papers 54, Centre for Languages and Literature, Lund University, Sweden, 2010, pp. 29-34. Conference paper (Other academic)
    Abstract [en]

    A novel method to account for dynamic speaker characteristic properties in a speech recognition system is presented. The estimated trajectory of a property can be constrained to be constant or to have a limited rate-of-change within a phone or a sub-phone state. The constraints are implemented by extending each state in the trained Hidden Markov Model by a number of property-value-specific sub-states transformed from the original model. The connections in the transition matrix of the extended model define possible slopes of the trajectory. Constraints on its dynamic range during an utterance are implemented by decomposing the trajectory into a static and a dynamic component. Results are presented on vocal tract length normalization in connected-digit recognition of children's speech using models trained on male adult speech. The word error rate was reduced by 10% relative compared with the conventional utterance-specific warping factor.

  • 202.
    Elenius, Daniel
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Blomberg, Mats
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    On Extending VTLN to Phoneme-specific Warping in Automatic Speech Recognition, 2009. In: Proceedings of Fonetik 2009, 2009. Conference paper (Other academic)
    Abstract [en]

    Phoneme- and formant-specific warping has been shown to decrease formant and cepstral mismatch. These findings have not yet been fully implemented in speech recognition. This paper discusses a few reasons why this may be. A small experimental study is also included, in which phoneme-independent warping is extended towards phoneme-specific warping. The results of this investigation did not show a significant decrease in error rate during recognition. This is also in line with earlier experiments on the methods discussed in the paper.

  • 203.
    Elenius, Daniel
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Blomberg, Mats
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Units for Dynamic Vocal Tract Length Normalization. Manuscript (preprint) (Other academic)
    Abstract [en]

    A novel method to account for dynamic speaker characteristic properties in a speech recognition system is presented. The estimated trajectory of a property can be constrained to be constant or to have a limited rate-of-change within a phone or a sub-phone state, or be allowed to change between individual speech frames. The constraints are implemented by extending each state in the HMM by a number of property-specific sub-states transformed from the original model. The connections in the transition matrix of the extended model define possible slopes of the trajectory. Constraints on its dynamic range during an utterance are implemented by decomposing the trajectory into a static and a dynamic component. Results are presented on vocal tract length normalization in connected-digit recognition of children's speech using models trained on male adult speech. The word error rate was reduced compared with the conventional utterance-specific warping factor by 10% relative.

  • 204.
    Elenius, Daniel
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Blomberg, Mats
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Vocal tract length compensation in the signal and model domains in child speech recognition, 2007. In: Proceedings of Fonetik: TMH-QPSR, 2007, pp. 41-44. Conference paper (Other academic)
    Abstract [en]

    In a newly started project, KOBRA, we study methods to reduce the required amount of training data for speech recognition by combining the conventional data-driven training approach with available partial knowledge on speech production, implemented as transformation functions in the acoustic, articulatory and speaker characteristic domains. Initially, we investigate one well-known dependence, the inverse proportional relation between vocal tract length and formant frequencies. In this report, we have replaced the conventional technique of frequency warping the unknown input utterance (VTLN) by transforming the training data instead. This enables phoneme-dependent warping to be performed. In another experiment, we expanded the available training data by duplicating each training utterance into a number of differently warped instances. Training on this expanded corpus results in models, each one representing the whole range of vocal tract length variation. This technique allows every frame of the utterance to be warped differently. The computational load is reduced by an order of magnitude compared to conventional VTLN without noticeable decrease in performance on the task of recognising children's speech using models trained on adult speech.

  • 205.
    Elenius, Kjell
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Forsbom, Eva
    Uppsala University.
    Megyesi, Beata
    Uppsala University.
    Language Resources and Tools for Swedish: A Survey, 2008. In: Proc of LREC 2008, Marrakech, Morocco, 2008. Conference paper (Other academic)
    Abstract [en]

    Language resources and tools to create and process these resources are necessary components in human language technology and natural language applications. In this paper, we describe a survey of existing language resources for Swedish, and the need for Swedish language resources to be used in research and real-world applications in language technology as well as in linguistic research. The survey is based on a questionnaire sent to industry and academia, institutions and organizations, and to experts involved in the development of Swedish language resources in Sweden, the Nordic countries and world-wide.

  • 206.
    Elowsson, Anders
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Friberg, Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Modelling Perception of Speed in Music Audio, 2013. In: Proceedings of the Sound and Music Computing Conference 2013, 2013, pp. 735-741. Conference paper (Refereed)
    Abstract [en]

    One of the major parameters in music is the overall speed of a musical performance. Speed is often associated with tempo, but other factors such as note density (onsets per second) seem to be important as well. In this study, a computational model of speed in music audio has been developed using a custom set of rhythmic features. The original audio is first separated into a harmonic part and a percussive part and onsets are extracted separately from the different layers. The characteristics of each onset are determined based on frequency content as well as perceptual salience using a clustering approach. Using these separated onsets a set of eight features including a tempo estimation are defined which are specifically designed for modelling perceived speed. In a previous study 20 listeners rated the speed of 100 ringtones consisting mainly of popular songs, which had been converted from MIDI to audio. The ratings were used in linear regression and PLS regression in order to evaluate the validity of the model as well as to find appropriate features. The computed audio features were able to explain about 90 % of the variability in listener ratings.

  • 207.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics. Linköping Univ, Sweden.
    Herbst, C.T.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    McAllister, A.
    Comparing Vocal Fold Contact Criteria Derived From Audio and Electroglottographic Signals, 2016. In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 30, no. 4, pp. 381-388. Article in journal (Refereed)
    Abstract [en]

    Objectives. Collision threshold pressure (CTP), that is, the lowest subglottal pressure facilitating vocal fold contact during phonation, is likely to reflect relevant vocal fold properties. The amplitude of an electroglottographic (EGG) signal or the amplitude of its first derivative (dEGG) has been used as a criterion of such contact. Manual measurement of CTP is time-consuming, making the development of a simpler, alternative method desirable. Method. In this investigation, we compare CTP values measured manually to values automatically derived from dEGG and to values derived from a set of alternative parameters, some obtained from audio and some from EGG signals. One of the parameters was the novel EGG wavegram, which visualizes sequences of EGG or dEGG cycles, normalized with respect to period and amplitude. Raters with and without previous acquaintance with EGG analysis marked the disappearance of vocal fold contact in dEGG and in wavegram displays of /pa:/ sequences produced with continuously decreasing vocal loudness by seven singer subjects. Results. Vocal fold contact was mostly identified accurately in displays of both dEGG amplitude and wavegram. Automatically derived CTP values showed high correlation with those measured manually and with those derived from the ratings of the visual displays. Seven other parameters were tested as criteria of such contact. Mainly because of noise in the EGG signal, most of them yielded CTP values differing considerably from those derived from the manual and the automatic methods, although the EGG spectrum slope showed a high correlation. Conclusion. The possibility of measuring CTP automatically seems promising for future investigations.

  • 208.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Threshold Pressure For Vocal Fold Collision, 2007. In: Proceedings of Pan European Voice Conference 7 (PEVOC 7), Groningen, The Netherlands, 2007, p. 69. Conference paper (Refereed)
  • 209.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Pabst, Friedemann
    Hospital Dresden Friedrichstadt, Dresden, Germany.
    Effects of vocal loading on the phonation and collision threshold pressures, 2009. In: Proceedings of Fonetik 2009: The XXIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 24-27. Conference paper (Other academic)
    Abstract [en]

    Phonation threshold pressures (PTP) have been commonly used for obtaining a quantitative measure of vocal fold motility. However, as these measures are quite low, it is typically difficult to obtain reliable data. As the amplitude of an electroglottograph (EGG) signal decreases substantially at the loss of vocal fold contact, it is mostly easy to determine the collision threshold pressure (CTP) from an EGG signal. In an earlier investigation (Enflo & Sundberg, forthcoming) we measured CTP and compared it with PTP in singer subjects. Results showed that in these subjects CTP was on average about 4 cm H2O higher than PTP. The PTP has been found to increase during vocal fatigue. In the present study we compare PTP and CTP before and after vocal loading in singer and non-singer voices, applying a loading procedure previously used by co-author FP. Seven subjects repeated the vowel sequence /a,e,i,o,u/ at an SPL of at least 80 dB @ 0.3 m for 20 min. Before and after the loading, the subjects' voices were recorded while they produced a diminuendo repeating the syllable /pa/. Oral pressure during the /p/ occlusion was used as a measure of subglottal pressure. Both CTP and PTP increased significantly after the vocal loading.

  • 210.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Romedahl, Camilla
    McAllister, Anita
    Effects on Vocal Fold Collision and Phonation Threshold Pressure of Resonance Tube Phonation With Tube End in Water, 2013. In: Journal of Speech, Language and Hearing Research, ISSN 1092-4388, E-ISSN 1558-9102, Vol. 56, no. 5, pp. 1530-1538. Article in journal (Refereed)
    Abstract [en]

    Purpose: Resonance tube phonation in water (RTPW) or in air is a voice therapy method successfully used for treatment of several voice pathologies. Its effect on the voice has not been thoroughly studied. This investigation analyzes the effects of RTPW on collision and phonation threshold pressures (CTP and PTP), the lowest subglottal pressure needed for vocal fold collision and phonation, respectively. Method: Twelve mezzo-sopranos phonated into a glass tube, the end of which was placed under the water surface in a jar. Subglottal pressure, electroglottography, and audio signals were recorded before and after exercise. Also, the perceptual effects were assessed in a listening test with an expert panel, who also rated the subjects' singing experience. Results: Resonance tube phonation significantly increased CTP and also tended to improve perceived voice quality. The latter effect was mostly greater in singers who did not practice singing daily. In addition, a more pronounced perceptual effect was found in singers rated as being less experienced. Conclusion: Resonance tube phonation significantly raised CTP and tended to improve perceptual ratings of voice quality. The effect on PTP did not reach significance.

  • 211.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Analysis of and feedback on phonetic features in pronunciation training with a virtual teacher, 2012. In: Computer Assisted Language Learning, ISSN 0958-8221, E-ISSN 1744-3210, Vol. 25, no. 1, pp. 37-64. Article in journal (Refereed)
    Abstract [en]

    Pronunciation errors may be caused by several different deviations from the target, such as voicing, intonation, insertions or deletions of segments, or that the articulators are placed incorrectly. Computer-animated pronunciation teachers could potentially provide important assistance on correcting all these types of deviations, but they have an additional benefit for articulatory errors. By making parts of the face transparent, they can show the correct position and shape of the tongue and provide audiovisual feedback on how to change erroneous articulations. Such a scenario however requires firstly that the learner's current articulation can be estimated with precision and secondly that the learner is able to imitate the articulatory changes suggested in the audiovisual feedback. This article discusses both these aspects, with one experiment on estimating the important articulatory features from a speaker through acoustic-to-articulatory inversion and one user test with a virtual pronunciation teacher, in which the articulatory changes made by seven learners who receive audiovisual feedback are monitored using ultrasound imaging.

  • 212.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Articulatory synthesis using corpus-based estimation of line spectrum pairs, 2005. In: 9th European Conference on Speech Communication and Technology, 2005, pp. 1909-1912. Conference paper (Refereed)
    Abstract [en]

    An attempt to define a new articulatory synthesis method, in which the speech signal is generated through a statistical estimation of its relation with articulatory parameters, is presented. A corpus containing acoustic material and simultaneous recordings of the tongue and facial movements was used to train and test the articulatory synthesis of VCV words and short sentences. Tongue and facial motion data, captured with electromagnetic articulography and three-dimensional optical motion tracking, respectively, define articulatory parameters of a talking head. These articulatory parameters are then used as estimators of the speech signal, represented by line spectrum pairs. The statistical link between the articulatory parameters and the speech signal was established using either linear estimation or artificial neural networks. The results show that the linear estimation was only enough to synthesize identifiable vowels, but not consonants, whereas the neural networks gave a perceptually better synthesis.

  • 213.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Assessing MRI measurements: Effects of sustenation, gravitation and coarticulation, 2006. In: Speech production: Models, Phonetic Processes and Techniques / [ed] Harrington, J.; Tabain, M., New York: Psychology Press, 2006, pp. 301-314. Chapter in book (Refereed)
  • 214.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Augmented Reality Talking Heads as a Support for Speech Perception and Production, 2011. In: Augmented Reality: Some Emerging Application Areas / [ed] Nee, Andrew Yeh Ching, IN-TECH, 2011, pp. 89-114. Chapter in book (Refereed)
  • 215.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Bättre tala än texta - talteknologi nu och i framtiden [Better spoken than written - speech technology now and in the future], 2008. In: Tekniken bakom språket / [ed] Domeij, Rickard, Stockholm: Norstedts Akademiska Förlag, 2008, pp. 98-118. Chapter in book (Other (popular science, discussion, etc.))
  • 216.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Datoranimerade talande ansikten [Computer-animated talking faces], 2012. In: Människans ansikten: Emotion, interaktion och konst / [ed] Adelswärd, V.; Forstorp, P-A., Stockholm: Carlssons Bokförlag, 2012. Chapter in book (Other academic)
  • 217.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Evaluation of speech inversion using an articulatory classifier, 2006. In: Proceedings of the Seventh International Seminar on Speech Production / [ed] Yehia, H.; Demolin, D.; Laboissière, R., 2006, pp. 469-476. Conference paper (Refereed)
    Abstract [en]

    This paper presents an evaluation method for statistically based speech inversion, in which the estimated vocal tract shapes are classified into phoneme categories based on the articulatory correspondence with prototype vocal tract shapes. The prototypes are created using the original articulatory data, and the classifier hence makes it possible to interpret the results of the inversion in terms of, e.g., confusions between different articulations and the success in estimating different places of articulation. The articulatory classifier was used to evaluate acoustic and audiovisual speech inversion of VCV words and Swedish sentences performed with a linear estimation and an artificial neural network.

  • 218.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Feedback strategies of human and virtual tutors in pronunciation training, 2006. In: TMH-QPSR, ISSN 1104-5787, Vol. 48, no. 1, pp. 011-034. Article in journal (Other academic)
    Abstract [en]

    This paper presents a survey of language teachers' and their students' attitudes and practice concerning the use of corrective feedback in pronunciation training. The aim of the study is to identify feedback strategies that can be used successfully in a computer-assisted pronunciation training system with a virtual tutor giving articulatory instructions and feedback. The study was carried out using focus group meetings, individual semi-structured interviews and classroom observations. Implications for computer-assisted pronunciation training are presented, and some have been tested with 37 users in a short practice session with a virtual teacher.

  • 219.
    Engwall, Olov
    KTH, Superseded Departments, Speech, Music and Hearing.
    From real-time MRI to 3D tongue movements, 2004. In: INTERSPEECH 2004: ICSLP 8th International Conference on Spoken Language Processing / [ed] Kim, S. H.; Young, D. H., 2004, pp. 1109-1112. Conference paper (Refereed)
    Abstract [en]

    Real-time Magnetic Resonance Imaging (MRI) at 9 images/s of the midsagittal plane is used as input to a three-dimensional tongue model, previously generated based on sustained articulations imaged with static MRI. The aim is two-fold: firstly to use articulatory inversion to extrapolate the midsagittal tongue movements to three-dimensional movements, secondly to determine the accuracy of the tongue model in replicating the real-time midsagittal tongue shapes. The evaluation of the inversion shows that the real-time midsagittal contour is reproduced with acceptable accuracy. This means that the 3D model can be used to represent real-time articulations, even though the artificially sustained articulations on which it was based were hyperarticulated and had a backward displacement of the tongue.

  • 220.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Introducing visual cues in acoustic-to-articulatory inversion, 2005. In: Interspeech 2005: 9th European Conference on Speech Communication and Technology, 2005, pp. 3205-3208. Conference paper (Refereed)
    Abstract [en]

    The contribution of facial measures in a statistical acoustic-to-articulatory inversion has been investigated. The tongue contour was estimated using a linear estimation from either acoustics or acoustics and facial measures. Measures of the lateral movement of lip corners and the vertical movement of the upper and lower lip and the jaw gave a substantial improvement over the audio-only case. It was further found that adding the corresponding articulatory measures that could be extracted from a profile view of the face, i.e., the protrusion of the lips, lip corners and the jaw, did not give any additional improvement of the inversion result. The present study hence suggests that audiovisual-to-articulatory inversion can be performed using front-view monovision of the face, rather than stereovision of both the front and profile views.

  • 221.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Is there a McGurk effect for tongue reading?, 2010. In: Proceedings of AVSP: International Conference on Audio-Visual Speech Processing, 2010. Conference paper (Refereed)
    Abstract [en]

    Previous studies on tongue reading, i.e., speech perception of degraded audio supported by animations of tongue movements, have indicated that the support is weak initially and that subjects need training to learn to interpret the movements. This paper investigates if the learning is of the animation templates as such or if subjects learn to retrieve articulatory knowledge that they already have. Matching and conflicting animations of tongue movements were presented randomly together with the auditory speech signal at three different levels of noise in a consonant identification test. The average recognition rate over the three noise levels was significantly higher for the matched audiovisual condition than for the conflicting and the auditory-only conditions. Audiovisual integration effects were also found for conflicting stimuli. However, the visual modality is given much less weight in the perception than for a normal face view, and inter-subject differences in the use of visual information are large.

  • 222.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Pronunciation analysis by acoustic-to-articulatory feature inversion, 2012. In: Proceedings of the International Symposium on Automatic Detection of Errors in Pronunciation Training / [ed] Engwall, O., Stockholm, 2012, pp. 79-84. Conference paper (Refereed)
    Abstract [en]

    Second language learners may require assistance correcting their articulation of unfamiliar phonemes in order to reach the target pronunciation. If, e.g., a talking head is to provide the learner with feedback on how to change the articulation, a required first step is to be able to analyze the learner's articulation. This paper describes how a specialized restricted acoustic-to-articulatory inversion procedure may be used for this analysis. The inversion is trained on simultaneously recorded acoustic-articulatory data of one native speaker of Swedish, and four different experiments investigate how it performs for the original speaker, using acoustic input; for the original speaker, using acoustic input and visual information; for four other speakers; and for correct and mispronounced phones uttered by two non-native speakers.

  • 223.
    Engwall, Olov
    KTH, Superseded Departments, Speech, Music and Hearing.
    Speaker adaptation of a three-dimensional tongue model, 2004. In: INTERSPEECH 2004: ICSLP 8th International Conference on Spoken Language Processing / [ed] Kim, S. H.; Young, D. H., 2004, pp. 465-468. Conference paper (Refereed)
    Abstract [en]

    Magnetic Resonance Images of nine subjects have been collected to determine scaling factors that can adapt a 3D tongue model to new subjects. The aim is to define a small number of simple measures that allow for an automatic, but accurate, scaling of the model. The scaling should be automatic in order to be useful in an application for articulation training, in which the model must replicate the user's articulators without involving the user in a complicated speaker adaptation. It should further be accurate enough to allow for correct acoustic-to-articulatory inversion. The evaluation shows that the defined scaling technique is able to estimate a tongue shape that was not included in the training with an accuracy of 1.5 mm in the midsagittal plane and 1.7 mm for the whole 3D tongue, based on four articulatory measures.

  • 224.
    Engwall, Olov
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Delvaux, V.
    Metens, T.
    Interspeaker Variation in the Articulation of French Nasal Vowels, 2006. In: Proceedings of the Seventh International Seminar on Speech Production, 2006, pp. 3-10. Conference paper (Refereed)
  • 225.
    Engwall, Olov
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Wik, Preben
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Can you tell if tongue movements are real or synthetic?, 2009. In: Proceedings of AVSP, 2009. Conference paper (Refereed)
    Abstract [en]

    We have investigated if subjects are aware of what natural tongue movements look like, by showing them animations based on either measurements or rule-based synthesis. The issue is of interest since a previous audiovisual speech perception study recently showed that the word recognition rate in sentences with degraded audio was significantly better with real tongue movements than with synthesized. The subjects in the current study could as a group not tell which movements were real, with a classification score at chance level. About half of the subjects were significantly better at discriminating between the two types of animations, but their classification score was as often well below chance as above. The correlation between classification score and word recognition rate for subjects who also participated in the perception study was very weak, suggesting that the higher recognition score for real tongue movements may be due to subconscious, rather than conscious, processes. This finding could potentially be interpreted as an indication that audiovisual speech perception is based on articulatory gestures.

  • 226.
    Engwall, Olov
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Wik, Preben
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Real vs. rule-generated tongue movements as an audio-visual speech perception support, 2009. In: Proceedings of Fonetik 2009 / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 30-35. Conference paper (Other academic)
    Abstract [en]

    We have conducted two studies in which animations created from real tongue movements and rule-based synthesis are compared. We first studied if the two types of animations were different in terms of how much support they give in a perception task. Subjects achieved a significantly higher word recognition rate in sentences when animations were shown compared to the audio-only condition, and a significantly higher score with real movements than with synthesized. We then performed a classification test, in which subjects should indicate if the animations were created from measurements or from rules. The results show that the subjects as a group are unable to tell if the tongue movements are real or not. The stronger support from real movements hence appears to be due to subconscious factors.

  • 227.
    Engwall, Olov
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Design strategies for a virtual language tutor (2004). In: INTERSPEECH 2004, ICSLP, 8th International Conference on Spoken Language Processing, Jeju Island, Korea, October 4-8, 2004 / [ed] Kim, S. H.; Young, D. H., Jeju Island, Korea, 2004, pp. 1693-1696. Conference paper (Refereed)
    Abstract [en]

    In this paper we discuss work in progress on an interactive talking agent as a virtual language tutor in CALL applications. The ambition is to create a tutor that can be engaged in many aspects of language learning, from detailed pronunciation to conversational training. Some of the crucial components of such a system are described. An initial implementation of a stress/quantity training scheme is presented.

  • 228.
    Enlund, Nils
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och grafisk produktion, Media (stängd 20111231).
    Askenfelt, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Mediated masterclass teaching (2007). In: Proc. of Developing Innovative Video Resources for Students Everywhere, Lillehammer: Høgskolen i Lillehammer, 2007. Conference paper (Other academic)
  • 229.
    Ericsson, Chistina
    et al.
    Talboks- och Punktskriftsbiblioteket.
    Klein, Jesper
    Talboks- och Punktskriftsbiblioteket.
    Sjölander, Kåre
    Talboks- och Punktskriftsbiblioteket.
    Sönnebo, Lars
    Talboks- och Punktskriftsbiblioteket.
    Filibuster: a new Swedish text-to-speech system (2007). In: Proceedings of Fonetik 2007, Stockholm: KTH Royal Institute of Technology, 2007, Vol. 50, no. 1, pp. 33-36. Conference paper (Other academic)
    Abstract [en]

    A Swedish text-to-speech system has been developed at the Swedish Library of Talking Books and Braille (TPB). The system, named Filibuster, is open and extensible and makes it possible to generate synthetic speech with a high degree of control. The Filibuster system is used in the production of talking books in TPB's service for print-handicapped students at university level. Through the use of text-to-speech, students can receive their talking books much faster. Also, each book costs less to produce. The system was deployed in February 2007 and during this year the plan is to produce a total of 200 titles. The system has been designed specifically for creating talking book versions of university textbooks. It has a large lexicon, covering some 573,000 words and names. Filibuster includes a comprehensive text pre-processor to write out non-word entities, such as numbers, characters and expressions. So far, one male voice, Folke, has been created, but more are planned.

  • 230. Eriksson, Gunnar
    et al.
    Karlgren, Jussi
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Features for modelling characteristics of conversations: Notebook for PAN at CLEF 2012 (2012). In: CLEF 2012 Evaluation Labs and Workshop Online Working Notes, 2012. Conference paper (Refereed)
    Abstract [en]

    In this experiment, we find that features which model interaction and conversational behaviour contribute well to identifying sexual grooming behaviour in chat and forum text. Together with the obviously useful lexical features — which we find are more valuable if separated by who generates them — we achieve very successful results in identifying behavioural patterns which may characterise sexual grooming. We conjecture that the general framework can be used for other purposes than this specific case if the lexical features are exchanged for other topical models, since the conversational features characterise interaction and behaviour rather than topical choice.
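The feature design described above can be sketched as a toy example. This is an illustrative sketch only, not the authors' code: the function and feature names are invented here. The idea it demonstrates is that lexical counts are keyed by the speaker who produced them, while separate conversational features describe the interaction itself.

```python
# Illustrative sketch (names invented): lexical features separated by who
# generates them, plus conversational features describing interaction.
from collections import Counter

def chat_features(turns):
    """turns: list of (speaker, text) pairs. Returns a feature dict."""
    feats = Counter()
    lengths = []
    questions = 0
    for speaker, text in turns:
        words = text.lower().split()
        lengths.append(len(words))
        if text.strip().endswith("?"):
            questions += 1
        for w in words:
            feats[f"{speaker}:{w}"] += 1   # lexical count, tagged by generator
    n = len(turns)
    # conversational features: behaviour rather than topic
    feats["conv:mean_turn_len"] = sum(lengths) / n
    feats["conv:question_rate"] = questions / n
    feats["conv:speakers"] = len({s for s, _ in turns})
    return dict(feats)

log = [("A", "how old are you?"), ("B", "i am thirteen"), ("A", "cool")]
f = chat_features(log)
```

Such a feature dictionary could then feed any standard classifier; swapping the lexical part for another topical model leaves the conversational features unchanged, which is the portability the abstract conjectures.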

  • 231.
    Fabiani, Marco
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    A prototype system for rule-based expressive modifications of audio recordings (2007). In: Proc. of the Int. Symp. on Performance Science 2007, Porto, Portugal: AEC (European Conservatories Association), 2007, pp. 355-360. Conference paper (Refereed)
    Abstract [en]

    A prototype system is described that aims to modify a musical recording in an expressive way using a set of performance rules controlling tempo, sound level and articulation. The audio signal is aligned with an enhanced score file containing performance rules information. A time-frequency transformation is applied, and the peaks in the spectrogram, representing the harmonics of each tone, are tracked and associated with the corresponding note in the score. New values for tempo, note lengths and sound levels are computed based on rules and user decisions. The spectrogram is modified by adding, subtracting and scaling spectral peaks to change the original tone’s length and sound level. For tempo variations, a time scale modification algorithm is integrated in the time domain re-synthesis process. The prototype is developed in Matlab. An intuitive GUI is provided that allows the user to choose parameters, listen and visualize the audio signals involved and perform the modifications. Experiments have been performed on monophonic and simple polyphonic recordings of classical music for piano and guitar.

  • 232.
    Fabiani, Marco
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Expressive modifications of musical audio recordings: preliminary results (2007). In: Proc. of the 2007 Int. Computer Music Conf. (ICMC07), Copenhagen, Denmark: The International Computer Music Association and Re:New, 2007, pp. 21-24. Conference paper (Refereed)
    Abstract [en]

    A system is described that aims to modify the performance of a musical recording (classical music) by changing the basic performance parameters tempo, sound level and tone duration. The input audio file is aligned with the corresponding score, which also contains extra information defining rule-based modifications of these parameters. The signal is decomposed using analysis-synthesis techniques to separate and modify each tone independently. The user can control the performance by changing the quantity of performance rules or by directly modifying the parameter values. A prototype Matlab implementation of the system performs expressive tempo and articulation modifications of monophonic and simple polyphonic audio recordings.
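The kind of parameter change described can be illustrated on the symbolic level. This is a rough sketch under invented assumptions, not the authors' implementation (which operates on the audio signal via analysis-synthesis): a tempo factor stretches onsets and durations, and an articulation factor additionally shortens or lengthens each tone.

```python
# Illustrative only: applying a rule-derived tempo factor and per-note
# articulation scaling to a symbolic note list. The actual system renders
# such parameter changes back into audio; this sketch stops at the score.

def apply_performance(notes, tempo_factor, articulation):
    """notes: list of (onset_s, duration_s, level_db) tuples.
    tempo_factor > 1 slows the performance; articulation < 1 detaches tones."""
    out = []
    for onset, dur, level in notes:
        out.append((onset * tempo_factor,
                    dur * tempo_factor * articulation,
                    level))
    return out

score = [(0.0, 0.5, -6.0), (0.5, 0.5, -6.0)]
performed = apply_performance(score, tempo_factor=1.2, articulation=0.8)
```

In the real system the equivalent of `level` would be realized by scaling spectral peaks, and the tempo change by time-scale modification at re-synthesis.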

  • 233. Fano, E.
    et al.
    Karlgren, Jussi
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Teoretisk datalogi, TCS.
    Nivre, J.
    Uppsala University and Gavagai at CLEF eRisk: Comparing word embedding models (2019). In: CEUR Workshop Proceedings, CEUR-WS, 2019, Vol. 2380. Conference paper (Refereed)
    Abstract [en]

    This paper describes an experiment to evaluate the performance of three different types of semantic vectors or word embeddings (random indexing, GloVe, and ELMo) and two different classification architectures (linear regression and multi-layer perceptrons) for the specific task of identifying authors with eating disorders from writings they publish on a discussion forum. The task requires the classifier to process texts written by the authors in the sequence they were published, and to identify authors likely to be at risk of suffering from eating disorders as early as possible. The data are part of the eRisk evaluation task of CLEF 2019 and evaluated according to the eRisk metrics. Contrary to our expectations, we did not observe a clear-cut advantage using the recently popular contextualized ELMo vectors over the commonly used and much more light-weight GloVe vectors, or the more handily learnable random indexing vectors.
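The pipeline shared by the compared embedding types can be sketched in miniature. This is a hedged illustration with invented toy vectors and weights, not the paper's code: each embedding model maps words to vectors, a text is represented by the mean of its word vectors, and a linear classifier scores the result.

```python
# Toy sketch of an averaged-embedding + linear-classifier pipeline.
# The vectors and weights below are invented for illustration.

def doc_vector(tokens, embeddings, dim):
    """Mean of the available word vectors; zero vector if none match."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def linear_score(vec, weights, bias=0.0):
    """A linear decision function over the document vector."""
    return bias + sum(w * x for w, x in zip(weights, vec))

toy_embeddings = {"food": [1.0, 0.0], "guilt": [0.0, 1.0]}
v = doc_vector(["food", "guilt", "unknown"], toy_embeddings, dim=2)
s = linear_score(v, weights=[0.2, 0.8])
```

Swapping random indexing, GloVe, or ELMo vectors into `embeddings` leaves the rest of the pipeline unchanged, which is what makes the head-to-head comparison in the paper possible (ELMo being contextual, its vectors would in practice be computed per token in context rather than looked up).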

  • 234.
    Fant, Gunnar
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Historical notes (2005). In: TMH-QPSR, ISSN 1104-5787, Vol. 47, no. 1, pp. 9-19. Journal article (Other academic)
  • 235.
    Fant, Gunnar
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Speech research in a historical perspective (2004). In: From sound to sense: 50+ years of discoveries in speech communication / [ed] Janet Slifka, Sharon Manuel, Melanie Matthies, Research Laboratory of Electronics, 2004. Conference paper (Refereed)
  • 236.
    Fant, Gunnar
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Kruckenberg, Anita
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Analysis and synthesis of Swedish prosody with outlooks on production and perception (2004). In: Festschrift Wu Zongji. From traditional phonology to modern speech processing / [ed] Fant, G.; Fujisaki, H.; Chao, J.; Xu, Y., Beijing: Foreign Language Teaching and Research Press, 2004, pp. 73-95. Book chapter (Refereed)
  • 237.
    Fant, Gunnar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Kruckenberg, Anita
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Co-variation of acoustic parameters in prosody (2007). In: TMH-QPSR, ISSN 1104-5787, Vol. 50, no. 1, pp. 1-4. Journal article (Other academic)
  • 238.
    Fant, Gunnar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Kruckenberg, Anita
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Covariation of subglottal pressure, F0 and intensity (2005). In: 9th European Conference on Speech Communication and Technology, 2005, pp. 1061-1064. Conference paper (Refereed)
    Abstract [en]

    This is a report summarising results from studies of true subglottal pressure, supraglottal pressure and speech wave data. We have derived co-variation patterns, which allow a prediction of intensity from subglottal pressure and F0, and conversely a prediction of a subglottal pressure contour from F0 and intensity. Of special interest is the significance of a mid-point in a speaker's available F0 range at which the relations change. In the lower F0 range subglottal pressure and intensity rise with F0, whilst in the upper part they tend to saturate. In connected speech we find a build-up of subglottal pressure well in advance of a stressed word, and a decay of subglottal pressure in the final part of a phrase starting already in the falling branch of an F0 peak.

  • 239.
    Fant, Gunnar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Kruckenberg, Anita
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Individual and contextual variations of prosodic parameters (2006). In: TMH-QPSR, ISSN 1104-5787, Vol. 48, no. 1, pp. 5-9. Journal article (Other academic)
    Abstract [en]

    This is a summary of variabilities and co-variation of prosodic parameters found in our studies of text reading and in the development of text-to-speech synthesis. In addition to F0, duration and intensity, the survey includes aspects of voice production and perception. The role of sub-glottal pressure is discussed. Speech parameters have been correlated with our continuously graded prominence parameter RS. Individual variations in pausing and the realisation of prosodic boundaries have been studied.

  • 240.
    Fant, Gunnar
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Kruckenberg, Anita
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Intonation analysis and synthesis with reference to Swedish (2004). In: Proc. of the International Symposium on Tonal Aspects of Language, TAL 2004, Beijing, 2004, pp. 57-60. Conference paper (Refereed)
    Abstract [en]

    The present report reviews findings about F0 patterns and in particular the realisation of the Swedish accent 1 and accent 2 tone patterns. We have developed a novel system for normalizing F0 contours which allows the handling of male and female data in a common frame. It also facilitates the sorting out of individual patterns from a norm. For this purpose we have defined a semitone scale with a fixed reference. As in the Fujisaki model, we have employed a superposition scheme of adding local F0 modulations to prosodic phrase contours, but with different shaping algorithms. The influence of the syntactic frame, and of word prominence and its relation to the single peak of accent 1 and the dual peak of accent 2, has been quantified. Some language universal traits, such as time constants and typical shapes of local F0 patterns, are discussed. The perceptual smoothing of local F0 contours has been illustrated in a simple experiment which relates to the concept of an auditory time constant. Our Swedish prosody modules have ensured a high quality in synthesis and a robustness in performance with respect to uncertainties in text parsing. Modifications for English and French prosody have provided promising results.

  • 241.
    Fant, Gunnar
    et al.
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Kruckenberg, Anita
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Prosody by rule in Swedish with Language Universal Implications (2004). In: Proc. of the International Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, pp. 405-408. Conference paper (Refereed)
    Abstract [en]

    The FK text-to-speech prosody rules for Swedish are outlined. They cover all levels including prosodic grouping from syntactical analysis. It is a superposition system with local accentuations superimposed on modular F0 patterns of specific rise and decay patterns in successive prosodic groups. F0 in semitones and segmental durations are calculated as a function of lexically determined prominence and position. Speaker normalisation in frequency and time allows the pooling of male and female data in the analysis stage. The main architecture has been successfully tested in French and English synthesis.

  • 242.
    Fant, Gunnar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Kruckenberg, Anita
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    The FK prosody system (2006). In: TMH-QPSR, ISSN 1104-5787, Vol. 48, no. 1, pp. 1-3. Journal article (Other academic)
  • 243. Fober, D.
    et al.
    Letz, S.
    Orlarey, Y.
    Askenfelt, Anders
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Hansen, Kjetil Falkenberg
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Schoonderwaldt, Erwin
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    IMUTUS: an interactive music tuition system (2004). In: Proc. of the Sound and Music Computing Conference (SMC 04), October 20-22, 2004, IRCAM, Paris, France, 2004, pp. 97-103. Conference paper (Other academic)
  • 244.
    Forsell, Mimmi
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Laukka, P.
    Acoustic correlates of frustration in spontaneous speech (2007). In: TMH-QPSR, ISSN 1104-5787, Vol. 50, no. 1, pp. 37-40. Journal article (Other academic)
    Abstract [en]

    The focus of this master’s thesis by the first author was to investigate the acoustic attributes of frustration in spontaneous speech. The speech material was recorded from real life Swedish telephone services by the company Voice Provider. The utterances were selected speaker by speaker in order to have at least one of them judged as emotionally neutral by a listener group, while the other utterances of the same speaker were judged as displaying emotional speech. Due to the nature of the speech material most of it was spoken in a neutral way. However, some percent of the utterances displayed various degrees of frustration, mostly anger but also some despondency, and these were the emotions studied in this report. We also studied the emotional intensity of the utterances. Acoustic cues of the emotional speech were compared to those of neutral speech for the same speaker. We found some significant differences between the acoustic cues for neutral and emotional speech. Anger was characterized by a rise of fundamental frequency and an increase in speech amplitude, whereas despondency reduced the syllable rate significantly. The emotional intensity raised the pitch, increased the amplitude and decreased the syllable rate. Correlations were also found between perceived emotions and acoustic speech parameters.

  • 245.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    A fuzzy analyzer of emotional expression in music performance and body motion (2005). In: Proceedings of Music and Music Science, Stockholm 2004 / [ed] Brunson, W.; Sundberg, J., 2005. Conference paper (Refereed)
  • 246.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Digital audio emotions: An overview of computer analysis and synthesis of emotions in music (2008). In: Proc. of the 11th Int. Conference on Digital Audio Effects (DAFx-08), Espoo, Finland, 2008, pp. 1-6. Conference paper (Refereed)
    Abstract [en]

    The research in emotions and music has increased substantially recently. Emotional expression is one of the most important aspects of music and has been shown to be reliably communicated to the listener given a restricted set of emotion categories. From the results it is evident that automatic analysis and synthesis systems can be constructed. In this paper general aspects are discussed with respect to analysis and synthesis of emotional expression and prototype applications are described.

  • 247.
    Friberg, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Hedblad, Anton
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    A Comparison of Perceptual Ratings and Computed Audio Features (2011). In: Proceedings of the SMC 2011 - 8th Sound and Music Computing Conference, 2011, pp. 122-127. Conference paper (Refereed)
  • 248.
    Friberg, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Källblad, Anna
    Studio 323 wip:sthlm.
    Experiences from video-controlled sound installations (2011). In: Proceedings of New Interfaces for Musical Expression - NIME, Oslo, 2011, pp. 128-131. Conference paper (Refereed)
  • 249.
    Friberg, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Schoonderwaldt, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Hedblad, Anton
    Perceptual ratings of musical parameters (2011). In: Gemessene Interpretation - Computergestützte Aufführungsanalyse im Kreuzverhör der Disziplinen / [ed] von Loesch, H.; Weinzierl, S., Mainz: Schott 2011 (Klang und Begriff 4), pp. 237-253. Book chapter (Refereed)
  • 250.
    Friberg, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Schoonderwaldt, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik. Hanover University, Germany .
    Hedblad, Anton
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Fabiani, Marco
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Elowsson, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Using listener-based perceptual features as intermediate representations in music information retrieval (2014). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 136, no. 4, pp. 1951-1963. Journal article (Refereed)
    Abstract [en]

    The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated from an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled both from symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features with an explained variance from 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could only to a limited extent be modeled using existing audio features. Results clearly indicated that a small number of dedicated features were superior to a "brute force" model using a large number of general audio features.
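The paper's central measurement, explained variance of emotion ratings predicted from a small set of perceptual features, can be illustrated with a toy linear model. The numbers and feature names below are invented for the sketch and are not the paper's data.

```python
# Toy sketch: predicting an emotion dimension (e.g. activity) from a few
# perceptual feature ratings with a linear model, and reporting explained
# variance (R^2), as in the described evaluation. All numbers are invented.

def predict(features, weights, bias):
    """Linear prediction: one output per row of feature ratings."""
    return [bias + sum(w * x for w, x in zip(weights, row)) for row in features]

def r_squared(y_true, y_pred):
    """Explained variance: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# rows: hypothetical [speed, dynamics] ratings; target: activity rating
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]]
y = [3.1, 2.9, 6.0, 6.1]
y_hat = predict(X, weights=[1.0, 1.0], bias=0.0)
r2 = r_squared(y, y_hat)
```

The paper's reported 75% to 93% explained variance for activity and valence corresponds to R^2 values of 0.75 to 0.93 in this sense; in practice the weights would be fitted by least squares rather than set by hand.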
