101 - 150 of 204
  • 101.
    Granqvist, Svante
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Lindestad, Per-Åke
    A method of applying Fourier analysis to high-speed laryngoscopy (2001). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 110, no 6, p. 3193-3197. Article in journal (Refereed)
    Abstract [en]

    A new method for analysis of digital high-speed recordings of vocal-fold vibrations is presented. The method is based on the extraction of light-intensity time sequences from consecutive images, which in turn are Fourier transformed. The spectra thus acquired can be displayed in four different modes, each having its own benefits. When applied to the larynx, the method visualizes oscillations in the entire laryngeal area, not merely the glottal region. The method was applied to two laryngoscopic high-speed image sequences. Among these examples, covibrations in the ventricular folds and in the mucosa covering the arytenoid cartilages were found. In some cases the covibrations occurred at other frequencies than those of the glottis.
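    The core operation described above, extracting light-intensity time series from consecutive images and Fourier transforming them, can be illustrated with a minimal sketch (Python/NumPy, with assumed array shapes; not the authors' implementation):

```python
import numpy as np

def intensity_spectra(frames, fs):
    """Per-row light-intensity spectra from a high-speed image sequence.

    frames: array of shape (n_frames, height, width), grayscale intensities
    fs: frame rate in Hz
    Returns (freqs, spectra), where spectra has shape (height, n_frames//2 + 1).
    """
    # Sum intensities along the horizontal axis -> one time series per image row
    series = frames.sum(axis=2)              # (n_frames, height)
    series = series - series.mean(axis=0)    # remove DC before transforming
    spectra = np.abs(np.fft.rfft(series, axis=0)).T   # (height, n_freq)
    freqs = np.fft.rfftfreq(frames.shape[0], d=1.0 / fs)
    return freqs, spectra
```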

  • 102.
    Granström, Björn
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Audiovisual representation of prosody in expressive speech communication (2004). In: Proc of Intl Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, p. 393-396. Conference paper (Refereed)
  • 103.
    Gustafson, Joakim
    KTH, Superseded Departments, Speech, Music and Hearing.
    Developing Multimodal Spoken Dialogue Systems: Empirical Studies of Spoken Human–Computer Interaction (2002). Doctoral thesis, comprehensive summary (Other scientific)
    Abstract [en]

    This thesis presents work done during the last ten years on developing five multimodal spoken dialogue systems, and the empirical user studies that have been conducted with them. The dialogue systems have been multimodal, giving information both verbally with animated talking characters and graphically on maps and in text tables. To be able to study a wider range of user behaviour, each new system has been in a new domain and with a new set of interactional abilities. The five systems presented in this thesis are: the Waxholm system, where users could ask about the boat traffic in the Stockholm archipelago; the Gulan system, where people could retrieve information from the Yellow pages of Stockholm; the August system, a publicly available system where people could get information about the author Strindberg, KTH and Stockholm; the AdApt system, which allowed users to browse apartments for sale in Stockholm; and the Pixie system, where users could help an animated agent to fix things in a visionary apartment publicly available at the Telecom museum in Stockholm. Some of the dialogue systems have been used in controlled experiments in laboratory environments, while others have been placed in public environments where members of the general public have interacted with them. All spoken human-computer interactions have been transcribed and analyzed to increase our understanding of how people interact verbally with computers, and to obtain knowledge on how spoken dialogue systems can utilize the regularities found in these interactions. This thesis summarizes the experiences from building these five dialogue systems and presents some of the findings from the analyses of the collected dialogue corpora.

  • 104.
    Gustafson, Joakim
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Bell, Linda
    KTH, Superseded Departments, Speech, Music and Hearing.
    Speech technology on trial: Experiences from the August system (2000). In: Natural Language Engineering, ISSN 1351-3249, E-ISSN 1469-8110, Vol. 6, no 3-4, p. 273-286. Article in journal (Refereed)
    Abstract [en]

    In this paper, the August spoken dialogue system is described. This experimental Swedish dialogue system, which featured an animated talking agent, was exposed to the general public during a trial period of six months. The construction of the system was partly motivated by the need to collect genuine speech data from people with little or no previous experience of spoken dialogue systems. A corpus of more than 10,000 utterances of spontaneous computer-directed speech was collected and empirical linguistic analyses were carried out. Acoustical, lexical and syntactical aspects of this data were examined. In particular, user behavior and user adaptation during error resolution were emphasized. Repetitive sequences in the database were analyzed in detail. Results suggest that computer-directed speech during error resolution is increased in duration, hyperarticulated and contains inserted pauses. Design decisions which may have influenced how the users behaved when they interacted with August are discussed and implications for the development of future systems are outlined.

  • 105.
    Gustafson, Joakim
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Bell, Linda
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Boye, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Edlund, Jens
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Wirén, Mats
    Constraint Manipulation and Visualization in a Multimodal Dialogue System (2002). In: Proceedings of MultiModal Dialogue in Mobile Environments, 2002. Conference paper (Refereed)
  • 106.
    Gustafson, Joakim
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Larsson, Anette
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Carlson, Rolf
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Hellman, K
    Department of Linguistics, Stockholm University, S-106 91 Stockholm, Sweden.
    How do System Questions Influence Lexical Choices in User Answers? (1997). In: Proceedings of Eurospeech '97, 5th European Conference on Speech Communication and Technology: Rhodes, Greece, 22 - 25 September 1997, Grenoble: European Speech Communication Association (ESCA), 1997, p. 2275-2278. Conference paper (Refereed)
    Abstract [en]

    This paper describes some studies on the effect of the system vocabulary on the lexical choices of the users. There are many theories about human-human dialogues that could be useful in the design of spoken dialogue systems. This paper gives an overview of some of these theories and reports the results from two experiments that examine one of them, namely lexical entrainment. The first experiment was a small Wizard of Oz test that simulated a tourist information system with a speech interface, and the second experiment simulated a system with speech recognition that controlled a questionnaire about people's plans for their vacation. Both experiments show that the subjects mostly adapt their lexical choices to the system questions. Only in less than 5% of the cases did they use an alternative main verb in the answer. These results encourage us to investigate the possibility of adding an adaptive language model in the speech recognizer in our dialogue system, in which the probabilities for the words used in the system questions are increased.

  • 107.
    Gustafson, Joakim
    et al.
    Voice Technologies, Expert Functions, TeliaSonera, Farsta, Sweden.
    Sjölander, Kåre
    KTH, Superseded Departments, Speech, Music and Hearing.
    Voice creations for conversational fairy-tale characters (2004). In: Proc 5th ISCA speech synthesis workshop, Pittsburgh, 2004, p. 145-150. Conference paper (Refereed)
  • 108.
    Gustafson, Joakim
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing. Telia Research AB, Sweden.
    Sjölander, Kåre
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Voice Transformations For Improving Children's Speech Recognition In A Publicly Available Dialogue System (2002). In: Proceedings of ICSLP 02, International Speech Communication Association, 2002, p. 297-300. Conference paper (Refereed)
    Abstract [en]

    To be able to build acoustic models for children that can be used in spoken dialogue systems, speech data has to be collected. Commercial recognizers available for Swedish are trained on adult speech, which makes them less suitable for children’s computer-directed speech. This paper describes some experiments with on-the-fly voice transformation of children’s speech. Two transformation methods were tested, one inspired by the Phase Vocoder algorithm and another by the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) algorithm. The speech signal is transformed before being sent to the speech recognizer for adult speech. Our results show that this method reduces the error rates by on the order of thirty to forty-five percent for child users.
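    As a rough illustration of the underlying idea, shifting a child's voice toward adult pitch before recognition, the sketch below uses librosa's phase-vocoder-based pitch shifting. The file names and the shift of four semitones are assumptions for illustration; the system described in the paper used its own on-the-fly transformations.

```python
import librosa
import soundfile as sf

# Load a (hypothetical) child utterance and shift it down, e.g. 4 semitones,
# so that it better matches acoustic models trained on adult speech.
y, sr = librosa.load("child_utterance.wav", sr=16000)
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=-4)
sf.write("transformed_utterance.wav", y_shifted, sr)
# The transformed audio would then be passed to the adult-speech recognizer.
```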

  • 109.
    Hansen, Kjetil Falkenberg
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, Superseded Departments, Speech, Music and Hearing.
    Analysis of a genuine scratch performance (2004). In: Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349, Vol. 2915, p. 477-478. Article in journal (Refereed)
    Abstract [en]

    The art form of manipulating vinyl records done by disc jockeys (DJs) is called scratching, and it has become very popular since its start in the seventies. Since then, turntables have been commonly used as expressive musical instruments in several musical genres. This phenomenon has had a serious impact on the instrument-making industry, as sales of turntables and related equipment have risen sharply. Despite this, the acoustics of scratching has barely been studied until now. In this paper, we illustrate the complexity of scratching by measuring the gestures of one DJ during a performance. The analysis of these measurements is important to consider in the design of a scratch model.

  • 110.
    Heldner, Mattias
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Strangert, E.
    Temporal effects of focus in Swedish (2001). In: Journal of Phonetics, ISSN 0095-4470, E-ISSN 1095-8576, Vol. 29, no 3, p. 329-361. Article in journal (Refereed)
    Abstract [en]

    The four experiments reported concern the amount and domain of lengthening associated with focal accents in Swedish. Word, syllable and segment durations were measured in read sentences with focus in different positions. As expected, words with focal accents were longer than nonfocal words in general, but the amount of lengthening varied greatly, primarily due to speaker differences but also to position in the phrase and the word accent distinction. Most of the lengthening occurred within the stressed syllable. An analysis of the internal structure of stressed syllables showed that the phonologically long segments, whether vowels or consonants, were lengthened most, while the phonologically short vowels were hardly affected at all. Through this nonlinear lengthening, the contrast between long and short vowels in stressed syllables was sharpened in focus. Thus, the domain of focal accent lengthening includes at least the stressed syllable. Also, an unstressed syllable immediately to the right of the stressed one was lengthened in focus, while initial unstressed syllables, as well as unstressed syllables to the right of the first unstressed one, were not lengthened. Thus, we assume the domain of focal accent lengthening in Swedish to be restricted to the stressed syllable and the immediately following unstressed one.

  • 111.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Final rises and Swedish question intonation (2004). In: Proc of The XVIIth Swedish Phonetics Conference, Fonetik 2004, Stockholm University, 2004, p. 56-59. Conference paper (Other academic)
    Abstract [en]

    Phrase-final intonation was analysed in a subcorpus of Swedish computer-directed question utterances with the objective of investigating the extent to which final rises occur in spontaneous questions, and also to see if such rises might have pragmatic functions over and beyond the signalling of interrogative mode. Final rises occurred in 22 percent of the utterances. Final rises occurred mostly in conjunction with final focal accent. Children exhibited the largest percentage of final rises (32%), with women second (27%) and men lowest (17%). These results are viewed in relationship to results of related perception studies and are discussed in terms of Swedish question intonation and the pragmatic social function of rises in a biological account of intonation.

  • 112.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Final rises in spontaneous Swedish computer-directed questions: incidence and function (2004). In: Proc of Intl Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, p. 115-118. Conference paper (Refereed)
    Abstract [en]

    Phrase-final intonation was analysed in a subcorpus of Swedish computer-directed question utterances with the objective of investigating the extent to which final rises occur in spontaneous questions, and also to see if such rises might have pragmatic functions over and beyond the signalling of interrogative mode. Final rises occurred in 22 percent of the utterances. Final rises occurred mostly in conjunction with final focal accent. Children exhibited the largest percentage of final rises (32%), with women second (27%) and men lowest (17%). These results are discussed in terms of Swedish question intonation and the pragmatic social function of rises in a biological account of intonation.

  • 113.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Pitch and alignment in the perception of tone and intonation (2004). In: From Traditional Phonology to Modern Speech Processing / [ed] Fant, G.; Fujisaki, H.; Cao, J.; Xu, Y., Beijing: Foreign Language Teaching and Research Press, 2004, p. 189-204. Chapter in book (Refereed)
  • 114.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Pitch and alignment in the perception of tone and intonation: pragmatic signals and biological codes (2004). In: Proc of International Symposium on Tonal Aspects of Languages: Emphasis on Tone Languages / [ed] Bel, B.; Marlein, I., Beijing, China, 2004, p. 93-96. Conference paper (Refereed)
  • 115.
    Jande, Per-Anders
    KTH, Superseded Departments, Speech, Music and Hearing.
    Pronunciation variation modelling using decision tree induction from multiple linguistic parameters (2004). In: Proceedings of Fonetik, Stockholm, Sweden, 2004, p. 12-15. Conference paper (Other academic)
  • 116.
    Jansson, Erik V.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Violin frequency response: bridge mobility and bridge feet distance (2004). In: Applied Acoustics, ISSN 0003-682X, E-ISSN 1872-910X, Vol. 65, no 12, p. 1197-1205. Article in journal (Refereed)
    Abstract [en]

    Good violins have a broad hill in the 2-3 kHz range of their frequency response. This hill has previously been attributed to the first in-plane resonance of the violin bridge. Experiments prove, however, that the hill is the result of two forces acting in opposite directions at the bridge feet. The experiments reported here show that the hill can be "tuned" by altering the distance between the bridge feet. It can be tuned both in terms of frequency and level but the properties of the violin cannot be neglected.

  • 117.
    Jonell, Patrik
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Kucherenko, Taras
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, Perception and Learning, RPL.
    Ekstedt, Erik
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Beskow, Jonas
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Learning Non-verbal Behavior for a Social Robot from YouTube Videos (2019). Conference paper (Refereed)
    Abstract [en]

    Non-verbal behavior is crucial for positive perception of humanoid robots. If modelled well, it can improve the interaction and leave the user with a positive experience; if modelled poorly, it may impede the interaction and become a source of distraction. Most of the existing work on modelling non-verbal behavior shows limited variability, because the models employed are deterministic and the generated motion can be perceived as repetitive and predictable. In this paper, we present a novel method for generation of a limited set of facial expressions and head movements, based on a probabilistic generative deep learning architecture called Glow. We have implemented a workflow which takes videos directly from YouTube, extracts relevant features, and trains a model that generates gestures that can be realized in a robot without any post-processing. A user study was conducted and illustrated the importance of having any kind of non-verbal behavior, while most differences between the ground truth, the proposed method, and a random control were not significant (however, the differences that were significant were in favor of the proposed method).

  • 118.
    Juslin, P. N.
    et al.
    Uppsala University.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Bresin, R.
    Computational modelling of different aspects of expressivity: The GERM model (2002). In: Proceedings of ICMPC7 - 7th International Conference on Music Perception & Cognition, 2002, p. 13-. Conference paper (Refereed)
  • 119. Juslin, Patrik N.
    et al.
    Friberg, Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Schoonderwaldt, Erwin
    KTH, Superseded Departments, Speech, Music and Hearing.
    Karlsson, Jessica
    Feedback learning of musical expressivity (2004). In: Musical excellence - Strategies and techniques to enhance performance / [ed] Aaron Williamon, Oxford University Press, 2004, p. 247-270. Chapter in book (Refereed)
    Abstract [en]

    Communication of emotion is of fundamental importance to the performance of music. However, recent research indicates that expressive aspects of performance are neglected in music education, with teachers spending more time and effort on technical aspects. Moreover, traditional strategies for teaching expressivity rarely provide informative feedback to the performer. In this chapter we explore the nature of expressivity in music performance and evaluate novel methods for teaching expressivity based on recent advances in musical science, psychology, technology, and acoustics. First, we provide a critical discussion of traditional views on expressivity, and dispel some of the myths that surround the concept of expressivity. Then, we propose a revised view of expressivity based on modern research. Finally, a new and empirically based approach to learning expressivity termed cognitive feedback is described and evaluated. The goal of cognitive feedback is to allow the performer to compare a model of his or her playing to an “optimal” model based on listeners’ judgments of expressivity. This method is being implemented in user-friendly software, which is evaluated in collaboration with musicians and music teachers.

  • 120.
    Karlsson, Inger A.
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Banziger, T.
    Dankovicova, J.
    Johnstone, T.
    Lindberg, J.
    Melin, H.
    Nolan, F.
    Scherer, K.
    Speaker verification with elicited speaking styles in the VeriVox project (2000). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 31, no 2-3, p. 121-129. Article in journal (Refereed)
    Abstract [en]

    Some experiments have been carried out to study and compensate for within-speaker variations in speaker verification. To induce speaker variation, a speaking behaviour elicitation software package has been developed. A 50-speaker database with voluntary and involuntary speech variation has been recorded using this software. The database has been used for acoustic analysis as well as for automatic speaker verification (ASV) tests. The voluntary speech variations are used to form an enrolment set for the ASV system. This set is called structured training and is compared to neutral training where only normal speech is used. Both sets contain the same number of utterances. It is found that the ASV system improves its performance when testing on a mixed speaking style test without decreasing the performance of the tests with normal speech.

  • 121.
    Karlsson, Inger
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Faulkner, Andrew
    Salvi, Giampiero
    KTH, Superseded Departments, Speech, Music and Hearing.
    SYNFACE - a talking face telephone (2003). In: Proceedings of EUROSPEECH 2003, 2003, p. 1297-1300. Conference paper (Refereed)
    Abstract [en]

    The primary goal of the SYNFACE project is to make it easier for hearing-impaired people to use an ordinary telephone. This will be achieved by using a talking face connected to the telephone. The incoming speech signal will govern the speech movements of the talking face; hence the talking face will provide lip-reading support for the user. The project will define the visual speech information that supports lip-reading, and develop techniques to derive this information from the acoustic speech signal in near real time for three different languages: Dutch, English and Swedish. This requires the development of automatic speech recognition methods that detect information in the acoustic signal that correlates with the speech movements. This information will govern the speech movements in a synthetic face and synchronise them with the acoustic speech signal. A prototype system is being constructed. The prototype contains results achieved so far in SYNFACE. This system will be tested and evaluated for the three languages by hearing-impaired users. SYNFACE is an IST project (IST-2001-33327) with partners from the Netherlands, UK and Sweden. SYNFACE builds on experiences gained in the Swedish Teleface project.

  • 122.
    Karnebäck, Stefan
    KTH, Superseded Departments, Speech, Music and Hearing.
    Spectro-temporal properties of the acoustic speech signal used for speech/music discrimination (2004). Licentiate thesis, comprehensive summary (Other scientific)
  • 123.
    Karnebäck, Stefan
    KTH, Superseded Departments, Speech, Music and Hearing.
    Speech/music discrimination using discrete hidden Markov models (2004). Report (Other academic)
  • 124. Lacerda, F.
    et al.
    Sundberg, U.
    Carlson, Rolf
    KTH, Superseded Departments, Speech, Music and Hearing.
    Holt, L.
    Modelling interactive language learning: a project presentation (2004). In: Proceedings of The XVIIth Swedish Phonetics Conference, Fonetik 2004, Stockholm University, 2004, p. 60-63. Conference paper (Other academic)
    Abstract [en]

    This paper describes a recently started interdisciplinary research program aiming at investigating and modelling fundamental aspects of the language acquisition process. The working hypothesis assumes that general purpose perception and memory processes, common to both human and other mammalian species, along with the particular context of initial adult-infant interaction, underlie the infant’s ability to progressively derive linguistic structure implicitly available in the ambient language. The project is conceived as an interdisciplinary research effort involving the areas of Phonetics, Psychology and Speech recognition. Experimental speech perception techniques will be used at Dept. of Linguistics, SU, to investigate the development of the infant’s ability to derive linguistic information from situated connected speech. These experiments will be matched by behavioural tests of animal subjects, carried out at CMU, Pittsburgh, to disclose the potential significance that recurrent multi-sensory properties of the stimuli may have for spontaneous category formation. Data from infant and child vocal productions as well as infant-adult interactions will also be collected and analyzed to address the possibility of a production-perception link. Finally, the data from the infant and animal studies will be integrated and tested in mathematical models of the language acquisition process, developed at TMH, KTH.

  • 125.
    Magnuson, Tina
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Hunnicutt, Sheri
    KTH, Superseded Departments, Speech, Music and Hearing.
    Aided text construction in an e-mail application for symbol users (2004). In: Proceedings of the 11th Biennial Conference of the International Society for Augmentative and Alternative Communication, Natal, Brazil, 2004. Conference paper (Refereed)
  • 126. Massaro, Dominic W.
    et al.
    Beskow, Jonas
    KTH, Superseded Departments, Speech, Music and Hearing.
    Cohen, Michael M.
    Fry, Christopher L.
    Rodriguez, Tony
    Picture My Voice: Audio to Visual Speech Synthesis using Artificial Neural Networks (1999). In: Proceedings of International Conference on Auditory-Visual Speech Processing / [ed] Massaro, Dominic W., 1999, p. 133-138. Conference paper (Other academic)
    Abstract [en]

    This paper presents an initial implementation and evaluation of a system that synthesizes visual speech directly from the acoustic waveform. An artificial neural network (ANN) was trained to map the cepstral coefficients of an individual’s natural speech to the control parameters of an animated synthetic talking head. We trained on two data sets; one was a set of 400 words spoken in isolation by a single speaker and the other a subset of extemporaneous speech from 10 different speakers. The system showed learning in both cases. A perceptual evaluation test indicated that the system’s generalization to new words by the same speaker provides significant visible information, but significantly below that given by a text-to-speech algorithm.
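    A minimal sketch of the kind of mapping described, regressing from acoustic feature vectors to facial control parameters with a small neural network, is given below (scikit-learn, with assumed feature arrays; the original system used its own ANN and parameter set):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# cepstra: (n_frames, n_cepstral_coeffs) acoustic features from training speech
# face_params: (n_frames, n_control_params) time-aligned talking-head parameters
def train_audio_to_visual(cepstra, face_params):
    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    model.fit(cepstra, face_params)
    return model

# At synthesis time, frames of cepstra from new speech drive the synthetic head:
# predicted_params = model.predict(new_cepstra)
```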

  • 127. Mathews, M. V.
    et al.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Bennett, G.
    Sapp, C.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    A marriage of the Director Musices program and the conductor program (2003). In: Proceedings of the Stockholm Music Acoustics Conference, August 6-9, 2003 (SMAC 03), Stockholm, Sweden, 2003, Vol. 1, p. 13-16. Conference paper (Refereed)
    Abstract [en]

    This paper will describe an ongoing collaboration between the authors to combine the Director Musices and Conductor programs in order to achieve a more expressive and socially interactive performance of a midi file score by an electronic orchestra. Director Musices processes a “square” midi file, adjusting the dynamics and timing of the notes to achieve the expressive performance of a trained musician. The Conductor program and the Radio-baton allow a conductor, wielding an electronic baton, to follow and synchronize with other musicians, for example to provide an orchestral accompaniment to an operatic singer. These programs may be particularly useful for student soloists who wish to practice concertos with orchestral accompaniments. 

  • 128. McAllister, Anita
    et al.
    Sederholm, E
    Ternström, Sten
    KTH, Superseded Departments, Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments, Speech, Music and Hearing.
    Perturbation and hoarseness: a pilot study of six children's voices (1996). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 10, no 3. Article in journal (Refereed)
    Abstract [en]

    Fundamental frequency (F0) perturbation has been found to be useful as an acoustic correlate of the perception of dysphonia in adult voices. In a previous investigation, we showed that hoarseness in children's voices is a stable concept composed mainly of three predictors: hyperfunction, breathiness, and roughness. In the present investigation, the relation between F0 perturbation and hoarseness as well as its predictors was analyzed in running speech of six children representing different degrees of hoarseness. Two perturbation measures were used: the standard deviation of the distribution of perturbation data and the mean of the absolute value of perturbation. The results revealed no clear relation.
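    The two perturbation measures mentioned can be sketched as follows (a simplified frame-to-frame version in Python/NumPy; the exact perturbation definition used in the paper may differ):

```python
import numpy as np

def f0_perturbation_measures(f0):
    """Two simple F0 perturbation measures for a voiced F0 contour (Hz).

    Perturbation is taken here as the frame-to-frame difference of F0;
    returns its standard deviation and its mean absolute value.
    """
    f0 = np.asarray(f0, dtype=float)
    perturbation = np.diff(f0)   # frame-to-frame F0 change
    return perturbation.std(), np.abs(perturbation).mean()
```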

  • 129. Mürbe, D.
    et al.
    Pabst, F.
    Hofmann, G.
    Sundberg, Johan
    KTH, Superseded Departments, Speech, Music and Hearing.
    Effects of a professional solo singer education on auditory and kinesthetic feedback - A longitudinal study of singers' pitch control (2004). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 18, no 2, p. 236-241. Article in journal (Refereed)
    Abstract [en]

    The significance of auditory and kinesthetic feedback to pitch control in singing was described in a previous report of this project for students at the beginning of their professional solo singer education.(1) As it seems reasonable to assume that pitch control can be improved by training, the same students were reinvestigated after 3 years of professional singing education. As in the previous study, the singers sang an ascending and descending triad pattern with and without masking noise in legato and staccato and in a slow and a fast tempo. Fundamental frequency and interval sizes between adjacent tones were determined and compared with their equivalents in the equally tempered tuning. The average deviations from these values were used as estimates of intonation accuracy. Intonation accuracy was reduced by masking noise, by staccato as opposed to legato singing, and by fast as opposed to slow performance. The contribution of the auditory feedback to pitch control was not significantly improved after education, whereas the kinesthetic feedback circuit was improved in slow legato and slow staccato tasks. The results support the assumption that the kinesthetic feedback contributes substantially to intonation accuracy.

  • 130.
    Nilsson, Mattias
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Kleijn, Bastiaan
    KTH, Superseded Departments, Speech, Music and Hearing.
    Gustafsson, H.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Andersen, Sören Vang
    KTH, Superseded Departments, Speech, Music and Hearing.
    Gaussian Mixture Model based Mutual Information Estimation between Frequency Bands in Speech (2002). In: 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, p. 525-528. Conference paper (Refereed)
    Abstract [en]

    In this paper, we investigate the dependency between the spectral envelopes of speech in disjoint frequency bands, one covering the telephone bandwidth from 0.3 kHz to 3.4 kHz and one covering the frequencies from 3.7 kHz to 8 kHz. The spectral envelopes are jointly modeled with a Gaussian mixture model based on mel-frequency cepstral coefficients and the log-energy-ratio of the disjoint frequency bands. Using this model, we quantify the dependency between bands through their mutual information and the perceived entropy of the high frequency band. Our results indicate that the mutual information is only a small fraction of the perceived entropy of the high band. This suggests that speech bandwidth extension should not rely only on mutual information between narrow- and high-band spectra. Rather, such methods need to make use of perceptual properties to ensure that the extended signal sounds pleasant.
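    The general idea, fitting a joint Gaussian mixture model and estimating mutual information from it by Monte Carlo sampling, can be sketched as below (scikit-learn; the component count, feature layout and omission of the log-energy-ratio term are simplifications, not the paper's exact estimator):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_mutual_information(x, y, n_components=16, n_samples=100_000, seed=0):
    """Monte Carlo estimate of I(X;Y) in nats from a joint GMM.

    x, y: feature arrays of shape (n_frames, dx) and (n_frames, dy),
    e.g. cepstral coefficients of the narrow band and the high band.
    """
    xy = np.hstack([x, y])
    gm_xy = GaussianMixture(n_components, covariance_type="full", random_state=seed).fit(xy)
    gm_x = GaussianMixture(n_components, covariance_type="full", random_state=seed).fit(x)
    gm_y = GaussianMixture(n_components, covariance_type="full", random_state=seed).fit(y)
    samples, _ = gm_xy.sample(n_samples)
    sx, sy = samples[:, : x.shape[1]], samples[:, x.shape[1]:]
    # I(X;Y) = E_{p(x,y)}[log p(x,y) - log p(x) - log p(y)]
    return np.mean(gm_xy.score_samples(samples) - gm_x.score_samples(sx) - gm_y.score_samples(sy))
```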

  • 131.
    Nordenberg, Maria
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Effect on LTAS of vocal loudness variation (2004). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 29, no 4, p. 183-191. Article in journal (Refereed)
    Abstract [en]

    Long-term-average spectrum (LTAS) is an efficient method for voice analysis, revealing both voice source and formant characteristics. However, the LTAS contour is non-uniformly affected by vocal loudness. This variation was analyzed in 15 male and 16 female untrained voices reading a text 7 times at different degrees of vocal loudness, mean change in overall equivalent sound level (Leq) amounting to 27.9 dB and 28.4 dB for the female and male subjects. For all frequency values up to 4 kHz, spectrum level was strongly and linearly correlated with Leq for each subject. The gain factor, that is to say, the rate of level increase, varied with frequency, from about 0.5 at low frequencies to about 1.5 in the frequency range 1.5-3 kHz. Using the gain factors for a subject, LTAS contours could be predicted at any Leq within the measured range, with an average accuracy of 2-3 dB below 4 kHz. Mean LTAS calculated for an Leq of 70 dB for each subject showed considerable individual variation for both males and females, SD of the level varying between 7 dB and 4 dB depending on frequency. On the other hand, the results also suggest that meaningful comparisons of LTAS, recorded for example before and after voice therapy, can be made, provided that the documentation includes a set of recordings at different loudness levels from one recording session.
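    The reported per-frequency gain factors imply a simple linear prediction of the LTAS contour at a new overall loudness; a minimal sketch of that relation follows (variable names are ours, not the paper's):

```python
def predict_ltas(ltas_ref, leq_ref, gain, leq_target):
    """Predict an LTAS contour at a new overall level.

    ltas_ref: spectrum levels (dB) per frequency bin from a reference recording
    leq_ref, leq_target: overall equivalent sound levels (dB) of the reference
        recording and of the loudness for which we want a prediction
    gain: per-bin gain factor (dB of spectrum-level change per dB of Leq change)
    """
    return [l + g * (leq_target - leq_ref) for l, g in zip(ltas_ref, gain)]
```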

  • 132.
    Nordqvist, Peter
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Leijon, Arne
    KTH, Superseded Departments, Speech, Music and Hearing.
    An efficient robust sound classification algorithm for hearing aids (2004). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 115, no 6, p. 3033-3041. Article in journal (Refereed)
    Abstract [en]

    An efficient robust sound classification algorithm based on hidden Markov models is presented. The system would enable a hearing aid to automatically change its behavior for differing listening environments according to the user's preferences. This work attempts to distinguish between three listening environment categories: speech in traffic noise, speech in babble, and clean speech, regardless of the signal-to-noise ratio. The classifier uses only the modulation characteristics of the signal. The classifier ignores the absolute sound pressure level and the absolute spectrum shape, resulting in an algorithm that is robust against irrelevant acoustic variations. The measured classification hit rate was 96.7%-99.5% when the classifier was tested with sounds representing one of the three environment categories included in the classifier. False-alarm rates were 0.2%-1.7% in these tests. The algorithm is robust and efficient and consumes only a small amount of instructions and memory. It is fully possible to implement the classifier in a DSP-based hearing instrument.
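    A minimal sketch of this type of classifier, one HMM per listening environment and classification by maximum log-likelihood, is shown below (using hmmlearn; feature extraction and the specific modulation features are omitted, and this is not the paper's DSP implementation):

```python
from hmmlearn import hmm

# One HMM per listening environment, trained on modulation-based feature frames
# of shape (n_frames, n_features); feature extraction itself is not shown here.
def train_environment_models(training_data, n_states=3, seed=0):
    models = {}
    for label, frames in training_data.items():  # e.g. "clean speech", "speech in babble", ...
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", random_state=seed)
        m.fit(frames)
        models[label] = m
    return models

def classify(models, frames):
    # Pick the environment whose HMM gives the highest log-likelihood
    return max(models, key=lambda label: models[label].score(frames))
```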

  • 133.
    Nordqvist, Peter
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Leijon, Arne
    KTH, Superseded Departments, Speech, Music and Hearing.
    Hearing-aid automatic gain control adapting to two sound sources in the environment, using three time constants (2004). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 116, no 5, p. 3152-3155. Article in journal (Refereed)
    Abstract [en]

    A hearing aid AGC algorithm is presented that uses a richer representation of the sound environment than previous algorithms. The proposed algorithm is designed to (1) adapt slowly (in approximately 10 s) between different listening environments, e.g., when the user leaves a single talker lecture for a multi-babble coffee-break; (2) switch rapidly (about 100 ms) between different dominant sound sources within one listening situation, such as the change from the user's own voice to a distant speaker's voice in a quiet conference room; (3) instantly reduce gain for strong transient sounds and then quickly return to the previous gain setting; and (4) not change the gain in silent pauses but instead keep the gain setting of the previous sound source. An acoustic evaluation showed that the algorithm worked as intended. The algorithm was evaluated together with a reference algorithm in a pilot field test. When evaluated by nine users in a set of speech recognition tests, the algorithm showed similar results to the reference algorithm.

  • 134.
    Nordstrand, Magnus
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Svanfeldt, Gunilla
    KTH, Superseded Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments, Speech, Music and Hearing.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Measurements of articulatory variation in expressive speech for set of Swedish vowels (2004). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 44, no 1-4, p. 187-196. Article in journal (Refereed)
    Abstract [en]

    Facial gestures are used to convey e.g. emotions, dialogue states and conversational signals, which support us in the interpretation of other people's feelings and intentions. Synthesising this behaviour with an animated talking head would widen the possibilities of this intuitive interface. The dynamic characteristics of these facial gestures during speech affect articulation. Previously, articulation for neutral speech has been studied and implemented in animation rules. The results obtained in this study show how some articulatory parameters are affected by the influence of expressiveness in speech for a selection of Swedish vowels. Our focus has primarily been on attitudes and emotions conveying information that is intended to make an animated agent more "human-like". A multimodal corpus of acted expressive speech has been collected for this purpose.

  • 135.
    Pakucs, Botond
    KTH, Superseded Departments, Speech, Music and Hearing.
    Butler: A universal speech interface for mobile environments (2004). In: MOBILE HUMAN-COMPUTER INTERACTION: MOBILEHCI 2004, PROCEEDINGS / [ed] Brewster, S; Dunlop, M, BERLIN: SPRINGER, 2004, Vol. 3160, p. 399-403. Conference paper (Refereed)
    Abstract [en]

    Speech interfaces are about to be integrated in consumer appliances and embedded systems and are expected to be used by mobile users in ubiquitous computing environments. This paper discusses some major usability and HCI related problems that may be introduced by this development. It is argued that a human-centered approach should be employed when designing and developing speech interfaces for mobile environments. Further, the Butler, a generic spoken dialogue system developed according to the human-centered approach is described. The Butler features a dynamic multi-domain approach.

  • 136.
    Pakucs, Botond
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Huhta, S.
    Developing speech interfaces for frequent users: the DUMAS-calendar prototype (2004). In: COLING 2004 Satellite Workshop: Robust and Adaptive Information Processing for Mobile Speech Interfaces / [ed] Björn Gambäck, Kristiina Jokinen, 2004, p. 65-68. Conference paper (Refereed)
  • 137.
    Rinman, Marie Louise
    et al.
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Friberg, Anders
    KTH, Superseded Departments, Speech, Music and Hearing.
    Bendiksen, B.
    Cirotteau, D.
    Dahl, Sofia
    KTH, Superseded Departments, Speech, Music and Hearing.
    Kjellmo, I.
    Mazzarino, B.
    Camurri, A.
    Ghost in the Cave: an interactive collaborative game using non-verbal communication (2004). In: GESTURE-BASED COMMUNICATION IN HUMAN-COMPUTER INTERACTION / [ed] Camurri, A; Volpe, G, Berlin: Springer Verlag, 2004, p. 549-556. Conference paper (Refereed)
    Abstract [en]

    The interactive game environment, Ghost in the Cave, presented in this short paper, is a work still in progress. The game involves participants in an activity using non-verbal emotional expressions. Two teams use expressive gestures in either voice or body movements to compete. Each team has an avatar controlled either by singing into a microphone or by moving in front of a video camera. Participants/players control their avatars by using acoustical or motion cues. The avatar is navigated in a 3D distributed virtual environment using the Octagon server and player system. The voice input is processed using a musical cue analysis module yielding performance variables such as tempo, sound level and articulation, as well as an emotional prediction. Similarly, movements captured from a video camera are analyzed in terms of different movement cues. The target group is young teenagers, and the main purpose is to encourage creative expression through new forms of collaboration.

  • 138. Rinman, M-L
    et al.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Kjellmo, I.
    Camurri, A.
    Cirotteau, D.
    Dahl, S.
    Mazzarino, B.
    Bendiksen, B.
    McCarthy, H.
    EPS - an interactive collaborative game using non-verbal communication (2003). In: Proceedings of the Stockholm Music Acoustics Conference, August 6-9, 2003 (SMAC 03), Stockholm, Sweden / [ed] Bresin, R., 2003, Vol. 2, p. 561-563. Conference paper (Refereed)
    Abstract [en]

    The interactive game environment EPS (expressive performance space), presented in this short paper, is a work still in progress. EPS involves participants in an activity using non-verbal emotional expressions. Two teams use expressive gestures in either voice or body movements to compete. Each team has an avatar controlled either by singing into a microphone or by moving in front of a video camera. Participants/players control their avatars by using acoustical or motion cues. The avatar is navigated (moved around) in a 3D distributed virtual environment using the Octagon server and player system. The voice input is processed using a musical cue analysis module yielding performance variables such as tempo, sound level and articulation, as well as an emotional prediction. Similarly, movements captured from the video camera are analyzed in terms of different movement cues. The target group is children aged 13-16, and the purpose is to elaborate new forms of collaboration.

  • 139. Rocchesso, Davide
    et al.
    Avanzini, Federico
    Rath, Matthias
    Bresin, Roberto
    KTH, Superseded Departments, Speech, Music and Hearing.
    Serafin, Stefania
    Contact sounds for continuous feedback (2004). In: Proceedings of International Workshop on Interactive Sonification (Human Interaction with Auditory Displays) / [ed] Hunt, A.; Hermann, T., 2004. Conference paper (Refereed)
    Abstract [en]

    The role of continuous auditory feedback in multimodal embodied interfaces is advocated. Examples of physics-based cartoon sound models (rolling and friction) are used to display deviation from equilibrium and exerted effort in manipulative interfaces.

  • 140. Ross, J.
    et al.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Generative Performance Rules and Folksong Performance (2000). In: Sixth International Conference on Music Perception and Cognition, Keele, UK, August 2000 / [ed] Woods, C., Luck, G., Brochard, R., Seddon, F., & Sloboda, J. A., 2000. Conference paper (Refereed)
  • 141.
    Salvi, Giampiero
    KTH, Superseded Departments, Speech, Music and Hearing.
    Accent clustering in Swedish using the Bhattacharyya distance (2003). In: Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 2003, p. 1149-1152. Conference paper (Refereed)
    Abstract [en]

    In an attempt to improve automatic speech recognition (ASR) models for Swedish, accent variations were considered. These have proved to be important variables in the statistical distribution of the acoustic features usually employed in ASR. The analysis of feature variability has revealed phenomena that are consistent with what is known from phonetic investigations, suggesting that a consistent part of the information about accents could be derived from those features. A graphical interface has been developed to simplify the visualization of the geographical distributions of these phenomena.

  • 142.
    Salvi, Giampiero
    KTH, Superseded Departments, Speech, Music and Hearing.
    Developing acoustic models for automatic speech recognition in Swedish (1999). In: The European Student Journal of Language and Speech, Vol. 1. Article in journal (Refereed)
    Abstract [en]

    This thesis is concerned with automatic continuous speech recognition using trainable systems. The aim of this work is to build acoustic models for spoken Swedish. This is done employing hidden Markov models and using the SpeechDat database to train their parameters. Acoustic modeling has been worked out at a phonetic level, allowing general speech recognition applications, even though a simplified task (digits and natural number recognition) has been considered for model evaluation. Different kinds of phone models have been tested, including context independent models and two variations of context dependent models. Furthermore many experiments have been done with bigram language models to tune some of the system parameters. System performance over various speaker subsets with different sex, age and dialect has also been examined. Results are compared to previous similar studies showing a remarkable improvement.

  • 143.
    Salvi, Giampiero
    KTH, Superseded Departments, Speech, Music and Hearing.
    Truncation error and dynamics in very low latency phonetic recognition (2003). In: Proceedings of Non Linear Speech Processing (NOLISP), 2003. Conference paper (Refereed)
    Abstract [en]

    The truncation error for a two-pass decoder is analyzed in a problem of phonetic speech recognition for very demanding latency constraints (look-ahead length < 100 ms) and for applications where successive refinements of the hypotheses are not allowed. This is done empirically in the framework of hybrid MLP/HMM models. The ability of recurrent MLPs, as a posteriori probability estimators, to model time variations is also considered, and its interaction with the dynamic modeling in the decoding phase is shown in the simulations.

  • 144.
    Salvi, Giampiero
    KTH, Superseded Departments, Speech, Music and Hearing.
    Using accent information in ASR models for Swedish (2003). In: Proceedings of INTERSPEECH'2003, 2003, p. 2677-2680. Conference paper (Refereed)
    Abstract [en]

    In this study accent information is used in an attempt to improve acoustic models for automatic speech recognition (ASR). First, accent dependent Gaussian models were trained independently. The Bhattacharyya distance was then used in conjunction with agglomerative hierarchical clustering to define optimal strategies for merging those models. The resulting allophonic classes were analyzed and compared with the phonetic literature. Finally, accent "aware" models were built, in which the parametric complexity for each phoneme corresponds to the degree of variability across accent areas and to the amount of training data available for it. The models were compared to models with the same, but evenly spread, overall complexity showing in some cases a slight improvement in recognition accuracy.
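    The distance measure and the clustering step can be sketched as follows (Python with NumPy/SciPy; the per-accent Gaussian statistics are assumed to be given, and plain average linkage is used here rather than the paper's exact merging procedure):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def cluster_accent_models(models):
    """Agglomerative clustering of per-accent Gaussian models of one phoneme.

    models: list of (mean, covariance) pairs, one per accent area.
    Returns a SciPy linkage matrix describing the merge order.
    """
    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = bhattacharyya(*models[i], *models[j])
    return linkage(squareform(dist), method="average")
```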

  • 145.
    Schoonderwaldt, Erwin
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Towards a rule-based model for violin vibrato (2001). In: Proc of the Workshop on Current Research Directions in Computer Music, 2001, p. 61-64. Conference paper (Refereed)
    Abstract [en]

    Vibrato is one of the most important expressive parameters that players can control when rendering a piece of music. The simulation of vibrato, in systems for automatic music performance, is still an open problem. A mere regular periodic modulation of pitch generally yields unsatisfactory results, sounding both unnatural and mechanical. An appropriate control of vibrato rate and vibrato extent is a major requirement of a successful vibrato model. The goal of the present work was to develop a generative, rule-based model for expressive violin vibrato. Measurements of vibrato as performed by professional violinists were used for this purpose. The model generates vibrato rate and extent envelopes, which are used to control a sampled violin synthesizer.

  • 146.
    Schoonderwaldt, Erwin
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Bresin, Roberto
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Juslin, P. N.
    Uppsala University.
    A system for improving the communication of emotion in music performance by feedback learning (2002). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 111, no 5, p. 2471-. Article in journal (Refereed)
    Abstract [en]

    Expressivity is one of the most important aspects of music performance. However, in music education, expressivity is often overlooked in favor of technical abilities. This could possibly depend on the difficulty in describing expressivity, which makes it problematic to provide the student with specific feedback. The aim of this project is to develop a computer program, which will improve the students’ ability in communicating emotion in music performance. The expressive intention of a performer can be coded in terms of performance parameters (cues), such as tempo, sound level, timbre, and articulation. Listeners’ judgments can be analyzed in the same terms. An algorithm was developed for automatic cue extraction from audio signals. Using note onset–offset detection, the algorithm yields values of sound level, articulation, IOI, and onset velocity for each note. In previous research, Juslin has developed a method for quantitative evaluation of performer–listener communication. This framework forms the basis of the present program. Multiple regression analysis on performances of the same musical fragment, played with different intentions, determines the relative importance of each cue and the consistency of cue utilization. Comparison with built‐in listener models, simulating perceived expression using a regression equation, provides detailed feedback regarding the performers’ cue utilization.

  • 147.
    Schoonderwaldt, Erwin
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Hansen, Kjetil Falkenberg
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Askenfelt, Anders
    KTH, Superseded Departments, Speech, Music and Hearing.
    IMUTUS: an interactive system for learning to play a musical instrument (2004). In: Proc. of International Conference of Interactive Computer Aided Learning (ICL), September 29 - October 1, 2004, Carinthia Tech Institute, Villach, Austria / [ed] Auer, M.; Auer, U., Kassel University Press GmbH, 2004. Conference paper (Other academic)
    Abstract [en]

    IMUTUS (Interactive Music Tuition System) is an EU project that aims to develop a practising environment for the recorder, combining new technologies and new approaches for music learning. Automatic analysis and evaluation of student performances play a central role in the student-system interaction. The performance evaluation module identifies typical performance errors, and provides feedback that relates to performance skills, helping the student to improve. The performance evaluation process is based on the knowledge and experience of recorder teachers, obtained via questionnaires, interviews and structured evaluations of recorded student performances. Another important feature of the performance evaluation is that it can be guided by teachers writing the content for IMUTUS by means of annotations.

  • 148.
    Seward, Alexander
    KTH, Superseded Departments, Speech, Music and Hearing.
    A fast HMM match algorithm for very large vocabulary speech recognition (2004). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 42, no 2, p. 191-206. Article in journal (Refereed)
    Abstract [en]

    The search over context-dependent continuous density Hidden Markov Models (HMMs), including state-likelihood computations, accounts for a considerable part of the total decoding time for a speech recognizer. This is especially apparent in tasks that incorporate large vocabularies and long-dependency n-gram grammars, since these impose a high degree of context dependency and HMMs have to be treated differently in each context. This paper proposes a strategy for acoustic match of typical continuous density HMMs, decoupled from the main search and conducted as a separate component suited for parallelization. Instead of computing a large amount of probabilities for different alignments of each HMM, the proposed method computes all alignments, but more efficiently. Each HMM is matched only once against any time interval, and thus may be instantly looked up by the main search algorithm as required. In order to accomplish this in real time, a fast time-warping match algorithm is proposed, exploiting the specifics of the 3-state left-to-right HMM topology without skips. In proof-of-concept tests, using a highly optimized SIMD-parallel implementation, the algorithm was able to perform time-synchronous decoupled evaluation of a triphone acoustic model, with maximum phone duration of 40 frames, with a real-time factor of 0.83 on one of the CPUs of a Dual-Xeon 2 GHz workstation. The algorithm was able to compute the likelihood for 636,000 locally optimal HMM paths/second, with full state evaluation.
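    For reference, the standard Viterbi recursion over the 3-state left-to-right, no-skip topology that the paper's match algorithm targets can be sketched as below (plain NumPy; this is the conventional per-alignment computation, not the decoupled time-warping algorithm proposed in the paper):

```python
import numpy as np

def best_path_loglik(log_obs, log_self, log_next):
    """Best-alignment log-likelihood of a 3-state left-to-right HMM (no skips)
    against one segment of per-frame state observation log-likelihoods.

    log_obs: array (T, 3) of observation log-likelihoods per frame and state
    log_self, log_next: arrays (3,) of self-loop and forward-transition log-probs
    """
    T = log_obs.shape[0]
    delta = np.full(3, -np.inf)
    delta[0] = log_obs[0, 0]                   # the model must start in state 0
    for t in range(1, T):
        stay = delta + log_self                # remain in the same state
        move = np.full(3, -np.inf)
        move[1:] = delta[:-1] + log_next[:-1]  # advance from the previous state
        delta = np.maximum(stay, move) + log_obs[t]
    return delta[2] + log_next[2]              # exit from the final state
```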

  • 149.
    Seward, Alexander
    KTH, Superseded Departments, Speech, Music and Hearing.
    A Tree-Trellis N-best Decoder for Stochastic Context-Free Grammars (2000). In: Proceedings of the International Conference on Spoken Language Processing, Beijing, China, 2000: vol 4, 2000, p. 282-285. Conference paper (Other academic)
    Abstract [en]

    In this paper a decoder for continuous speech recognition using stochastic context-free grammars is described. It forms the backbone of the ACE recognizer, which is a modular system for real-time speech recognition. A new rationale for automata is introduced, as well as a new model for pruning the search space.

  • 150.
    Seward, Alexander
    KTH, Superseded Departments, Speech, Music and Hearing.
    Efficient Methods for Automatic Speech Recognition (2003). Doctoral thesis, comprehensive summary (Other scientific)
    Abstract [en]

    This thesis presents work in the area of automatic speech recognition (ASR). The thesis focuses on methods for increasing the efficiency of speech recognition systems and on techniques for efficient representation of different types of knowledge in the decoding process. In this work, several decoding algorithms and recognition systems have been developed, aimed at various recognition tasks.

    The thesis presents the KTH large vocabulary speech recognition system. The system was developed for online (live) recognition with large vocabularies and complex language models. The system utilizes weighted transducer theory for efficient representation of different knowledge sources, with the purpose of optimizing the recognition process.

    A search algorithm for efficient processing of hidden Markov models (HMMs) is presented. The algorithm is an alternative to the classical Viterbi algorithm for fast computation of shortest paths in HMMs. It is part of a larger decoding strategy aimed at reducing the overall computational complexity in ASR. In this approach, all HMM computations are completely decoupled from the rest of the decoding process. This enables the use of larger vocabularies and more complex language models without an increase of HMM-related computations.

    Ace is another speech recognition system developed within this work. It is a platform aimed at facilitating the development of speech recognizers and new decoding methods.

    A real-time system for low-latency online speech transcription is also presented. The system was developed within a project with the goal of improving the possibilities for hard-of-hearing people to use conventional telephony by providing speech-synchronized multimodal feedback. This work addresses several additional requirements implied by this special recognition task.
