Low-Latency Incremental Speech Transcription in the Synface Project
KTH, Superseded Departments, Speech, Music and Hearing.
2003 (English)
In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), Geneva, Switzerland, 2003: vol 2, 2003, 1141-1144 p.
Conference paper, Published paper (Other academic)
Abstract [en]

In this paper, a real-time decoder for low-latency online speech transcription is presented. The system was developed within the Synface project, which aims to improve the possibilities for hard-of-hearing people to use conventional telephony by providing speech-synchronized multimodal feedback. This paper addresses the specific issues related to HMM-based incremental phone classification with real-time constraints. The decoding algorithm described in this work enables a trade-off to be made between improved recognition accuracy and reduced latency. By accepting a longer latency per output increment, more time can be devoted to hypothesis look-ahead, thereby improving classification accuracy. Experiments performed on the Swedish SpeechDat database show that it is possible to generate the same classification as is produced by non-incremental decoding using HTK, by adopting a latency of approximately 150 ms or more.
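
The latency/accuracy trade-off described in the abstract can be illustrated with a generic fixed-lag Viterbi decoder. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the function names `viterbi_full` and `viterbi_fixed_lag`, the toy model, and the `lag` parameter (look-ahead measured in frames) are all hypothetical. The decoder commits the state for frame `t - lag` as soon as frame `t` has been observed, so a larger `lag` gives more look-ahead at the cost of more delay.

```python
import numpy as np

def _step(score, log_trans, log_obs_t):
    """One Viterbi update: returns new scores and backpointers."""
    cand = score[:, None] + log_trans          # cand[i, j]: from state i to state j
    back_t = np.argmax(cand, axis=0)           # best predecessor of each state j
    return np.max(cand, axis=0) + log_obs_t, back_t

def viterbi_full(log_init, log_trans, log_obs):
    """Non-incremental Viterbi: trace back once, after the last frame."""
    T, N = log_obs.shape
    back = np.zeros((T, N), dtype=int)
    score = log_init + log_obs[0]
    for t in range(1, T):
        score, back[t] = _step(score, log_trans, log_obs[t])
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def viterbi_fixed_lag(log_init, log_trans, log_obs, lag):
    """Incremental Viterbi: emit frame t - lag after seeing frame t."""
    T, N = log_obs.shape
    back = np.zeros((T, N), dtype=int)
    score = log_init + log_obs[0]
    out = []
    for t in range(T):
        if t > 0:
            score, back[t] = _step(score, log_trans, log_obs[t])
        if t >= lag:
            s = int(np.argmax(score))          # currently best hypothesis
            for u in range(t, t - lag, -1):    # trace back `lag` frames
                s = int(back[u, s])
            out.append(s)                      # committed state for frame t - lag
    remaining = T - len(out)
    if remaining > 0:                          # flush the tail at end of input
        s = int(np.argmax(score))
        tail = [s]
        for u in range(T - 1, T - remaining, -1):
            s = int(back[u, s])
            tail.append(s)
        out.extend(tail[::-1])
    return out

# Toy 3-state model with random parameters (purely illustrative data).
rng = np.random.default_rng(0)
T, N = 20, 3
trans = rng.random((N, N))
log_trans = np.log(trans / trans.sum(axis=1, keepdims=True))
log_init = np.log(np.full(N, 1.0 / N))
log_obs = np.log(rng.random((T, N)))

reference = viterbi_full(log_init, log_trans, log_obs)
incremental = viterbi_fixed_lag(log_init, log_trans, log_obs, lag=5)
```

In this sketch, a lag at least as long as the utterance reproduces the non-incremental result exactly, while shorter lags may commit to states that a later traceback would have revised. How a lag in frames maps to milliseconds depends on the frame rate of the front end, which the abstract does not specify.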

Place, publisher, year, edition, pages
2003. 1141-1144 p.
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-24046
OAI: oai:DiVA.org:kth-24046
DiVA: diva2:342981
Note
QC 20100811
Available from: 2010-08-11 Created: 2010-08-11 Last updated: 2010-08-12 Bibliographically approved
In thesis
1. Efficient Methods for Automatic Speech Recognition
2003 (English)
Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

This thesis presents work in the area of automatic speech recognition (ASR). The thesis focuses on methods for increasing the efficiency of speech recognition systems and on techniques for efficient representation of different types of knowledge in the decoding process. In this work, several decoding algorithms and recognition systems have been developed, aimed at various recognition tasks.

The thesis presents the KTH large vocabulary speech recognition system. The system was developed for online (live) recognition with large vocabularies and complex language models. The system utilizes weighted transducer theory for efficient representation of different knowledge sources, with the purpose of optimizing the recognition process.

A search algorithm for efficient processing of hidden Markov models (HMMs) is presented. The algorithm is an alternative to the classical Viterbi algorithm for fast computation of shortest paths in HMMs. It is part of a larger decoding strategy aimed at reducing the overall computational complexity in ASR. In this approach, all HMM computations are completely decoupled from the rest of the decoding process. This enables the use of larger vocabularies and more complex language models without an increase of HMM-related computations.

Ace is another speech recognition system developed within this work. It is a platform aimed at facilitating the development of speech recognizers and new decoding methods.

A real-time system for low-latency online speech transcription is also presented. The system was developed within a project with the goal of improving the possibilities for hard-of-hearing people to use conventional telephony by providing speech-synchronized multimodal feedback. This work addresses several additional requirements implied by this special recognition task.

Place, publisher, year, edition, pages
Stockholm: KTH, 2003. iii, 65 p.
Series
Trita-TMH, ISSN 1104-5787 ; 2003:14
Keyword
speech recognition, algorithms, hidden Markov models, HMM, weighted finite-state transducers
Identifiers
urn:nbn:se:kth:diva-3675 (URN)
91-7283-657-1 (ISBN)
Public defence
2003-12-17, 00:00
Note
QC 20100811
Available from: 2003-12-11 Created: 2003-12-11 Last updated: 2010-08-12 Bibliographically approved

By author/editor
Seward, Alexander