Efficient Methods for Automatic Speech Recognition
KTH, Superseded Departments, Speech, Music and Hearing.
2003 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

This thesis presents work in the area of automatic speech recognition (ASR). The thesis focuses on methods for increasing the efficiency of speech recognition systems and on techniques for efficient representation of different types of knowledge in the decoding process. In this work, several decoding algorithms and recognition systems have been developed, aimed at various recognition tasks.

The thesis presents the KTH large vocabulary speech recognition system. The system was developed for online (live) recognition with large vocabularies and complex language models. The system utilizes weighted transducer theory for efficient representation of different knowledge sources, with the purpose of optimizing the recognition process.

A search algorithm for efficient processing of hidden Markov models (HMMs) is presented. The algorithm is an alternative to the classical Viterbi algorithm for fast computation of shortest paths in HMMs. It is part of a larger decoding strategy aimed at reducing the overall computational complexity in ASR. In this approach, all HMM computations are completely decoupled from the rest of the decoding process. This enables the use of larger vocabularies and more complex language models without an increase in HMM-related computation.
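
For reference, the classical Viterbi algorithm that this decoupled strategy offers an alternative to can be sketched as follows. This is a minimal log-domain illustration with invented parameters, not the thesis's algorithm:

```python
import math

def viterbi(obs_loglik, log_trans, log_init):
    """Most likely state path through an HMM, in the log domain.

    obs_loglik[t][s]: log-likelihood of observation t in state s
    log_trans[i][j]:  log transition probability i -> j
    log_init[s]:      log initial probability of state s
    Returns (best state path, its log score).
    """
    n_states = len(log_init)
    T = len(obs_loglik)
    # delta[s]: score of the best path ending in state s
    delta = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    back = []
    for t in range(1, T):
        ptr, new = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best)
            new.append(delta[best] + log_trans[best][j] + obs_loglik[t][j])
        delta = new
        back.append(ptr)
    # Backtrace from the best final state
    path = [max(range(n_states), key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), max(delta)
```

Per frame this touches every state pair, which is exactly the per-context cost that the decoupled approach moves out of the main search.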

Ace is another speech recognition system developed within this work. It is a platform aimed at facilitating the development of speech recognizers and new decoding methods.

A real-time system for low-latency online speech transcription is also presented. The system was developed within a project with the goal of improving the possibilities for hard-of-hearing people to use conventional telephony by providing speech-synchronized multimodal feedback. This work addresses several additional requirements implied by this special recognition task.

Place, publisher, year, edition, pages
Stockholm: KTH, 2003. iii, 65 p.
Series
Trita-TMH, ISSN 1104-5787 ; 2003:14
Keyword [en]
speech recognition, algorithms, hidden Markov models, HMM, weighted finite-state transducers
Identifiers
URN: urn:nbn:se:kth:diva-3675
ISBN: 91-7283-657-1 (print)
OAI: oai:DiVA.org:kth-3675
DiVA: diva2:9511
Public defence
2003-12-17, 00:00
Note
QC 20100811. Available from: 2003-12-11. Created: 2003-12-11. Last updated: 2010-08-12. Bibliographically approved.
List of papers
1. The KTH Large Vocabulary Continuous Speech Recognition System
2004 (English) Report (Other academic)
Place, publisher, year, edition, pages
Stockholm: KTH, 2004
Series
Trita-TMH, ISSN 1104-5787
National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-24047 (URN)
Note
QC 20100811. Available from: 2010-08-11. Created: 2010-08-11. Last updated: 2011-02-04. Bibliographically approved.
2. A fast HMM match algorithm for very large vocabulary speech recognition
2004 (English) In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 42, no. 2, pp. 191-206. Article in journal (Refereed) Published
Abstract [en]

The search over context-dependent continuous density Hidden Markov Models (HMMs), including state-likelihood computations, accounts for a considerable part of the total decoding time for a speech recognizer. This is especially apparent in tasks that incorporate large vocabularies and long-dependency n-gram grammars, since these impose a high degree of context dependency and HMMs have to be treated differently in each context. This paper proposes a strategy for acoustic match of typical continuous density HMMs, decoupled from the main search and conducted as a separate component suited for parallelization. Instead of computing a large amount of probabilities for different alignments of each HMM, the proposed method computes all alignments, but more efficiently. Each HMM is matched only once against any time interval, and thus may be instantly looked up by the main search algorithm as required. In order to accomplish this in real time, a fast time-warping match algorithm is proposed, exploiting the specifics of the 3-state left-to-right HMM topology without skips. In proof-of-concept tests, using a highly optimized SIMD-parallel implementation, the algorithm was able to perform time-synchronous decoupled evaluation of a triphone acoustic model, with maximum phone duration of 40 frames, with a real-time factor of 0.83 on one of the CPUs of a Dual-Xeon 2 GHz workstation. The algorithm was able to compute the likelihood for 636,000 locally optimal HMM paths/second, with full state evaluation.
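
The core idea of matching a 3-state left-to-right HMM (self-loops, no skips) against every time interval at once can be sketched with the toy dynamic program below. This is an illustration only, with invented parameters; the paper's actual algorithm additionally exploits SIMD parallelism and topology-specific warping tricks not reproduced here:

```python
import math

def match_all_intervals(obs_loglik, loop, step, max_dur):
    """Best log-score of a 3-state left-to-right HMM over every
    interval [t0, t0 + d), d <= max_dur (minimum duration 3).

    obs_loglik[t][s]: frame log-likelihood of state s (s = 0, 1, 2)
    loop[s], step[s]: log self-loop / forward transition probs
    Returns dict mapping (t0, d) -> best path log-score.
    """
    T = len(obs_loglik)
    scores = {}
    for t0 in range(T):
        # delta[s]: best score ending in state s after d frames
        delta = [obs_loglik[t0][0], -math.inf, -math.inf]
        for d in range(1, min(max_dur, T - t0) + 1):
            if d >= 3:  # all three states must be visited once
                scores[(t0, d)] = delta[2] + step[2]  # exit state 2
            t = t0 + d
            if t >= T or d == max_dur:
                break
            new = [
                delta[0] + loop[0],
                max(delta[0] + step[0], delta[1] + loop[1]),
                max(delta[1] + step[1], delta[2] + loop[2]),
            ]
            delta = [new[s] + obs_loglik[t][s] for s in range(3)]
    return scores
```

Because every interval score is tabulated once, the main search can look scores up instantly instead of re-evaluating the HMM in each context.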

Keyword
HMM, acoustic match, parallel, large vocabulary speech recognition, search
National Category
Social Sciences Interdisciplinary
Identifiers
urn:nbn:se:kth:diva-23223 (URN)
10.1016/j.specom.2003.08.005 (DOI)
000189377800004 ()
2-s2.0-1142300553 (Scopus ID)
Note
QC 20100525, QC 20111031. Available from: 2010-08-10. Created: 2010-08-10. Last updated: 2017-12-12. Bibliographically approved.
3. Low-Latency Incremental Speech Transcription in the Synface Project
2003 (English) In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), Geneva, Switzerland, 2003, vol. 2, pp. 1141-1144. Conference paper (Other academic)
Abstract [en]

In this paper, a real-time decoder for low-latency online speech transcription is presented. The system was developed within the Synface project, which aims to improve the possibilities for hard-of-hearing people to use conventional telephony by providing speech-synchronized multimodal feedback. This paper addresses the specific issues related to HMM-based incremental phone classification with real-time constraints. The decoding algorithm described in this work enables a trade-off to be made between improved recognition accuracy and reduced latency. By accepting a longer latency per output increment, more time can be ascribed to hypothesis look-ahead, thereby improving classification accuracy. Experiments performed on the Swedish SpeechDat database show that it is possible to generate the same classification as is produced by non-incremental decoding using HTK, by adopting a latency of approx. 150 ms or more.
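
The latency/accuracy trade-off described above can be illustrated with a deliberately simple toy: commit a label for frame t only after a fixed look-ahead of future frames, smoothing the decision over that window. This is a stand-in for hypothesis look-ahead, not the paper's decoder:

```python
def incremental_commit(frame_labels, lookahead):
    """Commit a label for frame t only after `lookahead` further
    frames have arrived; the committed label is the majority vote
    over the look-ahead window (a toy model of look-ahead).

    A larger `lookahead` means higher latency but suppresses
    spurious single-frame classifications.
    """
    out = []
    for t in range(len(frame_labels) - lookahead):
        window = frame_labels[t:t + lookahead + 1]
        out.append(max(set(window), key=window.count))
    return out
```

With zero look-ahead the output equals the raw per-frame labels; with a longer look-ahead, short glitches are overridden by the surrounding context, mirroring how added latency buys classification accuracy.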

National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-24046 (URN)
Note
QC 20100811. Available from: 2010-08-11. Created: 2010-08-11. Last updated: 2010-08-12. Bibliographically approved.
4. Transducer Optimizations for Tight-Coupled Decoding
2001 (English) In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), Aalborg, Denmark, 2001, vol. 3, pp. 1607-1610. Conference paper (Other academic)
Abstract [en]

In this paper we apply a framework of finite-state transducers (FST) to uniformly represent various information sources and data structures used in speech recognition. These source models include context-free language models, phonology models, acoustic model information (hidden Markov models), and pronunciation dictionaries. We will describe how this unified representation can serve as a single input model for the recognizer. We will demonstrate how the application of various levels of optimizations can lead to a more compact representation of these transducers and evaluate the effects on recognition performance, in terms of accuracy and computational complexity.
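
The unifying operation behind this framework is transducer composition: the output symbols of one knowledge source are matched against the input symbols of the next. The following is a minimal composition sketch for epsilon-free weighted transducers in the tropical semiring (toy data structures invented for illustration; real FST toolkits also handle epsilon labels, determinization, and minimization):

```python
from collections import defaultdict

def compose(t1, t2):
    """Compose two epsilon-free weighted transducers; weights are
    negative log probabilities and add along paths (tropical semiring).

    A transducer is a tuple (start, finals, arcs), where arcs maps a
    state to a list of (in_label, out_label, weight, next_state).
    """
    s1, f1, a1 = t1
    s2, f2, a2 = t2
    start = (s1, s2)
    arcs = defaultdict(list)
    finals = set()
    stack, seen = [start], {start}
    while stack:
        q1, q2 = stack.pop()
        if q1 in f1 and q2 in f2:
            finals.add((q1, q2))
        for i1, o1, w1, n1 in a1.get(q1, []):
            for i2, o2, w2, n2 in a2.get(q2, []):
                if o1 == i2:  # output of t1 feeds input of t2
                    nxt = (n1, n2)
                    arcs[(q1, q2)].append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return start, finals, dict(arcs)
```

Composing, say, a phone-to-word lexicon transducer with a word-level grammar yields a single machine mapping phones directly to grammatical word sequences, which is the sense in which one input model can drive the recognizer.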

National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-24048 (URN)
Note
QC 20100811. Available from: 2010-08-11. Created: 2010-08-11. Last updated: 2010-08-12. Bibliographically approved.
5. A Tree-Trellis N-best Decoder for Stochastic Context-Free Grammars
2000 (English) In: Proceedings of the International Conference on Spoken Language Processing, Beijing, China, 2000, vol. 4, pp. 282-285. Conference paper (Other academic)
Abstract [en]

In this paper a decoder for continuous speech recognition using stochastic context-free grammars is described. It forms the backbone of the ACE recognizer, which is a modular system for real-time speech recognition. A new rationale for automata is introduced, as well as a new model for pruning the search space.
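
Pruning of the kind mentioned above is commonly realized as beam pruning: hypotheses whose score falls too far below the current best are discarded. The sketch below shows a generic beam-pruning step, not the paper's specific pruning model:

```python
def beam_prune(hyps, beam):
    """Keep hypotheses whose log-score is within `beam` of the best.

    hyps: list of (hypothesis, log_score) pairs
    beam: non-negative width; larger beams search more, prune less
    """
    best = max(score for _, score in hyps)
    return [(h, s) for h, s in hyps if s >= best - beam]
```

The beam width trades decoding speed against the risk of pruning away the eventually-best path.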

National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-24049 (URN)
Note
QC 20100811. Available from: 2010-08-11. Created: 2010-08-11. Last updated: 2010-08-12. Bibliographically approved.

Open Access in DiVA

fulltext (1314 kB)
File information
File name: FULLTEXT01.pdf
File size: 1314 kB
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Seward, Alexander
By organisation
Speech, Music and Hearing

