Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Tree-Based Estimation of Speaker Characteristics for Speech Recognition
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
2009 (English)In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2009, 580-583 p.Conference paper, Published paper (Refereed)
Abstract [en]

Speaker adaptation by means of adjustment of speaker characteristic properties, such as vocal tract length, has the important advantage compared to conventional adaptation techniques that the adapted models are guaranteed to be realistic if the description of the properties are. One problem with this approach is that the search procedure to estimate them is computationally heavy. We address the problem by using a multi-dimensional, hierarchical tree of acoustic model sets. The leaf sets are created by transforming a conventionally trained model set using leaf-specific speaker profile vectors. The model sets of non-leaf nodes are formed by merging the models of their child nodes, using a computationally efficient algorithm. During recognition, a maximum likelihood criterion is followed to traverse the tree. Studies of one- (VTLN) and four-dimensional speaker profile vectors (VTLN, two spectral slope parameters and model variance scaling) exhibit a reduction of the computational load to a fraction compared to that of an exhaustive grid search. In recognition experiments on children's connected digits using adult and male models, the one-dimensional tree search performed as well as the exhaustive search. Further reduction was achieved with four dimensions. The best recognition results are 0.93% and 10.2% WER in TIDIGITS and PF-Star-Sw, respectively, using adult models.

Place, publisher, year, edition, pages
BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2009. 580-583 p.
Keyword [en]
speech recognition, speaker adaptation, VTLN, tree-based model selection
National Category
Computer and Information Science General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-29877ISI: 000276842800144Scopus ID: 2-s2.0-70450177504OAI: oai:DiVA.org:kth-29877DiVA: diva2:399109
Conference
10th INTERSPEECH 2009 Conference, Brighton, ENGLAND, SEP 06-10, 2009
Note
QC 20110221Available from: 2011-02-21 Created: 2011-02-17 Last updated: 2011-02-21Bibliographically approved

Open Access in DiVA

No full text

Other links

ScopusISCA

Search in DiVA

By author/editor
Blomberg, MatsElenius, Daniel
By organisation
Speech Communication and Technology
Computer and Information ScienceGeneral Language Studies and Linguistics

Search outside of DiVA

GoogleGoogle Scholar

urn-nbn

Altmetric score

urn-nbn
Total: 42 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf