Automatic speaker verification on site and by telephone: methods, applications and assessment
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
2006 (English). Doctoral thesis, monograph (Other scientific)
Abstract [en]

Speaker verification is the biometric task of authenticating a claimed identity by means of analyzing a spoken sample of the claimant's voice. The present thesis deals with various topics related to automatic speaker verification (ASV) in the context of its commercial applications, characterized by co-operative users, user-friendly interfaces, and requirements for small amounts of enrollment and test data.

A text-dependent system based on hidden Markov models (HMM) was developed and used to conduct experiments, including a comparison between visual and aural strategies for prompting claimants for randomized digit strings. It was found that aural prompts lead to more errors in spoken responses and that visually prompted utterances performed marginally better in ASV, given that enrollment data were visually prompted. High-resolution flooring techniques were proposed for variance estimation in the HMMs, but results showed no improvement over the standard method of using target-independent variances copied from a background model. These experiments were performed on Gandalf, a Swedish speaker verification telephone corpus with 86 client speakers.

A complete on-site application (PER), a physical access control system securing a gate in a reverberant stairway, was implemented based on a combination of the HMM system and a system based on Gaussian mixture models. Users were authenticated by saying their proper name and a visually prompted, random sequence of digits, after having enrolled by speaking ten utterances of the same type. An evaluation was conducted with 54 out of 56 clients who succeeded in enrolling. Semi-dedicated impostor attempts were also collected. An equal error rate (EER) of 2.4% was found for this system, based on a single attempt per session and after retraining the system on PER-specific development data. On parallel telephone data collected using a telephone version of PER, the EER was 3.5% with landline and around 5% with mobile telephones. Impostor attempts in this case were same-handset attempts. Results also indicate that the distributions of false reject and false accept rates over target speakers are well described by beta distributions. A state-of-the-art commercial system was also tested on PER data, with performance similar to that of the baseline research system.
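As an aside on the EER figures quoted above: the equal error rate is the operating point at which the false accept rate equals the false reject rate. The following is a minimal sketch of one common way to estimate it from verification scores, not the procedure used in the thesis. The function name `equal_error_rate`, the accept-if-score-at-or-above-threshold convention, and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumed convention: a claim is accepted when score >= threshold.
    Returns (eer, threshold) at the point where the false accept rate
    and the false reject rate are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False reject: a true speaker scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False accept: an impostor scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores invented for this example; they are not data from the thesis.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, impostors)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With few trials the two error rates rarely cross exactly, so the sketch reports the average of the two rates at the closest point; interpolating along the error curve is another common choice.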

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. xvii, 332 p.
Trita-CSC-A, ISSN 1653-5723 ; 2006:26
Keyword [en]
speaker recognition, speaker verification, speech technology, biometrics, access control, speech corpus, variance estimation
National Category
Language Technology (Computational Linguistics)
URN: urn:nbn:se:kth:diva-4242
ISBN: 91-7178-531-0
ISBN: 978-91-7178-531-2
OAI: diva2:11420
Public defence
2006-12-19, F3, Lindstedtsvägen 26, Stockholm, 14:00
QC 20100910
Available from: 2006-12-15
Created: 2006-12-15
Last updated: 2010-09-10
Bibliographically approved

By author/editor
Melin, Håkan