Understanding Automatic Speech Recognition for L2 Speakers and Unintended Discrimination in Artificial Intelligence
KTH, School of Electrical Engineering and Computer Science (EECS).
KTH, School of Electrical Engineering and Computer Science (EECS).
2022 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Alternative title
Automatisk Taligenkänning för L2-talare och Oavsiktlig Diskriminering inom Artificiell Intelligens (Swedish)
Abstract [en]

The thesis aimed to investigate the effects that unintended bias in artificial intelligence has on society, and whether the performance of automatic speech recognition (ASR) models can be improved by training them on non-native Swedish speakers. Two ASR systems, Microsoft Azure and Google Cloud Speech-to-Text, were used in the process. Re-trained models were created in order to improve recognition ability. The models were then evaluated by comparing the word error rates of the re-trained models with those of the pre-trained models. The study found that re-training on non-native speakers improved the performance of the ASR models. This study may be of interest for research concerning data set bias and how it affects the performance of artificial intelligence models. It also helps the reader understand the basic structure of automatic speech recognition models and how they work.
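
The abstract does not say how the word error rates were computed. As a rough illustration only, the sketch below uses the common textbook definition, WER = (substitutions + deletions + insertions) / number of reference words, obtained from a word-level edit distance. The function name `word_error_rate`, the lowercasing and whitespace tokenisation, and the sample sentences are assumptions made for this example; they are not taken from the thesis or from either vendor's tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are compared case-insensitively after splitting on
    whitespace. Real evaluations often normalise punctuation as well.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative transcripts only, not data from the study.
    reference = "jag talar svenska varje dag"
    pretrained_output = "jag tala svenska dag"
    retrained_output = "jag talar svenska varje dag"
    print(word_error_rate(reference, pretrained_output))
    print(word_error_rate(reference, retrained_output))
```

Under this definition a lower value is better, and the value can exceed 1.0 when the hypothesis contains many inserted words. Comparing a pre-trained and a re-trained model would then amount to computing this rate for both models over the same reference transcripts.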

Abstract [sv]

Studiens syfte var att undersöka vilka effekter bias inom artificiell intelligens har på samhället och om det är möjligt att förbättra prestandan av taligenkännings modeller genom att träna dem på talare med svenska som andraspråk. Två taligenkännings system, Microsoft Azure och Google cloud speech-to-text, användes i utförandet av studien. Egna modeller skapades för att förbättra igenkänningen. Modellerna evaluerades sedan genom att jämföra word error rates för de omtränade modellerna och de förtränade modellerna. Modellerna som omtränades på personer med svenska som andraspråk visade en förbättrad igenkänning än de förtränade modellerna. Den här studien kan vara av intresse för studier som undersöker dataset bias och hur det påverkar prestandan av de artificiella intelligens modellerna. Studien kan även hjälpa läsaren att förstå hur taligenkännings modeller och deras struktur fungerar.

Place, publisher, year, edition, pages
2022. , p. 10
Series
TRITA-EECS-EX ; 2022:348
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-319360
OAI: oai:DiVA.org:kth-319360
DiVA, id: diva2:1699775
Supervisors
Examiners
Available from: 2022-10-03 Created: 2022-09-28 Last updated: 2022-10-03 Bibliographically approved

Open Access in DiVA

fulltext (757 kB), 310 downloads
File information
File name: FULLTEXT01.pdf
File size: 757 kB
Checksum (SHA-512): 60eef3619c8de705115024a08236448255853766b8315cdf4902afac0cb34df65827a6bbcae38a1fae1581c9d2552e19895a9d557d378b3a76e69bdd49c5e53f
Type: fulltext
Mimetype: application/pdf

Total: 313 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 734 hits