A Framework for Phoneme-Level Pronunciation Assessment Using CTC
Department of Electronic Systems, NTNU, Norway.
Department of Electronic Systems, NTNU, Norway.
Department of Electronic Systems, NTNU, Norway.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. Department of Electronic Systems, NTNU, Norway. ORCID iD: 0000-0002-3323-5311
2024 (English). In: Interspeech 2024, International Speech Communication Association, 2024, p. 302-306. Conference paper, Published paper (Refereed)
Abstract [en]

Traditional phoneme-level goodness of pronunciation (GOP) methods require phoneme-to-speech alignment. The drawback is that these methods, by definition, are prone to alignment errors and preclude the possibility of deletion and insertion errors in pronunciation. We produce experimental evidence that CTC-based methods can be used in traditional GOP estimation in spite of their “peaky” output behaviour and may be less prone to alignment errors than traditional methods. We also propose a new framework for GOP estimation, based on a CTC-trained model, that is independent of speech-phoneme alignment. By accounting for deletion and insertion errors as well as substitution errors, we show that our framework outperforms alignment-based methods. Our experimental results are based on the CMU-kids dataset for child speech and on the Speechocean762 dataset, which contains both child and adult speakers. Our best method achieves a 29.02% relative improvement over the baseline GOP methods.

Place, publisher, year, edition, pages
International Speech Communication Association, 2024. p. 302-306
Keywords [en]
child speech, CTC, end-to-end, goodness of pronunciation, pronunciation assessment
National Category
Computer Sciences; Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-358874
DOI: 10.21437/Interspeech.2024-459
ISI: 001331850100060
Scopus ID: 2-s2.0-85214811904
OAI: oai:DiVA.org:kth-358874
DiVA, id: diva2:1930527
Conference
25th Interspeech Conference 2024, Kos Island, Greece, September 1-5, 2024
Note

QC 20250127

Available from: 2025-01-23. Created: 2025-01-23. Last updated: 2025-12-05. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Salvi, Giampiero
