Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation
Department of Electronic Systems, NTNU, Norway.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. Department of Electronic Systems, NTNU, Norway. ORCID iD: 0000-0002-3323-5311
Department of Electronic Systems, NTNU, Norway.
2023 (English). In: Interspeech 2023, International Speech Communication Association, 2023, p. 2158-2162. Conference paper, Published paper (Refereed)
Abstract [en]

Automatic speech recognition (ASR) systems have become a vital part of our everyday lives through their many applications. Despite this progress, the most common evaluation method for ASR systems remains word error rate (WER). WER gives no information on the severity of errors, which strongly impacts practical performance. We therefore compare a semantic metric called Aligned Semantic Distance (ASD) against WER and demonstrate its advantage in two respects. First, we conduct a survey asking participants to score pairs of reference text and ASR transcription. A correlation analysis shows that ASD correlates more strongly with the human evaluation scores than WER does. We also explore the feasibility of predicting human perception using ASD. Second, we demonstrate that ASD is a more effective indicator than WER of performance on downstream NLP tasks such as named entity recognition and sentiment classification.

Place, publisher, year, edition, pages
International Speech Communication Association, 2023. p. 2158-2162
Keywords [en]
ASR evaluation metric, semantic context, user perception
National Category
Computer Sciences; Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-337837
DOI: 10.21437/Interspeech.2023-1778
ISI: 001186650302068
Scopus ID: 2-s2.0-85171598286
OAI: oai:DiVA.org:kth-337837
DiVA, id: diva2:1803471
Conference
24th International Speech Communication Association, Interspeech 2023, August 20-24, 2023, Dublin, Ireland
Note

QC 20241015

Available from: 2023-10-09 Created: 2023-10-09 Last updated: 2025-02-01 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Salvi, Giampiero
