Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation
Department of Electronic Systems, NTNU, Norway.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. Department of Electronic Systems, NTNU, Norway. ORCID iD: 0000-0002-3323-5311
Department of Electronic Systems, NTNU, Norway.
2023 (English). In: Interspeech 2023, International Speech Communication Association, 2023, pp. 2158-2162. Conference paper, Published paper (Refereed)
Abstract [en]

Automatic speech recognition (ASR) systems have become a vital part of our everyday lives through their many applications. However, despite this progress, the most common evaluation method for ASR systems remains word error rate (WER). WER gives no information on the severity of errors, which strongly affects practical performance. We therefore compare a semantic-based metric called Aligned Semantic Distance (ASD) against WER and demonstrate its advantage in two respects. First, we conduct a survey asking participants to score pairs of reference text and ASR transcription. A correlation analysis shows that ASD correlates more strongly with the human evaluation scores than WER does. We also explore the feasibility of predicting human perception using ASD. Second, we demonstrate that ASD is more effective than WER as an indicator of performance on downstream NLP tasks such as named entity recognition and sentiment classification.
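The abstract's point that WER ignores error severity can be illustrated with a minimal sketch. It assumes the standard definition of WER as word-level edit distance divided by the number of reference words; the function name `word_error_rate` and the example sentences are hypothetical and do not come from the paper. ASD itself is not implemented here, because this record does not describe how it is computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences (not taken from the paper).
reference = "the flight to boston leaves at nine"
minor_error = "a flight to boston leaves at nine"     # article changed
severe_error = "the flight to austin leaves at nine"  # destination changed

print(word_error_rate(reference, minor_error))
print(word_error_rate(reference, severe_error))
```

Both hypotheses contain exactly one substituted word out of seven, so they receive the same WER even though only the second changes the meaning of the sentence.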

Place, publisher, year, edition, pages
International Speech Communication Association, 2023. pp. 2158-2162
Keywords [en]
ASR evaluation metric, semantic context, user perception
National subject category
Computer Science; Language Processing and Computational Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-337837
DOI: 10.21437/Interspeech.2023-1778
ISI: 001186650302068
Scopus ID: 2-s2.0-85171598286
OAI: oai:DiVA.org:kth-337837
DiVA, id: diva2:1803471
Conference
24th International Speech Communication Association, Interspeech 2023, August 20-24, 2023, Dublin, Ireland
Note

QC 20241015

Available from: 2023-10-09. Created: 2023-10-09. Last updated: 2025-02-01. Bibliographically reviewed

Open Access in DiVA

No full text in DiVA

Person

Salvi, Giampiero
