DeePMOS: Deep Posterior Mean-Opinion-Score of Speech
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Information Science and Engineering. Digital Futures. ORCID iD: 0000-0003-4406-536X
Codemill AB, Umeå, Sweden.
Google LLC, Stockholm, Sweden.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Information Science and Engineering. Digital Futures. ORCID iD: 0000-0003-2638-6047
2023 (English). In: Interspeech 2023, International Speech Communication Association, 2023, p. 526-530. Conference paper, Published paper (Refereed).
Abstract [en]

We propose a deep neural network (DNN) based method that provides a posterior distribution of the mean opinion score (MOS) for an input speech signal. The DNN outputs the parameters of the posterior, namely its mean and variance. The proposed method is referred to as deep posterior MOS (DeePMOS). The relevant training data is inherently limited in size (a limited number of labeled samples) and noisy due to the subjective nature of human listening judgments. For robust training of DeePMOS, we use a combination of maximum-likelihood learning, stochastic gradient noise, and a student-teacher learning setup. Using the mean of the posterior as a point estimate, we evaluate standard performance measures for the proposed DeePMOS. The results show performance comparable with existing DNN-based methods that provide only point estimates of the MOS. We then present an ablation study showing the importance of the various components of DeePMOS.
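The two training ingredients named in the abstract — maximum-likelihood learning of a posterior's mean and variance, and a student-teacher setup — can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: the function names are hypothetical, NumPy stands in for the actual DNN framework, and the exponential-moving-average (EMA) teacher update is one common way to realise a student-teacher scheme, assumed here for concreteness.

```python
import numpy as np

def gaussian_nll(mos_label, mu, log_var):
    # Negative log-likelihood of a MOS label under N(mu, exp(log_var)).
    # Predicting the log-variance keeps the variance strictly positive;
    # minimizing this loss is maximum-likelihood learning of (mu, var).
    var = np.exp(log_var)
    return 0.5 * (np.log(2.0 * np.pi) + log_var + (mos_label - mu) ** 2 / var)

def ema_teacher_update(teacher_w, student_w, decay=0.99):
    # Exponential-moving-average update of teacher weights from the student,
    # a common realisation of a student-teacher training setup.
    return decay * teacher_w + (1.0 - decay) * student_w
```

At inference time, the predicted mean `mu` serves as the point estimate of the MOS, while the predicted variance quantifies the model's uncertainty; the loss above penalises both a wrong mean and an over- or under-confident variance.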

Place, publisher, year, edition, pages
International Speech Communication Association, 2023. p. 526-530.
Keywords [en]
deep neural network, maximum-likelihood, speech quality assessment, voice conversion challenge
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:kth:diva-337876
DOI: 10.21437/Interspeech.2023-1436
ISI: 001186650300107
Scopus ID: 2-s2.0-85171537160
OAI: oai:DiVA.org:kth-337876
DiVA id: diva2:1803861
Conference
24th Conference of the International Speech Communication Association, Interspeech 2023, Dublin, Ireland, Aug 20 2023 - Aug 24 2023
Note

QC 20231010

Available from: 2023-10-10. Created: 2023-10-10. Last updated: 2024-07-05. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text | Scopus

Authority records

Liang, Xinyue; Chatterjee, Saikat
