DeePMOS-B: Deep Posterior Mean-Opinion-Score using Beta Distribution
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Information Science and Engineering.
Codemill AB Stockholm, Sweden.
Google LLC Zurich, Switzerland.
Google LLC Mountain View, USA.
2024 (English). In: 32nd European Signal Processing Conference, EUSIPCO 2024 - Proceedings, European Signal Processing Conference, EUSIPCO, 2024, p. 416-420. Conference paper, Published paper (Refereed)
Abstract [en]

Mean opinion score (MOS) is a bounded speech quality measure, ranging between 1 and 5. We propose using a Beta distribution to model the posterior of the bounded MOS for a given speech clip. We use a deep neural network (DNN), trained using a maximum-likelihood principle, providing the parameters of the posterior Beta distribution. A self-teacher learning setup is used to achieve robustness against the inherent challenge of training on a noisy dataset. The dataset noise comes from the subjective nature of the MOS labels, and only a handful of quality score ratings are provided for each speech clip. To compare with existing state-of-the-art methods, we use the mean of Beta posterior as a point estimate of the MOS. The proposed method shows competitive performance vis-a-vis several existing DNN-based methods that provide MOS point estimates, and an ablation study shows the importance of various components of the proposed method.
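
As an illustration of the idea in the abstract, the following is a minimal sketch of a Beta likelihood over bounded MOS ratings. It is not the authors' implementation: the linear rescaling of [1, 5] to (0, 1), the `eps` clipping at the boundaries, and the function names `to_unit`, `beta_nll`, and `mos_point_estimate` are all assumptions made here, and the values of `alpha` and `beta` stand in for what a DNN would output for a given speech clip.

```python
import math

MOS_MIN, MOS_MAX = 1.0, 5.0  # MOS range stated in the abstract


def to_unit(score, eps=1e-4):
    """Map a rating in [1, 5] to the open interval (0, 1).

    The clipping by eps is an assumption of this sketch: it keeps ratings
    of exactly 1 or 5 away from the boundary, where log(0) would occur.
    """
    x = (score - MOS_MIN) / (MOS_MAX - MOS_MIN)
    return min(max(x, eps), 1.0 - eps)


def beta_nll(alpha, beta, ratings):
    """Mean negative log-likelihood of the ratings under Beta(alpha, beta)."""
    # log of the Beta function B(alpha, beta), via log-gamma for stability
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    total = 0.0
    for r in ratings:
        x = to_unit(r)
        total += log_norm - (alpha - 1) * math.log(x) - (beta - 1) * math.log(1 - x)
    return total / len(ratings)


def mos_point_estimate(alpha, beta):
    """Mean of Beta(alpha, beta), mapped back to the [1, 5] MOS scale."""
    return MOS_MIN + (MOS_MAX - MOS_MIN) * alpha / (alpha + beta)
```

In a maximum-likelihood training setup of this kind, `beta_nll` would be minimised over the network weights that produce `alpha` and `beta`, and `mos_point_estimate` would give the point estimate used for comparison with other methods. The self-teacher learning setup mentioned in the abstract is not covered by this sketch.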

Place, publisher, year, edition, pages
European Signal Processing Conference, EUSIPCO, 2024. p. 416-420
Keywords [en]
Bayesian estimation, deep neural network, maximum-likelihood, speech quality assessment
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:kth:diva-356662
Scopus ID: 2-s2.0-85208437864
OAI: oai:DiVA.org:kth-356662
DiVA, id: diva2:1914832
Conference
32nd European Signal Processing Conference, EUSIPCO 2024, Lyon, France, August 26-30, 2024
Note

Part of ISBN 9789464593617

QC 20241121

Available from: 2024-11-20 Created: 2024-11-20 Last updated: 2024-11-21 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Chatterjee, Saikat

Search in DiVA

By author/editor
Liang, Xinyu; Chatterjee, Saikat
By organisation
Information Science and Engineering
Signal Processing
