DeePMOS-B: Deep Posterior Mean-Opinion-Score using Beta Distribution
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Information Science and Engineering.ORCID iD: 0000-0003-4406-536x
Codemill AB, Stockholm, Sweden.
Google LLC, Zurich, Switzerland.
Google LLC, Mountain View, CA USA.
2024 (English). In: 32ND EUROPEAN SIGNAL PROCESSING CONFERENCE, EUSIPCO 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 416-420. Conference paper, Published paper (Refereed)
Abstract [en]

Mean opinion score (MOS) is a bounded speech quality measure, ranging between 1 and 5. We propose using a Beta distribution to model the posterior of the bounded MOS for a given speech clip. We use a deep neural network (DNN), trained using a maximum-likelihood principle, that provides the parameters of the posterior Beta distribution. A self-teacher learning setup is used to achieve robustness against the inherent challenge of training on a noisy dataset. The dataset noise stems from the subjective nature of the MOS labels and from the fact that only a handful of quality ratings are provided for each speech clip. To compare with existing state-of-the-art methods, we use the mean of the Beta posterior as a point estimate of the MOS. The proposed method shows competitive performance compared with several existing DNN-based methods that provide MOS point estimates, and an ablation study shows the importance of various components of the proposed method.
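The two core ingredients named in the abstract, a maximum-likelihood Beta loss and the Beta mean as a MOS point estimate, can be sketched as below. This is an illustrative sketch, not the authors' implementation: the linear rescaling of the 1-5 scale onto (0, 1), the clipping constant `EPS`, treating each individual rating of a clip as an independent sample, and all function names are assumptions. The DNN that outputs the Beta parameters (a, b) and the self-teacher training setup are not shown.

```python
import math

MOS_MIN, MOS_MAX = 1.0, 5.0
EPS = 1e-4  # assumption: keeps rescaled ratings strictly inside (0, 1), where the Beta log-density is finite


def to_unit(score: float) -> float:
    """Map a rating on the 1-5 scale linearly onto (0, 1), clipped away from the endpoints (assumed rescaling)."""
    u = (score - MOS_MIN) / (MOS_MAX - MOS_MIN)
    return min(max(u, EPS), 1.0 - EPS)


def beta_log_pdf(x: float, a: float, b: float) -> float:
    """Log-density of Beta(a, b) at x in (0, 1); a and b must be positive."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x)


def beta_nll(ratings, a: float, b: float) -> float:
    """Average negative log-likelihood of one clip's ratings under Beta(a, b).

    Assumption: each individual rating is treated as an i.i.d. sample; minimizing
    this over (a, b) is the maximum-likelihood principle applied to a single clip.
    """
    return -sum(beta_log_pdf(to_unit(r), a, b) for r in ratings) / len(ratings)


def mos_point_estimate(a: float, b: float) -> float:
    """Beta mean a / (a + b), mapped back onto the 1-5 MOS scale."""
    return MOS_MIN + (MOS_MAX - MOS_MIN) * a / (a + b)
```

For example, `mos_point_estimate(3.0, 1.0)` returns 4.0, since the Beta mean 3 / (3 + 1) = 0.75 maps back to 1 + 4 × 0.75. Because the Beta density is supported only on (0, 1), the bounded MOS range is respected by construction, which is the motivation the abstract gives for choosing this distribution.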

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 416-420
Series
European Signal Processing Conference, ISSN 2076-1465
Keywords [en]
speech quality assessment, deep neural network, maximum-likelihood, Bayesian estimation
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:kth:diva-358710
DOI: 10.23919/EUSIPCO63174.2024.10715351
ISI: 001349787000083
OAI: oai:DiVA.org:kth-358710
DiVA, id: diva2:1929716
Conference
32nd European Signal Processing Conference (EUSIPCO), AUG 26-30, 2024, Lyon, FRANCE
Note

Part of ISBN 9789464593617, 9798331519773

QC 20250923

Available from: 2025-01-21. Created: 2025-01-21. Last updated: 2025-09-23. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Liang, Xinyue; Chatterjee, Saikat

Organisation
Information Science and Engineering; Digital futures