DeePMOS-B: Deep Posterior Mean-Opinion-Score using Beta Distribution
2024 (English)
In: 32ND EUROPEAN SIGNAL PROCESSING CONFERENCE, EUSIPCO 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 416-420
Conference paper, Published paper (Refereed)
Abstract [en]
Mean opinion score (MOS) is a bounded speech quality measure, ranging between 1 and 5. We propose using a Beta distribution to model the posterior of the bounded MOS for a given speech clip. We use a deep neural network (DNN), trained using a maximum-likelihood principle, to provide the parameters of the posterior Beta distribution. A self-teacher learning setup is used to achieve robustness against the inherent challenge of training on a noisy dataset. The dataset noise arises because the MOS labels are subjective and only a handful of quality score ratings are provided for each speech clip. To compare with existing state-of-the-art methods, we use the mean of the Beta posterior as a point estimate of the MOS. The proposed method shows competitive performance compared with several existing DNN-based methods that provide MOS point estimates, and an ablation study shows the importance of various components of the proposed method.
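The modelling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not describe the network, the exact loss, or the self-teacher setup. The sketch assumes that ratings are rescaled linearly from [1, 5] to the unit interval, that boundary ratings are clipped by a small hypothetical margin `eps`, and that the Beta parameters `alpha` and `beta` are simply given (in the paper they come from a DNN). All function names are illustrative.

```python
import math

MOS_MIN, MOS_MAX = 1.0, 5.0  # MOS range stated in the abstract


def to_unit_interval(mos, eps=1e-4):
    """Map a rating in [1, 5] to the open interval (0, 1).

    Assumption: linear rescaling plus clipping by `eps`, so that
    log(x) and log(1 - x) stay finite for ratings of exactly 1 or 5.
    """
    x = (mos - MOS_MIN) / (MOS_MAX - MOS_MIN)
    return min(max(x, eps), 1.0 - eps)


def beta_log_pdf(x, alpha, beta):
    """Log-density of Beta(alpha, beta) at x in (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)


def beta_nll(ratings, alpha, beta):
    """Average negative log-likelihood of one clip's ratings.

    Minimising this over (alpha, beta) is a maximum-likelihood fit.
    """
    total = sum(beta_log_pdf(to_unit_interval(r), alpha, beta) for r in ratings)
    return -total / len(ratings)


def mos_point_estimate(alpha, beta):
    """Mean of the Beta distribution, mapped back to the 1-5 MOS scale."""
    mean_unit = alpha / (alpha + beta)
    return MOS_MIN + (MOS_MAX - MOS_MIN) * mean_unit
```

For example, `mos_point_estimate(3.0, 1.0)` maps the Beta mean 0.75 back to a MOS of 4.0, and `beta_nll([4.0, 4.0, 3.0], 6.0, 2.0)` is lower than the same call with `(2.0, 6.0)`, since the first parameter pair places more probability mass near the observed ratings.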
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 416-420
Series
European Signal Processing Conference, ISSN 2076-1465
Keywords [en]
speech quality assessment, deep neural network, maximum-likelihood, Bayesian estimation
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:kth:diva-358710
DOI: 10.23919/EUSIPCO63174.2024.10715351
ISI: 001349787000083
OAI: oai:DiVA.org:kth-358710
DiVA, id: diva2:1929716
Conference
32nd European Signal Processing Conference (EUSIPCO), AUG 26-30, 2024, Lyon, FRANCE
Note
Part of ISBN 9789464593617, 9798331519773
QC 20250923
2025-01-21 2025-01-21 2025-09-23
Bibliographically approved