2024 (English). In: 32nd European Signal Processing Conference, EUSIPCO 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 416-420. Conference paper, Published paper (Refereed)
Abstract [en]
Mean opinion score (MOS) is a bounded speech quality measure, ranging between 1 and 5. We propose using a Beta distribution to model the posterior of the bounded MOS for a given speech clip. We use a deep neural network (DNN), trained using a maximum-likelihood principle, providing the parameters of the posterior Beta distribution. A self-teacher learning setup is used to achieve robustness against the inherent challenge of training on a noisy dataset. The dataset noise comes from the subjective nature of the MOS labels, and only a handful of quality score ratings are provided for each speech clip. To compare with existing state-of-the-art methods, we use the mean of Beta posterior as a point estimate of the MOS. The proposed method shows competitive performance vis-a-vis several existing DNN-based methods that provide MOS point estimates, and an ablation study shows the importance of various components of the proposed method.
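As a rough illustration of the idea in the abstract, the sketch below evaluates a Beta negative log-likelihood for a few ratings and converts the Beta mean back to a MOS point estimate. It is not the authors' implementation: the linear rescaling of MOS from [1, 5] to the unit interval, the clipping constant `eps`, the function names, and the example values of `alpha`, `beta`, and `ratings` are all assumptions made for illustration. In the described method the two Beta parameters would be produced by the DNN; here they are fixed numbers.

```python
import math

MOS_MIN, MOS_MAX = 1.0, 5.0  # bounds of the MOS scale stated in the abstract


def to_unit(mos, eps=1e-4):
    """Rescale a MOS in [1, 5] to (0, 1); the clipping by eps is an assumption."""
    x = (mos - MOS_MIN) / (MOS_MAX - MOS_MIN)
    return min(max(x, eps), 1.0 - eps)


def beta_nll(alpha, beta, mos):
    """Negative log-likelihood of one rating under Beta(alpha, beta) on the unit interval."""
    x = to_unit(mos)
    # log of the Beta function B(alpha, beta), the normalizing constant
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_pdf = (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x) - log_norm
    return -log_pdf


def beta_mean_mos(alpha, beta):
    """Mean of Beta(alpha, beta), mapped back to the 1-5 MOS scale."""
    return MOS_MIN + (MOS_MAX - MOS_MIN) * alpha / (alpha + beta)


# Hypothetical values: in the described method, alpha and beta come from the DNN.
alpha, beta = 6.0, 2.0
ratings = [4.0, 4.5, 3.5]  # hypothetical per-clip listener ratings

# Average NLL over the ratings is the quantity a maximum-likelihood training loop would minimize.
loss = sum(beta_nll(alpha, beta, r) for r in ratings) / len(ratings)
point_estimate = beta_mean_mos(alpha, beta)
```

The likelihood here is computed on the rescaled unit interval; on the original MOS scale it differs only by a constant factor, which would not change the maximum-likelihood solution.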
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
European Signal Processing Conference, ISSN 2076-1465
Keywords
speech quality assessment, deep neural network, maximum-likelihood, Bayesian estimation
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-358710 (URN)
001349787000083 ()
Conference
32nd European Signal Processing Conference (EUSIPCO), August 26-30, 2024, Lyon, France
Note
Part of ISBN 978-9-4645-9361-7, 979-8-3315-1977-3
QC 20250121
2025-01-21 2025-01-21 2025-01-21 Bibliographically approved