Effect of MPEG audio compression on vocoders used in statistical parametric speech synthesis
2014 (English). In: 2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), European Signal Processing Conference, EUSIPCO, 2014, pp. 1237-1241. Conference paper (Refereed)
This paper investigates the effect of MPEG audio compression on HMM-based speech synthesis using two state-of-the-art vocoders. Speech signals are first encoded at various compression rates and then analyzed using the GlottHMM and STRAIGHT vocoders. Objective evaluation results show that the parameters of both vocoders degrade gradually as the compression rate increases, with a clear increase in degradation at bit rates of 32 kbit/s or less. Experiments with HMM-based synthesis using the two vocoders show that the degradation in quality is already perceptible at a bit rate of 32 kbit/s, and that both vocoders show a similar trend in degradation with respect to compression ratio. The most perceptible artefacts induced by the compression are spectral distortion and reduced bandwidth, while prosody is better preserved.
Place, publisher, year, edition, pages
European Signal Processing Conference, EUSIPCO, 2014. pp. 1237-1241
European Signal Processing Conference, ISSN 2219-5491
GlottHMM, HMM, MP3, MPEG, Statistical parametric speech synthesis, STRAIGHT
Identifiers
URN: urn:nbn:se:kth:diva-157960
ScopusID: 2-s2.0-84911897440
ISBN: 978-099286261-9
OAI: oai:DiVA.org:kth-157960
DiVA: diva2:773596
22nd European Signal Processing Conference, EUSIPCO 2014, 1 September 2014 through 5 September 2014, Lisbon, Portugal
QC 20141219
2014-12-19, 2014-12-18, 2014-12-19
Bibliographically approved