Vector Quantization of LSF Parameters With a Mixture of Dirichlet Distributions
Beijing University of Posts and Telecommunications, China.
KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
School of Engineering and Computer Science, Victoria University of Wellington, New Zealand.
2013 (English). In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, Vol. 21, no. 9, pp. 1777-1790. Article in journal (Refereed), Published
Abstract [en]

Quantization of the linear predictive coding parameters is an important part of speech coding. Probability density function (PDF)-optimized vector quantization (VQ) has previously been shown to be more efficient than VQ based only on training data. For data with bounded support, some well-defined bounded-support distributions (e.g., the Dirichlet distribution) have been proven to outperform the conventional Gaussian mixture model (GMM) with the same number of free parameters required to describe the model. When exploiting both the boundary and the order properties of the line spectral frequency (LSF) parameters, the distribution of LSF differences (Delta LSF) can be modelled with a Dirichlet mixture model (DMM). We propose a corresponding DMM-based VQ (DVQ). The elements in a Dirichlet vector variable are highly mutually correlated. Motivated by the Dirichlet vector variable's neutrality property, a practical non-linear transformation scheme for the Dirichlet vector variable can be obtained. Similar to the Karhunen-Loeve transform for Gaussian variables, this non-linear transformation decomposes the Dirichlet vector variable into a set of independent beta-distributed variables. Using high-rate quantization theory under an entropy constraint, optimal inter- and intra-component bit allocation strategies are proposed. In the implementation of scalar quantizers, we use constrained-resolution coding to approximate the derived constrained-entropy coding. A practical coding scheme for DVQ is designed to reduce the accumulation of quantization error. The theoretical and practical quantization performance of DVQ is evaluated. Compared to the state-of-the-art GMM-based VQ and the recently proposed beta mixture model (BMM) based VQ, DVQ performs better, with even fewer free parameters and lower computational cost.
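The decomposition described in the abstract can be illustrated with a minimal Python sketch. It assumes LSFs normalized to the interval (0, pi) and uses the textbook sequential ("stick-breaking") form of the neutrality property: if x follows a Dirichlet distribution, the ratios u_k = x_k / (1 - x_1 - ... - x_{k-1}) are mutually independent and beta distributed. The function names are hypothetical, and the paper's own practical transformation scheme may be organized differently.

```python
import math

def lsf_to_delta(lsf):
    """Map K ordered LSFs in (0, pi) to K+1 positive differences summing to 1."""
    edges = [0.0] + list(lsf) + [math.pi]
    return [(b - a) / math.pi for a, b in zip(edges, edges[1:])]

def delta_to_lsf(delta):
    """Inverse of lsf_to_delta: scaled cumulative sums of the first K differences."""
    lsf, acc = [], 0.0
    for d in delta[:-1]:
        acc += d
        lsf.append(acc * math.pi)
    return lsf

def neutral_forward(x):
    """Sequential neutrality transform: u_k = x_k / (1 - sum of earlier x_j)."""
    u, remaining = [], 1.0
    for xk in x[:-1]:
        u.append(xk / remaining)
        remaining -= xk
    return u

def neutral_inverse(u):
    """Rebuild the simplex vector x from the ratios u (assumed in (0, 1))."""
    x, remaining = [], 1.0
    for uk in u:
        piece = uk * remaining
        x.append(piece)
        remaining -= piece
    x.append(remaining)
    return x

# Hypothetical 5-dimensional LSF vector, in radians.
lsf = [0.3, 0.6, 1.1, 1.9, 2.5]
delta = lsf_to_delta(lsf)      # 6 positive values that sum to 1
u = neutral_forward(delta)     # 5 values in (0, 1), one scalar per quantizer
recovered = delta_to_lsf(neutral_inverse(u))
```

Each u_k could then be handled by its own scalar quantizer. Because the inverse rebuilds x sequentially, an error in an early u_k propagates to later elements, which is one plausible reading of the error accumulation that the abstract says the coding scheme is designed to reduce.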

Place, publisher, year, edition, pages
2013. Vol. 21, no 9, 1777-1790 p.
Keyword [en]
Beta distribution, bounded support distribution, Dirichlet distribution, line spectral frequency, mixture modelling, neutrality property, Speech coding, vector quantization
National Category
Electrical Engineering, Electronic Engineering, Information Engineering; Computer and Information Science
URN: urn:nbn:se:kth:diva-47406
DOI: 10.1109/TASL.2013.2238732
ISI: 000321906500001
Scopus ID: 2-s2.0-84880522911
OAI: diva2:455007

QC 20130815. Updated from submitted to published.

Available from: 2011-11-08. Created: 2011-11-08. Last updated: 2013-12-06. Bibliographically approved.
In thesis
1. Non-Gaussian Statistical Models and Their Applications
2011 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Statistical modeling plays an important role in various research areas. It provides a way to connect the data with the statistics. Based on the statistical properties of the observed data, an appropriate model can be chosen that leads to a promising practical performance. The Gaussian distribution is the most popular and dominant probability distribution used in statistics, since it has an analytically tractable Probability Density Function (PDF) and analysis based on it can be derived in an explicit form. However, various data in real applications have bounded support or semi-bounded support. As the support of the Gaussian distribution is unbounded, such type of data is obviously not Gaussian distributed. Thus we can apply some non-Gaussian distributions, e.g., the beta distribution, the Dirichlet distribution, to model the distribution of this type of data. The choice of a suitable distribution is favorable for modeling efficiency. Furthermore, the practical performance based on the statistical model can also be improved by a better modeling.

An essential part in statistical modeling is to estimate the values of the parameters in the distribution or to estimate the distribution of the parameters, if we consider them as random variables. Unlike the Gaussian distribution or the corresponding Gaussian Mixture Model (GMM), a non-Gaussian distribution or a mixture of non-Gaussian distributions does not have an analytically tractable solution, in general. In this dissertation, we study several estimation methods for the non-Gaussian distributions. For the Maximum Likelihood (ML) estimation, a numerical method is utilized to search for the optimal solution in the estimation of Dirichlet Mixture Model (DMM). For the Bayesian analysis, we utilize some approximations to derive an analytically tractable solution to approximate the distribution of the parameters. The Variational Inference (VI) framework based method has been shown to be efficient for approximating the parameter distribution by several researchers. Under this framework, we adapt the conventional Factorized Approximation (FA) method to the Extended Factorized Approximation (EFA) method and use it to approximate the parameter distribution in the beta distribution. Also, the Local Variational Inference (LVI) method is applied to approximate the predictive distribution of the beta distribution. Finally, by assigning a beta distribution to each element in the matrix, we proposed a variational Bayesian Nonnegative Matrix Factorization (NMF) for bounded support data.
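To give a feel for why ML estimation here needs a numerical method, the following Python sketch fits a single Dirichlet distribution (not a full mixture) with a well-known fixed-point iteration: each parameter is updated by solving psi(alpha_k) = psi(sum(alpha)) + mean(log x_k), where psi is the digamma function. This is an illustrative assumption and not necessarily the numerical method used in the thesis. All function names are hypothetical, and the digamma routine is a small hand-written approximation so that the example needs only the standard library.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def inverse_digamma(y):
    """Solve digamma(x) = y on (1e-12, 1e6) by bisection; digamma is increasing."""
    lo, hi = 1e-12, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # geometric midpoint, the range spans many decades
        if digamma(mid) < y:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def fit_dirichlet(samples, iters=300):
    """Fixed-point ML estimate of Dirichlet parameters from simplex samples."""
    n, dim = len(samples), len(samples[0])
    mean_log = [sum(math.log(x[k]) for x in samples) / n for k in range(dim)]
    alpha = [1.0] * dim
    for _ in range(iters):
        psi_sum = digamma(sum(alpha))
        alpha = [inverse_digamma(psi_sum + m) for m in mean_log]
    return alpha

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]
```

A possible usage is `fit_dirichlet([sample_dirichlet([2.0, 5.0, 3.0], random.Random(0)) for _ in range(4000)])`, which should return values near the generating parameters. The iteration converges linearly and slows down as the parameters grow, so Newton-type updates are a common alternative. A mixture model would wrap a step like this inside an EM loop with responsibility-weighted means of log x_k.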

The performances of the proposed non-Gaussian model based methods are evaluated by several experiments. The beta distribution and the Dirichlet distribution are applied to model the Line Spectral Frequency (LSF) representation of the Linear Prediction (LP) model for statistical model based speech coding. For some image processing applications, the beta distribution is also applied. The proposed beta distribution based variational Bayesian NMF is applied for image restoration and collaborative filtering. Compared to some conventional statistical model based methods, the non-Gaussian model based methods show a promising improvement.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2011. xii, 49 p.
Trita-EE, ISSN 1653-5146
National Category
Telecommunications; Computer and Information Science
URN: urn:nbn:se:kth:diva-47408
ISBN: 978-91-7501-158-5
Public defence
2011-12-05, E1, Lindstedsvägen 3, KTH, Stockholm, 09:00 (English)
QC 20111115. Available from: 2011-11-15. Created: 2011-11-08. Last updated: 2011-11-15. Bibliographically approved.

By author/editor
Ma, Zhanyu; Leijon, Arne; Kleijn, W. Bastiaan