This study reconsiders two simple toy data examples proposed by MacKay (2001) to illustrate what he called “symmetry-breaking” and inappropriate “over-pruning” by the variational inference (VI) approximation in Bayesian learning of probabilistic mixture models.
The exact Bayesian solution is derived formally, including the effects of the parameter values in the prior distribution of mixture weights. The exact solution is then compared to the results of the VI approximation.
In both toy examples, the exact solution and the VI approximation normally assigned each data cluster entirely to its own mixture component, so in both methods the number of active mixture components normally equalled the number of data clusters. In this sense, the VI approach causes no “over-pruning”. In one extreme example, with two clusters containing only 1 and 3 samples and very small parameter values in the prior Dirichlet distribution of mixture weights, the exact Bayesian solution assigned all samples to the same component, i.e., with “over-pruning”, whereas the VI approximation still converged to a solution using both mixture components, i.e., with no “over-pruning”. Thus, if inappropriate over-pruning occurs, it is probably caused by an inappropriate choice of prior model parameters, not by the VI approach.
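The influence of a very small Dirichlet concentration parameter can be illustrated with the weight prior alone: integrating the mixture weights out of a symmetric Dirichlet(α) prior yields the standard closed-form (Dirichlet–multinomial) prior over labelled assignment vectors, and for small α this prior strongly favours configurations that leave components empty. A minimal sketch, using counts 1 and 3 to mirror the extreme example above (this is a generic illustration of the prior term, not code from the paper):

```python
from math import lgamma, exp

def log_assignment_prior(counts, alpha):
    """Log prior probability of one specific labelled assignment vector
    whose per-component counts are `counts`, under a symmetric
    Dirichlet(alpha) prior on the K mixture weights (weights integrated out):
    log p(z | alpha) = lgamma(K*a) - lgamma(K*a + N)
                       + sum_k [lgamma(a + n_k) - lgamma(a)]."""
    K, N = len(counts), sum(counts)
    return (lgamma(K * alpha) - lgamma(K * alpha + N)
            + sum(lgamma(alpha + n) - lgamma(alpha) for n in counts))

# Four samples, two components: all in one component vs. the 1/3 split.
for alpha in (1.0, 1e-3):
    one = log_assignment_prior((4, 0), alpha)    # single occupied component
    split = log_assignment_prior((1, 3), alpha)  # both components occupied
    print(f"alpha={alpha}: prior ratio p(4,0)/p(1,3) = {exp(one - split):.1f}")
```

For these counts the ratio reduces analytically to (α + 3)/α, i.e., 4 at α = 1 but about 3000 at α = 0.001, so a tiny concentration parameter can overwhelm a modest likelihood advantage of the two-component configuration.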
The VI approximation shows “symmetry-breaking” because it converges to one of the arbitrary and equivalent permutations of the indices of the mixture components. The “symmetric” exact solution formally includes all these permutations, but this is precisely what makes the exact Bayesian solution computationally impractical. Thus, in these toy examples, we must conclude that “symmetry-breaking” is not the same thing as “over-pruning”: the VI approximation shows “symmetry-breaking” but no “over-pruning”.