Benefits of Non-Linear Scale Parameterizations in Black Box Variational Inference through Smoothness Results and Gradient Variance Bounds
Hotti, Alexandra (KTH, Centres, Science for Life Laboratory, SciLifeLab; Klarna). ORCID iD: 0009-0009-7253-5024
Van der Goten, Lennart (KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST))
Lagergren, Jens (KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST); KTH, Centres, Science for Life Laboratory, SciLifeLab). ORCID iD: 0000-0002-4552-0240
2024 (English). In: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics, AISTATS 2024, ML Research Press, 2024, Vol. 238, p. 3538-3546. Conference paper, Published paper (Refereed)
Abstract [en]

Black box variational inference has consistently produced impressive empirical results. Convergence guarantees require that the variational objective exhibits specific structural properties and that the noise of the gradient estimator can be controlled. In this work, we study the smoothness and the variance of the gradient estimator for location-scale variational families with non-linear covariance parameterizations. Specifically, we derive novel theoretical results for the popular exponential covariance parameterization and tighter gradient variance bounds for the softplus parameterization. These results reveal the benefits of using non-linear scale parameterizations on large-scale datasets. With a non-linear scale parameterization, the smoothness constant of the variational objective and the upper bound on the gradient variance decrease as the scale parameter becomes smaller. Learning posterior approximations with small scales is essential in Bayesian statistics with a sufficient amount of data, since under appropriate assumptions, the posterior distribution is known to contract around the parameter of interest as the sample size increases. We validate our theoretical findings through empirical analysis on several large-scale datasets, underscoring the importance of non-linear parameterizations.
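
The objects discussed in the abstract (the exponential and softplus scale parameterizations of a location-scale variational family, and the reparameterized gradient of the ELBO) can be illustrated with a minimal sketch. The sketch below is not the authors' code: the standard-normal log-joint, the single-sample estimator, and all names (scale, neg_elbo, rho, etc.) are illustrative assumptions made here for concreteness.

import jax
import jax.numpy as jnp

def scale(rho, kind="softplus"):
    # Maps from the unconstrained parameter rho to the scale sigma > 0.
    if kind == "exp":
        return jnp.exp(rho)              # exponential parameterization
    if kind == "softplus":
        return jnp.log1p(jnp.exp(rho))   # softplus parameterization
    return rho                           # linear (identity) parameterization

def log_joint(z):
    # Placeholder log p(x, z): a standard-normal surrogate, not the paper's models.
    return -0.5 * jnp.sum(z ** 2)

def neg_elbo(params, eps, kind):
    # Single-sample reparameterized negative-ELBO estimate for a mean-field
    # Gaussian q(z) = N(mu, diag(sigma^2)), with z = mu + sigma * eps, eps ~ N(0, I).
    mu, rho = params
    sigma = scale(rho, kind)
    z = mu + sigma * eps
    entropy = jnp.sum(jnp.log(sigma)) + 0.5 * mu.size * jnp.log(2.0 * jnp.pi * jnp.e)
    return -(log_joint(z) + entropy)

# Compare gradient norms of the two non-linear parameterizations at a small
# scale, the regime in which the stated smoothness and variance bounds shrink.
key = jax.random.PRNGKey(0)
eps = jax.random.normal(key, (5,))
mu = jnp.zeros(5)
rho = -4.0 * jnp.ones(5)  # corresponds to a small sigma under both exp and softplus
for kind in ("exp", "softplus"):
    grads = jax.grad(neg_elbo)((mu, rho), eps, kind)
    print(kind, [float(jnp.linalg.norm(g)) for g in grads])

With this kind of setup one can vary rho and empirically trace how the gradient magnitude behaves as the scale shrinks under each parameterization, which is the effect the paper analyzes theoretically.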

Place, publisher, year, edition, pages
ML Research Press, 2024. Vol. 238, p. 3538-3546
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 238
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-347320
ISI: 001286500303007
Scopus ID: 2-s2.0-85194189884
OAI: oai:DiVA.org:kth-347320
DiVA, id: diva2:1867253
Conference
27th International Conference on Artificial Intelligence and Statistics, AISTATS 2024, Valencia, Spain, May 2-4, 2024
Note

QC 20241213

Available from: 2024-06-10. Created: 2024-06-10. Last updated: 2024-12-13. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Scopus (fulltext)

Authority records

Hotti, Alexandra; Lagergren, Jens
