Benefits of Non-Linear Scale Parameterizations in Black Box Variational Inference through Smoothness Results and Gradient Variance Bounds
KTH, Centres, Science for Life Laboratory, SciLifeLab. Klarna, Klarna. ORCID iD: 0009-0009-7253-5024
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-4552-0240
2024 (English). In: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics, AISTATS 2024, ML Research Press, 2024, Vol. 238, p. 3538-3546. Conference paper, Published paper (Refereed)
Abstract [en]

Black box variational inference has consistently produced impressive empirical results. Convergence guarantees require that the variational objective exhibits specific structural properties and that the noise of the gradient estimator can be controlled. In this work we study the smoothness and the variance of the gradient estimator for location-scale variational families with non-linear covariance parameterizations. Specifically, we derive novel theoretical results for the popular exponential covariance parameterization and tighter gradient variance bounds for the softplus parameterization. These results reveal the benefits of using non-linear scale parameterizations on large-scale datasets. With a non-linear scale parameterization, the smoothness constant of the variational objective and the upper bound on the gradient variance decrease as the scale parameter becomes smaller. Learning posterior approximations with small scales is essential in Bayesian statistics when a sufficient amount of data is available, since, under appropriate assumptions, the posterior distribution is known to contract around the parameter of interest as the sample size increases. We validate our theoretical findings through empirical analysis on several large-scale datasets, underscoring the importance of non-linear parameterizations.
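
As a rough illustration of the parameterizations named in the abstract, the following minimal sketch compares a linear, an exponential, and a softplus scale map for a one-dimensional Gaussian location-scale family, z = m + sigma(phi) * eps. It is not the paper's derivation. The names `SCALE_MAPS`, `phi_for_sigma`, `scale_grad_factor`, and `sample` are hypothetical and introduced only for this example. The quantity printed is the chain-rule factor d sigma / d phi that multiplies the reparameterized gradient with respect to phi. That this factor is related to the smoothness and variance behaviour described above is offered as intuition under these assumptions, not as a statement of the paper's bounds.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Derivative of softplus, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical registry: name -> (scale map sigma(phi), derivative d sigma / d phi).
SCALE_MAPS = {
    "linear": (lambda phi: phi, lambda phi: 1.0),
    "exp": (math.exp, math.exp),
    "softplus": (softplus, sigmoid),
}


def phi_for_sigma(sigma, name):
    # Invert each scale map so that all three are compared at the same sigma > 0.
    if name == "linear":
        return sigma
    if name == "exp":
        return math.log(sigma)
    if name == "softplus":
        return math.log(math.expm1(sigma))
    raise ValueError(f"unknown parameterization: {name}")


def scale_grad_factor(phi, name):
    # Chain-rule factor d sigma / d phi in the gradient with respect to phi.
    _, dscale = SCALE_MAPS[name]
    return dscale(phi)


def sample(m, phi, name, rng):
    # Reparameterized draw z = m + sigma(phi) * eps with eps ~ N(0, 1).
    scale, _ = SCALE_MAPS[name]
    return m + scale(phi) * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    for sigma in (1.0, 0.1, 0.01):
        for name in SCALE_MAPS:
            phi = phi_for_sigma(sigma, name)
            factor = scale_grad_factor(phi, name)
            z = sample(0.0, phi, name, rng)
            print(f"sigma={sigma:<5} {name:<9} dsigma/dphi={factor:.5f} z={z:+.4f}")
```

In this sketch the factor stays at 1 for the linear map, equals sigma for the exponential map, and equals 1 - exp(-sigma) for the softplus map, so it shrinks with the scale only in the two non-linear cases.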

Place, publisher, year, edition, pages
ML Research Press, 2024. Vol. 238, p. 3538-3546
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 238
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-347320
ISI: 001286500303007
Scopus ID: 2-s2.0-85194189884
OAI: oai:DiVA.org:kth-347320
DiVA, id: diva2:1867253
Conference
27th International Conference on Artificial Intelligence and Statistics, AISTATS 2024, Valencia, Spain, May 2-4, 2024
Note

QC 20260414

Available from: 2024-06-10. Created: 2024-06-10. Last updated: 2026-04-14. Bibliographically approved
In thesis
1. Black-Box Variational Inference: Mixture Models, Efficient Learning, and Applications
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

We advance Black-Box Variational Inference (BBVI) by improving its flexibility, scalability, and applicability to real-world challenges. In Paper I, we demonstrate that integrating mixture-based variational distributions into VAEs, leveraging adaptive importance sampling, enhances posterior expressiveness and mitigates mode collapse in applications such as image and single-cell analysis. Paper II introduces MISVAE, along with two novel ELBO estimators, Some-to-All and Some-to-Some, which enable efficient training with hundreds of mixture components and achieve state-of-the-art performance on the MNIST and Fashion-MNIST datasets. Paper III shifts focus to real-world applications by presenting the Klarna Product Page Dataset, a diverse benchmark for web element nomination, where we achieve strong performance by benchmarking GNNs in combination with GPT-4. Additionally, the dataset has been leveraged in generative modeling tasks, facilitating the learning of latent web page representations and the generation of complex web interfaces using VAEs. Finally, Paper IV provides new smoothness results and gradient variance bounds for BBVI under non-linear scale parameterizations, highlighting advantages in large-data regimes. Collectively, these contributions extend the frontiers of BBVI for tackling high-dimensional, structured data in both theory and practice.

Abstract [sv] (English translation)

We contribute to Black-Box Variational Inference (BBVI) by improving its flexibility, scalability, and applicability to practical problems. In Paper I, we show that integrating mixture-based variational distributions into VAEs, with the help of adaptive importance sampling, improves the expressiveness of the posterior distribution and counteracts mode collapse in applications such as image and single-cell analysis. Paper II introduces MISVAE together with new ELBO estimators (Some-to-All and Some-to-Some), which enable efficient training with hundreds of mixture components and yield leading results on MNIST and Fashion-MNIST. Paper III focuses on practical applications by presenting the Klarna Product Page Dataset, a versatile benchmark for web element nomination, where we achieve strong results by benchmarking GNNs in combination with GPT-4. In addition, the dataset has been used in generative modeling tasks, facilitating the learning of latent representations of web pages and the generation of complex web interfaces using VAEs. In Paper IV, we explore the theoretical foundations of BBVI with non-linear scale parameterizations, such as exponential and softplus transformations. We derive new structural results and gradient variance bounds and show that non-linear parameterizations improve with large datasets. Through these contributions, the thesis creates a link between theoretical advances and practical applications. The thesis thus highlights how flexible and efficient probabilistic inference methods can handle high-dimensional and structured data problems in both research and industry.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. vii, 68
Series
TRITA-EECS-AVL ; 2025:50
Keywords
Variational Inference, Mixture Models, Variational Autoencoders, Black-Box Variational Inference, Bayesian Inference, Web Automation, Graph Neural Networks, Large Language Models, Adaptive Importance Sampling, ELBO, Gradient Variance Bounds
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363692 (URN)
978-91-8106-278-6 (ISBN)
Public defence
2025-06-02, F3 (Flodis), Lindstedtsvägen 26 & 28, KTH Campus, Stockholm, 14:00 (English)
Note

QC 20250521

Available from: 2025-05-21. Created: 2025-05-20. Last updated: 2025-06-30. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Hotti, Alexandra; Van der Goten, Lennart Alexander; Lagergren, Jens
