kth.sePublications
121 of 2
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Automated Optimizations for Inference in Probabilistic Programming Languages
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.ORCID iD: 0000-0001-9703-6912
2024 (English)Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Probabilistic programming languages (PPLs) provide users with an interface to write probabilistic models and leave the statistical inference to the compiler and runtime systems. Ideally, the purpose of PPLs is to provide a seamless inference interface that makes the model and the inference parts modular. Yet, the structure of the user-defined model affects inference accuracy and execution time. Users can utilize certain structures, such as conjugate prior relations or tree structures within a given model, to improve inference efficiency; however, manually utilizing these structures in a model is time-consuming and error-prone. Especially as the PPL's expressiveness increases, the models may become too complex to rewrite. This thesis tackles the problems related to the automation of optimizations, improving the execution time and the accuracy of the inference based on the structure of the model. Further, it aims to make these optimizations expressive enough to handle complex models written in universal PPLs. 

The scope of this thesis is to optimize Bayesian inference in universal PPLs. The three contributions presented propose automated optimization approaches to improving the inference accuracy and execution time while preserving the expressiveness of the optimized program. The first contribution considers dynamic delayed sampling, a runtime algorithm to reduce inference variance through analytical relationships between random variables. We develop a compile-time version of the delayed sampling algorithm to reduce execution overhead. The second contribution regards the design and implementation of dynamic delayed sampling in a statically typed universal PPL, Miking CorePPL. Integrating delayed sampling into a statically typed PPL requires introducing additional constructs to the language. The third contribution is related to the forward pass of belief propagation algorithm, an efficient inference algorithm on tree graphs. We propose an approach automating this forward pass of belief propagation in statically typed universal PPLs to optimize likelihood calculations by utilizing the tree structures in a model. We evaluate the proposed approaches on real-world problems, such as topic modeling and phylogenetic problems.

Abstract [sv]

Probabilistiska programmeringsspråk (PPS) ger användare ett gränssnitt för att skriva probabilistiska modeller och överlåta den statistiska inferensen till kompilatorn och runtime-systemen. Idealiskt sett är syftet med PPS att erbjuda ett sömlöst inferencesgränssnitt som gör modell- och inferensdelarna modulära. Dock påverkas inferensens noggrannhet och exekveringstid av strukturen hos den användardefinierade modellen. Användare kan utnyttja vissa strukturer, såsom konjugerade prior-relationer eller trädsstrukturer inom en given modell, för att förbättra inferensens effektivitet; men att manuellt använda dessa strukturer i en modell är tidskrävande och felbenäget. Speciellt när PPL uttrycksförmåga ökar, kan modellerna bli för komplexa för att skrivas om på detta sätt. Denna avhandling tar itu med problemen relaterade till automatisering av optimeringar, förbättring av exekveringstid och noggrannheten hos inferens baserat på modellens struktur. Vidare fokuseras hur dessa optimeringar kan göras tillräckligt uttrycksfulla för att hantera komplexa modeller skrivna i universella PPS.

Omfattningen av denna avhandling är att optimera Bayesiansk inferens i universella PPS. De tre bidragen som presenteras föreslår automatiserade optimeringsmetoder för att förbättra inferensens noggrannhet och exekveringstid samtidigt som uttrycksfullheten hos det optimerade programmet bevaras. Det första bidraget behandlar dynamisk fördröjd sampling, en runtime-algoritm för att minska inferensens varians genom att använda analytiska relationer mellan slumpmässiga variabler. Vi utvecklar en kompileringstid-version av den fördröjda samplingalgoritmen för att minska exekveringsomkostnad. Det andra bidraget gäller design och implementering av dynamisk fördröjd sampling i Miking CorePPL, som är ett statiskt typat universellt PPS. Att integrera fördröjd sampling i ett statiskt typat PPS kräver införande av ytterligare konstruktioner i språket. Det tredje bidraget är relaterat till framåtpasset av belief propagation-algoritmen, en effektiv inferensalgoritm på trädgrafer. Vi föreslår en metod för att automatisera detta framåtpass av belief propagation i statiskt typade universella PPS för att optimera sannolikhetsberäkningar genom att utnyttja trädsstrukturer i en modell. Vi utvärderar de föreslagna metoderna på verkliga problem, såsom ämnesmodellering och fylogenetiska problem.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. , p. vii, 29
Series
TRITA-EECS-AVL ; 2024:73
Keywords [en]
PPL, Bayesian, Inference, Efficiency
Keywords [sv]
PPS, Bayesian, Inferensen, Effektiv
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-353310ISBN: 978-91-8106-061-4 (print)OAI: oai:DiVA.org:kth-353310DiVA, id: diva2:1898049
Presentation
2024-10-14, https://kth-se.zoom.us/j/66846620144, Room Ada, Electrum, Kistagangen 16, Stockholm, 13:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20240918

Available from: 2024-09-18 Created: 2024-09-16 Last updated: 2024-09-23Bibliographically approved
List of papers
1. Statically and Dynamically Delayed Sampling for Typed Probabilistic Programming Languages
Open this publication in new window or tab >>Statically and Dynamically Delayed Sampling for Typed Probabilistic Programming Languages
(English)Manuscript (preprint) (Other academic)
Abstract [en]

Probabilistic programming languages (PPLs) make it possible to separate the concerns between probabilistic models and Bayesian inference algorithms. However, to make such inference efficient is technically very challenging, both in terms of execution time performance and inference accuracy. One successful optimization approach is the previously published work on dynamically delayed sampling. This runtime method makes use of analytical relations between random variables to reduce inference variance; however, tracking these relations introduces runtime overhead. Furthermore, implementing the dynamic approach in a statically typed language introduces type problems because delaying the sampling of random variables changes their types. Our work advances the state-of-the-art in two aspects. Firstly, to reduce the runtime overhead, we develop a compile-time version of delayed sampling. By incorporating optimization procedures during compilation, we eliminate the need for runtime relation tracking and consequent overhead. However, the compile-time version may not always be effective due to the program's possible dynamic behavior, such as stochastic branches, or the complexity of handling recursion. Secondly, we introduce constructs to implement dynamically delayed sampling in a statically typed universal PPL. Dynamically delayed sampling in statically typed languages is a viable optimization for complex Bayesian models, whereas simple models ought to be statically optimized. We evaluate both statically and dynamically delayed sampling on real-world examples, such as latent Dirichlet allocation and phylogenetic models, and implement the methods in a statically typed PPL, Miking CorePPL.

National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-353280 (URN)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Accepted to the ACM SIGPLAN International Conference on Software Language Engineering 2024

QC 20240918

Available from: 2024-09-15 Created: 2024-09-15 Last updated: 2024-09-18Bibliographically approved
2. Dynamically Automated Pruning of Universal Probabilistic Programming Languages
Open this publication in new window or tab >>Dynamically Automated Pruning of Universal Probabilistic Programming Languages
(English)Manuscript (preprint) (Other academic)
Abstract [en]

Probabilistic programming languages aim to separate the user-specified model from the inference allowing users to focus on model design and leave inference to the compiler and runtime systems. However, the structure of the model affects the inference efficiency. We can utilize specific structures in the model to improve inference efficiency, such as random variables forming trees. In this work, we propose pruning, an approach that automates the forward pass of belief propagation to improve the efficiency of likelihood calculations in statically typed universal probabilistic programming languages (PPLs) by utilizing the tree structures within a model. Specifically, users annotate variables for marginalization, making the likelihood calculation more efficient. We present our method implemented in the probabilistic programming language Miking CorePPL and demonstrate the performance of our approach through a series of case studies in phylogenetics.

National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-353281 (URN)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Submitted for publication

QC 20240918

Available from: 2024-09-15 Created: 2024-09-15 Last updated: 2024-09-18Bibliographically approved

Open Access in DiVA

fulltext(13471 kB)11 downloads
File information
File name FULLTEXT01.pdfFile size 13471 kBChecksum SHA-512
518d0c3186625acd85c850d14f61e8aa1f003d9ebd0d2a3bd2e656faccafad673437aef9fcd3b7437230e86e5beab7950fb152feb5a8ad3e38b1bc7145a5ea23
Type fulltextMimetype application/pdf

Authority records

Çaylak, Gizem

Search in DiVA

By author/editor
Çaylak, Gizem
By organisation
Software and Computer systems, SCS
Computer and Information Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 11 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 156 hits
121 of 2
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf