kth.sePublications KTH
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Correct and Efficient Monte Carlo Inference for Universal Probabilistic Programming Languages
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.ORCID iD: 0000-0003-3127-5640
2023 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Probabilistic programming languages (PPLs) allow users to express statistical inference problems that the PPL implementation then, ideally, solves automatically. In particular, PPL users can focus on encoding their inference problems, and need not concern themselves with the intricacies of inference. Universal PPLs are PPLs with great expressive power, meaning that users can express essentially any inference problem. Consequently, universal PPL implementations often use general-purpose inference algorithms that are compatible with all such inference problems. A problem, however, is that general-purpose inference algorithms can often not efficiently solve complex inference problems. Furthermore, for certain inference algorithms, there are no formal correctness proofs in the context of universal PPLs.

This dissertation considers research problems related to Monte Carlo inference algorithms—sampling-based general-purpose inference algorithms that universal PPL implementations often apply. The first research problem concerns the correctness of sequential Monte Carlo (SMC) inference algorithms. A contribution in the dissertation is a proof of correctness for SMC algorithms in the context of universal PPLs. The second research problem concerns execution time improvements when suspending executions—a requirement in many Monte Carlo inference algorithms. The dissertation addresses the problem through two separate approaches. The first approach is a compilation technique targeting high-performance platforms. The second approach is a static suspension analysis guiding a selective continuation-passing style (CPS) transformation, reducing overhead compared to a full CPS transformation. The third research problem concerns inference improvements through alignment—a useful and often overlooked property in PPLs. The dissertation contributions are a formal definition of alignment, a static analysis technique that automatically aligns programs, and aligned versions of SMC and Markov chain Monte Carlo (MCMC) inference algorithms. The final research problem is more practical, and concerns the effective implementation of PPLs. Specifically, the contribution is the Miking CorePPL universal PPL and its compiler. Overall, the contributions in the dissertation significantly improve the efficiency of Monte Carlo algorithms as applied in universal PPLs.

Abstract [sv]

Probabilistiska programmeringsspråk (PPL:er) tillåter användare att uttrycka statistiska inferensproblem som PPL-implementationen sedan, i bästa fall, löser automatiskt. I synnerhet kan PPL-användare fokusera på att uttrycka sina inferensproblem utan att behöva bekymra sig om svårigheter tillhörande inferensen. Universella PPL:er är PPL:er med stor uttrycksfullhet, vilket innebär att användare kan uttrycka i princip vilket inferensproblem som helst. Följaktligen använder universella PPL-implementationer ofta inferensalgoritmer för allmänna ändamål som är kompatibla med alla sådana inferensproblem. Ett problem är dock att inferensalgoritmer för allmänna ändamål ofta inte effektivt kan lösa komplexa inferensproblem. Dessutom finns det inga formella korrekthetsbevis för vissa inferensalgoritmer när de används i universella PPL:er.

I denna avhandling behandlas forskningsproblem som rör Monte Carlo-inferensalgoritmer—samplingbaserade inferensalgoritmer för allmänna ändamål som universella PPL-implementationer ofta tillämpar. Det första forskningsproblemet rör korrektheten av sekventiella Monte Carlo-inferensalgoritmer (SMC). Ett bidrag i avhandlingen är ett korrekthetsbevis för SMC-algoritmer i universella PPL:er. Det andra forskningsproblemet rör förbättringar av exekveringstid vid exekveringsavbrott—ett krav i många Monte Carlo-inferensalgoritmer. Avhandlingen behandlar problemet genom två separata tillvägagångssätt. Det första tillvägagångssättet är en kompileringsteknik som riktar sig mot högpresterande plattformar. Det andra tillvägagångssättet är en statisk avbrottsanalys som styr en selektiv transformation till fortsättningsskickande stil (CPS), vilket reducerar exekveringstid jämfört med en fullständig CPS-transformation. Det tredje forskningsproblemet rör inferensförbättringar genom samordning—en användbar och ofta förbisedd egenskap i PPL:er. Avhandlingens bidrag är en formell definition av samordning, en statisk analysteknik som automatiskt samordnar program, samt samordnade versioner av SMC- och Markovkedjebaserade Monte Carlo-inferensalgoritmer (MCMC). Det sista forskningsproblemet är mer praktiskt och rör den effektiva implementationen av PPL:er. Konkret är bidraget den universella PPL:en Miking CorePPL och dess kompilator. Sammanfattningsvis förbättrar avhandlingens bidrag avsevärt effektiviteten hos Monte Carlo-algoritmer som tillämpas i universella PPL:er.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. , p. 272
Series
TRITA-EECS-AVL ; 2023:22
Keywords [en]
Probabilistic programming languages, Compilers, Static program analysis, Monte Carlo inference, Operational semantics
Keywords [sv]
Probabilistiska programmeringsspråk, Kompilatorer, Statisk programanalys, Monte Carlo-inferens, Operationell semantik
National Category
Computer Sciences
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-324496ISBN: 978-91-8040-503-4 (print)OAI: oai:DiVA.org:kth-324496DiVA, id: diva2:1741049
Public defence
2023-03-29, Zoom: https://kth-se.zoom.us/j/69904297956, Sal A, Kistagången 16, Kista, 13:00 (English)
Opponent
Supervisors
Note

QC 20230303

Available from: 2023-03-03 Created: 2023-03-02 Last updated: 2023-03-13Bibliographically approved
List of papers
1. Correctness of Sequential Monte Carlo Inference for Probabilistic Programming Languages
Open this publication in new window or tab >>Correctness of Sequential Monte Carlo Inference for Probabilistic Programming Languages
2021 (English)In: Programming Languages and Systems / [ed] Nobuko Yoshida, Cham, Switzerland: Springer Nature , 2021, Vol. 12648, p. 404-431Conference paper, Published paper (Refereed)
Abstract [en]

Probabilistic programming is an approach to reasoning under uncertainty by encoding inference problems as programs. In order to solve these inference problems, probabilistic programming languages (PPLs) employ different inference algorithms, such as sequential Monte Carlo (SMC), Markov chain Monte Carlo (MCMC), or variational methods. Existing research on such algorithms mainly concerns their implementation and efficiency, rather than the correctness of the algorithms themselves when applied in the context of expressive PPLs. To remedy this, we give a correctness proof for SMC methods in the context of an expressive PPL calculus, representative of popular PPLs such as WebPPL, Anglican, and Birch. Previous work have studied correctness of MCMC using an operational semantics, and correctness of SMC and MCMC in a denotational setting without term recursion. However, for SMC inference—one of the most commonly used algorithms in PPLs as of today—no formal correctness proof exists in an operational setting. In particular, an open question is if the resample locations in a probabilistic program affects the correctness of SMC. We solve this fundamental problem, and make four novel contributions: (i) we extend an untyped PPL lambda calculus and operational semantics to include explicit resample terms, expressing synchronization points in SMC inference; (ii) we prove, for the first time, that subject to mild restrictions, any placement of the explicit resample terms is valid for a generic form of SMC inference; (iii) as a result of (ii), our calculus benefits from classic results from the SMC literature: a law of large numbers and an unbiased estimate of the model evidence; and (iv) we formalize the bootstrap particle filter for the calculus and discuss how our results can be further extended to other SMC algorithms. 

Place, publisher, year, edition, pages
Cham, Switzerland: Springer Nature, 2021
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 12648
Keywords
Probabilistic Programming, Sequential Monte Carlo, Operational Semantics, Functional Programming, Measure Theory
National Category
Computer Sciences Probability Theory and Statistics Computer Sciences
Identifiers
urn:nbn:se:kth:diva-292106 (URN)10.1007/978-3-030-72019-3_15 (DOI)000787775800015 ()2-s2.0-85104964442 (Scopus ID)
Conference
30th European Symposium on Programming (ESOP) 27 March - 1 April, 2021 online
Funder
Swedish Foundation for Strategic Research, ASSEMBLE RIT15-0012Swedish Research Council, 2013-4853
Note

Part of ISBN 9783030720186

QC 20251021

Available from: 2021-03-24 Created: 2021-03-24 Last updated: 2025-10-21Bibliographically approved
2. Compiling Universal Probabilistic Programming Languages with Efficient Parallel Sequential Monte Carlo Inference
Open this publication in new window or tab >>Compiling Universal Probabilistic Programming Languages with Efficient Parallel Sequential Monte Carlo Inference
Show others...
2022 (English)In: Programming Languages and Systems: 31st European Symposium on Programming, ESOP 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022, Munich, Germany, April 2–7, 2022, Proceedings / [ed] Ilya Sergey, Cham: Springer, 2022, Vol. 13240, p. 29-56Conference paper, Published paper (Refereed)
Abstract [en]

Probabilistic programming languages (PPLs) allow users to encode arbitrary inference problems, and PPL implementations provide general-purpose automatic inference for these problems. However, constructing inference implementations that are efficient enough is challenging for many real-world problems. Often, this is due to PPLs not fully exploiting available parallelization and optimization opportunities. For example, handling probabilistic checkpoints in PPLs through continuation-passing style transformations or non-preemptive multitasking—as is done in many popular PPLs—often disallows compilation to low-level languages required for high-performance platforms such as GPUs. To solve the checkpoint problem, we introduce the concept of PPL control-flow graphs (PCFGs)—a simple and efficient approach to checkpoints in low-level languages. We use this approach to implement RootPPL: a low-level PPL built on CUDA and C++ with OpenMP, providing highly efficient and massively parallel SMC inference. We also introduce a general method of compiling universal high-level PPLs to PCFGs and illustrate its application when compiling Miking CorePPL—a high-level universal PPL—to RootPPL. The approach is the first to compile a universal PPL to GPUs with SMC inference. We evaluate RootPPL and the CorePPL compiler through a set of real-world experiments in the domains of phylogenetics and epidemiology, demonstrating up to 6x speedups over state-of-the-art PPLs implementing SMC inference.

Place, publisher, year, edition, pages
Cham: Springer, 2022
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords
Probabilistic Programming Languages, Sequential Monte Carlo, GPU Compilation
National Category
Computer Sciences Probability Theory and Statistics
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-311254 (URN)10.1007/978-3-030-99336-8_2 (DOI)000783774400002 ()2-s2.0-85128679765 (Scopus ID)
Conference
Programming Languages and Systems - 31st European Symposium on Programming, ESOP 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022, Munich, Germany, April 2-7, 2022
Funder
Swedish Foundation for Strategic Research, FFL15-0032Swedish Foundation for Strategic Research, RIT15-0012EU, Horizon 2020, 898120Swedish Research Council, 2018-04620
Note

QC 20250922

Available from: 2022-04-20 Created: 2022-04-20 Last updated: 2025-09-22Bibliographically approved
3. Automatic Alignment in Higher-Order Probabilistic Programming Languages
Open this publication in new window or tab >>Automatic Alignment in Higher-Order Probabilistic Programming Languages
2023 (English)In: Programming Languages and Systems, Springer Nature , 2023Conference paper, Published paper (Refereed)
Abstract [en]

Probabilistic Programming Languages (PPLs) allow users to encode statistical inference problems and automatically apply an inference algorithm to solve them. Popular inference algorithms for PPLs, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC), are built around checkpoints—relevant events for the inference algorithm during the execution of a probabilistic program. Deciding the location of checkpoints is, in current PPLs, not done optimally. To solve this problem, we present a static analysis technique that automatically determines checkpoints in programs, relieving PPL users of this task. The analysis identifies a set of checkpoints that execute in the same order in every program run—they are aligned. We formalize alignment, prove the correctness of the analysis, and implement the analysis as part of the higher-order functional PPL Miking CorePPL. By utilizing the alignment analysis, we design two novel inference algorithm variants: aligned SMC and aligned lightweight MCMC. We show, through real-world experiments, that they significantly improve inference execution time and accuracy compared to standard PPL versions of SMC and MCMC.

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Computer Science ; 13990
Keywords
Probabilistic programming, Operational semantics, Static analysis
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-324293 (URN)10.1007/978-3-031-30044-8_20 (DOI)001284040300020 ()2-s2.0-85161447105 (Scopus ID)
Conference
32nd European Symposium on Programming, ESOP 2023, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2023, Paris, France, April 22–27, 2023
Note

Part of ISBN 9783031300431, 9783031300448

QC 20251002

Available from: 2023-02-24 Created: 2023-02-24 Last updated: 2025-10-02
4. Suspension Analysis and Selective Continuation-Passing Style for Higher-Order Probabilistic Programming Languages
Open this publication in new window or tab >>Suspension Analysis and Selective Continuation-Passing Style for Higher-Order Probabilistic Programming Languages
Show others...
(English)Manuscript (preprint) (Other academic)
Abstract [en]

Probabilistic programming languages (PPLs) make encoding and automatically solving statistical inference problems relatively easy by separating models from the inference algorithm. A popular choice for solving inference problems is to use Monte Carlo inference algorithms. For higher-order functional PPLs, these inference algorithms rely on execution suspension to perform inference, most often enabled through a full continuation-passing style (CPS) transformation. However, standard CPS transformations for PPL compilers introduce significant overhead, a problem the community has generally overlooked. State-of-the-art solutions either perform complete CPS transformations with performance penalties due to unnecessary closure allocations or use efficient, but complex, low-level solutions that are often not available in high-level languages. In contrast to prior work, we develop a new approach that is both efficient and easy to implement using higher-order languages. Specifically, we design a novel static suspension analysis technique that determines the parts of a program that require suspension, given a particular inference algorithm. The analysis result allows selectively CPS transforming the program only where necessary. We formally prove the correctness of the suspension analysis and implement both the suspension analysis and selective CPS transformation in the Miking CorePPL compiler. We evaluate the implementation for a large number of Monte Carlo inference algorithms on real-world models from phylogenetics, epidemiology, and topic modeling. The evaluation results demonstrate significant improvements across all models and inference algorithms.

National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-324295 (URN)
Note

QC 20230227

Available from: 2023-02-25 Created: 2023-02-25 Last updated: 2023-03-02Bibliographically approved

Open Access in DiVA

fulltext(1174 kB)453 downloads
File information
File name FULLTEXT01.pdfFile size 1174 kBChecksum SHA-512
3b5b29090be0901879e54eb5ee4d9ac3690cf8b32f75cf1871f1fbd987c8f544ead2b4a37fedf5f0864fb7583d30e55f1282d2cb4a3a3faf87e3b84e11bd2147
Type fulltextMimetype application/pdf
fulltext(2678 kB)534 downloads
File information
File name FULLTEXT02.pdfFile size 2678 kBChecksum SHA-512
3050992aeb8901e5f803cd6af83133f7127fd4dc3e4cb8d6ded54828c682e23cfe9de0013b47e665e865a68bff67d2e0219f0ccd59353722a2c4e316bcd6d3ab
Type fulltextMimetype application/pdf

Authority records

Lundén, Daniel

Search in DiVA

By author/editor
Lundén, Daniel
By organisation
Software and Computer systems, SCS
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 988 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 2733 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf