kth.sePublications KTH
Change search
Link to record
Permanent link

Direct link
Publications (4 of 4) Show all publications
Lindbäck, J. (2025). Computational Optimal Transport via Operator Splitting — Analysis, Implementation, and Applications. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
Open this publication in new window or tab >>Computational Optimal Transport via Operator Splitting — Analysis, Implementation, and Applications
2025 (English)Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis develops scalable and theoretically grounded algorithms for optimal transport (OT) using operator splitting. The central theme is to exploit the structural properties of OT and the constraints imposed by modern hardware to derive algorithms that mitigate key implementation bottlenecks in large-scale applications. The thesis is based on four main works. 

First, the Douglas–Rachford splitting method is applied to unregularized OT. Through careful splitting design, we obtain an algorithm that combines favorable global convergence rates while being well-suited for GPU computations. Moreover, the approach avoids hyperparameters that often cause numerical instability in other OT methods, and is particularly well-suited for low-precision arithmetic, now ubiquitous in machine learning applications. The theoretical guarantees are complemented by numerical benchmarks, demonstrating that the algorithm is faster than the state-of-the-art on a range of problem instances. 

Second, we extend this framework to handle a broad class of regularizers, including group lasso and quadratic regularization. The algorithm derived in the first work is readily adapted to this class, and we show that the analysis and the GPU implementation generalize to this setting. Additional tight local convergence guarantees are also added. Numerical experiments confirm the efficiency of the algorithm, achieving 10–100x faster computation times than the state-of-the-art for a wide range of problem instances. 

Third, we introduce an asynchronous three-operator splitting scheme for trans- portation problems with nonlinear but differentiable costs, with the Gromov–Wasserstein problem as a special case. This algorithm removes synchronization bottlenecks by reusing gradients—a feature particularly useful when gradient evaluations dominate overall complexity. Its analysis extends classical operator-splitting theory to asynchronous settings in a novel manner, and these results are complemented by guarantees for the non-convex case. Applying the resulting algorithm to several real-world problems demonstrates that the method is highly efficient while yielding accurate solutions. 

Finally, we develop additional local convergence guarantees that account for the sparsity structure of OT solutions. Under mild assumptions, the algorithm identifies the correct sparsity pattern in finitely many iterations, after which faster linear convergence dominates. Moreover, we relate the local rate to graph-theoretical properties of the solution. This analysis not only refines the theoretical understanding of splitting methods for OT but also suggests principled strategies for stepsize tuning. 

Taken together, these contributions advance the theoretical understanding of splitting methods for optimal transport, while improving their practical usability, enabling their application to a variety of OT problems at greater scales. 

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. xi, 147
Series
TRITA-EECS-AVL ; 2025:81
National Category
Computational Mathematics Algorithms Artificial Intelligence
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-370063 (URN)978-91-8106-395-0 (ISBN)
Public defence
2025-10-15, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20250918

Available from: 2025-09-19 Created: 2025-09-18 Last updated: 2025-10-09Bibliographically approved
Lindbäck, J., Wang, Z. & Johansson, M. (2023). Bringing regularized optimal transport to lightspeed: a splitting method adapted for GPUs. In: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023: . Paper presented at 37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, United States of America, Dec 10 2023 - Dec 16 2023. Neural information processing systems foundation
Open this publication in new window or tab >>Bringing regularized optimal transport to lightspeed: a splitting method adapted for GPUs
2023 (English)In: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023, Neural information processing systems foundation , 2023Conference paper, Published paper (Refereed)
Abstract [en]

We present an efficient algorithm for regularized optimal transport. In contrast to previous methods, we use the Douglas-Rachford splitting technique to develop an efficient solver that can handle a broad class of regularizers. The algorithm has strong global convergence guarantees, low per-iteration cost, and can exploit GPU parallelization, making it considerably faster than the state-of-the-art for many problems. We illustrate its competitiveness in several applications, including domain adaptation and learning of generative models.

Place, publisher, year, edition, pages
Neural information processing systems foundation, 2023
Series
Advances in Neural Information Processing Systems, ISSN 1049-5258 ; 36
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-346141 (URN)001220600000027 ()2-s2.0-85191165324 (Scopus ID)
Conference
37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, United States of America, Dec 10 2023 - Dec 16 2023
Note

QC 20240507

Available from: 2024-05-03 Created: 2024-05-03 Last updated: 2024-07-22Bibliographically approved
Lindbäck, J. (2023). Novel Algorithms for Optimal Transport via Splitting Methods. (Licentiate dissertation). Stockholm: KTH Royal Institute of Technology
Open this publication in new window or tab >>Novel Algorithms for Optimal Transport via Splitting Methods
2023 (English)Licentiate thesis, monograph (Other academic)
Abstract [en]

This thesis studies how the Douglas–Rachford splitting technique can be leveraged for scalable computational optimal transport (OT). By carefully splitting the problem, we derive an algorithm with several advantages. First, the algorithm enjoys global convergence rates comparable to the state-of-the-art while benefiting from accelerated local rates. In contrast to other methods, it does not depend on hyperparameters that can cause numerical instability. This feature is particularly advantageous when low-precision floating points are used or if the data is noisy. Moreover, the updates can efficiently be carried out on GPUs and, therefore, benefit from the high degree of parallelization achieved via GPU computations. Furthermore, we show that the algorithm can be extended to handle a broad family of regularizers and constraints while enjoying the same theoretical and numerical properties. These factors combined result in a fast algorithm that can be applied to large-scale OT problems and regularized versions thereof, which we illustrate in several numerical experiments.

In the first part of the main body of the thesis, we present how Douglas-Rachford splitting can be adapted for the unregularized OT problem to derive a fast algorithm. We present two global convergence guarantees for the resulting algorithm: a 1/k-ergodic rate and a linear rate. We also show that the stopping criteria of the algorithm can be computed on the fly with virtually no extra costs. Further, we specify how a GPU kernel can be efficiently implemented to carry out the operations needed for the algorithm. To show that the algorithm is fast, accurate, and robust, we run a series of numerical benchmarks that demonstrate the advantages of our algorithm. We then extend the algorithm to handle regularized OT using sparsity-promoting regularizers. The generalized algorithm will enjoy the same sublinear rate derived for the unregularized formulation. We also complement the global rate with local guarantees, establishing that, under non-degeneracy assumptions on the solution, the algorithm will identify the correct sparsity pattern of the solution in finitely many iterations. When the sparsity pattern is identified, a faster linear rate typically dominates. We also specify how to extend to the GPU implementation and the stopping criteria to handle regularized OT, and we subsequently specify how to backpropagate through the solver. We end this part of the thesis by presenting some numerical results, including performance on quadratically regularized OT and group Lasso regularized OT for domain adaptation, showing a substantial improvement compared to the state-of-the-art.

In the last part of the thesis, we provide a more detailed analysis of the local behavior of the algorithm when applied to unregularized OT and quadratically regularized OT. We subsequently outline how to extend this analysis to several other sparsity-promoting regularizers. In the former two cases, we show that the update that constitutes the algorithm converges to a linear operator in finitely many iterations. By analyzing the spectral properties of these linear operators, we gain insights into the local behavior of the algorithm, and specifically, these results suggest how to tune stepsizes to obtain better local rates.   

Abstract [sv]

Denna avhandling behandlar hur Douglas–Rachford-splittning kan tillämpas för skalbara beräkningar av optimal transport (OT). Genom en noggrann splittning av problemet härleder vi en algoritm med flera fördelar. För det första åtnjuter algoritmen en global konvergenshastighet som är jämförbara med populära OT-lösare, samtidigt som den drar nytta av accelererade lokalahastigheter. Till skillnad från andra metoder är den inte beroende av hyperparametrar som kan orsaka numerisk instabilitet. Den här egenskapen är särskilt fördelaktig när lågprecisionsaritmetik används eller när data innehåller mycket brus. Uppdateringarna som algoritmen baseras på kan effektivt utföras på GPU:er och dra nytta av dess parallellberäkningar. Vi visar också att algoritmen kan utökas för att hantera en rad regulariseriseringar och bivillkor samtidigt som den åtnjuter liknande teoretiska och numeriska egenskaper. Tillsammans resulterar dessa faktorer i en snabb algoritm som kan tillämpas på storskaliga OT-problem samt flera av dess regulariserade varianter, vilket vi visar i flera numeriska experiment.

I den första delen av avhandlingen presenterar vi hur Douglas-Rachford-splittning kan anpassas för det oregulariserade OT-problemet för att härleda en snabb algoritm. För den resulterande algoritmen presenterar vi två globala konvergensgarantier: en 1/k-ergodisk och en linjär konvergenshastighet. Vi presenterar också hur stoppkriterierna för algoritmen kan beräknas utan vidare extra kostnader. Dessutom specificerar vi hur en GPU-kärna kan implementeras för att effektivt utföra de operationer som algoritmen baseras på. För att visa att algoritmen är snabb, exakt och robust utför vi ett flertal numeriska experiment som påvisar flera fördelar över jämförbara algoritmer. Därefter utökar vi algoritmen för att hantera regulariserad OT med s.k. sparsity-promoting regularizers. Den generaliserade algoritmen åtnjuter samma sublinjära konvergenshastighet som härleddes för den oregulariserade fallet. Vi kompletterar också garantierna genom att tillhandahålla lokala garantier genom att fastställa att givet svaga antaganden om lösningen kommer algoritmen att identifiera den korrekta lösningens gleshetsstuktur inom ett begränsat antal iterationer. När glesheten identifierats dominerar typiskt sett en snabbare linjär konvergenshastighet. Vi specificerar också hur man utökar till GPU-kärnan och resultaten av stoppkriterierna för att hantera regulariserad OT, och vi specificerar sedan hur man differentiera genom lösaren. Vi avslutar den här delen av avhandlingen genom att presentera några numeriska resultat för kvadratiskt reglerad OT och group Lasso-reglerad OT för domänanpassning, vilket visar en betydande förbättring jämfört med de mest populära metoderna för dessa tillämpningar.

I den sista delen av avhandlingen ger vi en mer detaljerad analys av algoritmens lokala beteende när den tillämpas på oregulerisad OT och kvadratiskt reglerad OT. Vi föreslår även sätt att studera flera andra fallet. I de två första fallen visar vi att uppdateringen som utgör algoritmen konvergerar till en linjär operator inom ett begränsat antal iterationer. Genom att analysera de spektrala egenskaperna hos dessa linjära operatorer får vi ytterligare insikter i algoritmens lokala beteende, och specifikt indikerar dessa resultat hur man kan justera steglängden för att uppnå ännu bättre konvergenshastigheter.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. p. iii, 85
Series
TRITA-EECS-AVL ; 2023:85
Keywords
optimal transport, splitting methods
National Category
Computational Mathematics
Research subject
Applied and Computational Mathematics, Optimization and Systems Theory
Identifiers
urn:nbn:se:kth:diva-339965 (URN)978-91-8040-772-4 (ISBN)
Presentation
2023-12-05, Harry Nyquist, Malvinas väg 10, Stockholm, 15:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20231123

Available from: 2023-11-23 Created: 2023-11-23 Last updated: 2023-11-24Bibliographically approved
Mai, V. V., Lindbäck, J. & Vejdemo-Johansson, M. (2022). A FAST AND ACCURATE SPLITTING METHOD FOR OPTIMAL TRANSPORT: ANALYSIS AND IMPLEMENTATION. In: ICLR 2022: 10th International Conference on Learning Representations. Paper presented at 10th International Conference on Learning Representations, ICLR 2022, Virtual, Online, Apr 25 2022 - Apr 29 2022. International Conference on Learning Representations, ICLR
Open this publication in new window or tab >>A FAST AND ACCURATE SPLITTING METHOD FOR OPTIMAL TRANSPORT: ANALYSIS AND IMPLEMENTATION
2022 (English)In: ICLR 2022: 10th International Conference on Learning Representations, International Conference on Learning Representations, ICLR , 2022Conference paper, Published paper (Refereed)
Abstract [en]

We develop a fast and reliable method for solving large-scale optimal transport (OT) problems at an unprecedented combination of speed and accuracy. Built on the celebrated Douglas-Rachford splitting technique, our method tackles the original OT problem directly instead of solving an approximate regularized problem, as many state-of-the-art techniques do. This allows us to provide sparse transport plans and avoid numerical issues of methods that use entropic regularization. The algorithm has the same cost per iteration as the popular Sinkhorn method, and each iteration can be executed efficiently, in parallel. The proposed method enjoys an iteration complexity O(1/∊) compared to the best-known O(1/∊2) of the Sinkhorn method. In addition, we establish a linear convergence rate for our formulation of the OT problem. We detail an efficient GPU implementation of the proposed method that maintains a primal-dual stopping criterion at no extra cost. Substantial experiments demonstrate the effectiveness of our method, both in terms of computation times and robustness.

Place, publisher, year, edition, pages
International Conference on Learning Representations, ICLR, 2022
National Category
Computational Mathematics Control Engineering Computer Sciences
Identifiers
urn:nbn:se:kth:diva-333376 (URN)2-s2.0-85150345788 (Scopus ID)
Conference
10th International Conference on Learning Representations, ICLR 2022, Virtual, Online, Apr 25 2022 - Apr 29 2022
Note

QC 20230801

Available from: 2023-08-01 Created: 2023-08-01 Last updated: 2023-08-01Bibliographically approved
Organisations
Identifiers
ORCID iD: ORCID iD iconorcid.org/0000-0002-1752-5335

Search in DiVA

Show all publications