Towards Efficient and Robust Decentralized Learning
Wang, Zesen. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0003-0191-5301
2026 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

The widening gap between GPU compute capability and inter-node network bandwidth presents a fundamental challenge for distributed deep learning. Traditional "All-Reduce" methods require every GPU to synchronize globally, slowing the entire system to the speed of the slowest worker, whereas decentralized training allows each GPU to communicate with only a few neighbors. Despite its potential, decentralized training is rarely adopted in practice because its performance gains are hard to predict, its impact on model accuracy is poorly understood, and it is complex to implement.

This thesis investigates decentralized training as a robust and efficient alternative to global synchronization. By restricting communication to a sparse graph of neighbors, decentralized algorithms reduce bandwidth usage and alleviate the straggler bottleneck inherent in global collective communication. Despite these theoretical advantages, adoption has been hindered by three key challenges: ambiguity regarding efficiency gains, uncertainty about generalization performance, and implementation barriers.
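
To make the straggler point concrete, here is a small simulation sketch under an assumed jitter model, not measurements from the thesis: with a global barrier every step waits for the slowest of all n workers, while on a ring each worker waits only for itself and its two neighbors.

    import numpy as np

    # Illustrative straggler model (assumed): per-step compute time is
    # 1.0 plus exponential jitter, independently per worker and step.
    rng = np.random.default_rng(0)
    n, steps = 256, 10_000
    jitter = 1.0 + rng.exponential(0.1, size=(steps, n))

    # All-Reduce acts as a global barrier: each step costs the global max.
    allreduce = jitter.max(axis=1).mean()

    # Ring gossip: worker i waits only for workers i-1, i, i+1.
    local = np.stack([np.roll(jitter, s, axis=1) for s in (-1, 0, 1)]).max(axis=0)
    gossip = local.mean()

    print(f"mean step time  all-reduce: {allreduce:.3f}  ring gossip: {gossip:.3f}")

The barrier pays the expected maximum over all 256 workers on every step, while the ring pays only a local maximum; this is exactly the straggler bottleneck described above.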

To address the efficiency ambiguity, we propose a comprehensive runtime model that characterizes the performance of decentralized algorithms. We derive an analytical bound identifying the hardware–model regimes in which decentralized training outperforms the All-Reduce method, and we validate this model on GPU clusters. The analysis highlights the relevance of decentralized schemes as the "outer loop" synchronization mechanism in bandwidth-constrained environments.
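
As a rough illustration of this kind of regime analysis, the following sketch uses a standard alpha-beta communication model with assumed hardware numbers; it is not the bound derived in the thesis.

    # Toy alpha-beta cost model (assumed numbers, not the thesis's bound).
    # alpha: per-message latency (s); B: per-link bandwidth (bytes/s);
    # M: model size (bytes); n: workers; d: neighbors per worker.
    def t_allreduce(n, M, B, alpha):
        # Ring All-Reduce: 2(n-1) sequential phases of M/n bytes each.
        return 2 * (n - 1) * (alpha + M / (n * B))

    def t_gossip(d, M, B, alpha):
        # One gossip round: exchange the full model with d neighbors.
        return d * (alpha + M / B)

    B, alpha, n, d = 12.5e9, 50e-6, 1024, 2   # ~100 Gb/s links, 50 us latency
    for M in (4e6, 4e9):                      # small vs large fp32 model
        print(f"M={M:.0e} bytes: all-reduce {t_allreduce(n, M, B, alpha)*1e3:8.2f} ms, "
              f"gossip d=2 {t_gossip(d, M, B, alpha)*1e3:8.2f} ms")

In this toy model the latency term of All-Reduce grows with n, so latency-bound synchronization strongly favors the sparse exchange, while bandwidth-bound transfers make the two comparable; the runtime model in the thesis is what makes such regime boundaries precise on real clusters.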

Second, we tackle the generalization uncertainty by analyzing the role of consensus error. We first propose AccumAdam, an engineering stabilization mechanism designed to mitigate the momentum drift caused by decentralization and to stabilize convergence. We then adopt a new perspective with DSGD-AC (Adaptive Consensus), demonstrating that consensus error, often viewed as harmful noise, can act as an implicit regularization mechanism related to curvature. We show that by controlling rather than eliminating this error, decentralized training can favor smooth minima and improve generalization compared to centralized baselines.
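
The quantity at stake can be seen in a minimal decentralized SGD (DSGD) sketch on a toy quadratic, written for illustration only (generic DSGD, not AccumAdam or DSGD-AC): each worker takes a noisy local gradient step and then averages with its ring neighbors, and the consensus error settles at a noise-driven plateau instead of vanishing.

    import numpy as np

    rng = np.random.default_rng(0)
    n, dim, lr = 8, 5, 0.05
    A = rng.standard_normal((dim, dim))
    A = A @ A.T / dim + np.eye(dim)       # positive definite: f(x) = 0.5 x^T A x

    def ring_mixing_matrix(n):
        # Doubly stochastic ring: each worker averages itself and 2 neighbors.
        W = np.zeros((n, n))
        for i in range(n):
            W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
        return W

    W = ring_mixing_matrix(n)
    X = rng.standard_normal((n, dim))     # one parameter vector per worker
    for t in range(201):
        G = X @ A + 0.1 * rng.standard_normal((n, dim))  # noisy local gradients
        X = W @ (X - lr * G)              # local SGD step, then gossip mixing
        if t % 50 == 0:
            loss = 0.5 * np.einsum('id,de,ie->', X, A, X) / n
            cons = np.linalg.norm(X - X.mean(axis=0), axis=1).mean()
            print(f"t={t:3d}  mean loss={loss:.4f}  consensus error={cons:.4f}")

The paragraph's claim concerns exactly this plateau: the benefit comes from controlling it, not from driving it to zero.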

Finally, to lower the implementation barrier, we present Decent-DP, a lightweight, modular software library that integrates seamlessly with existing PyTorch workflows. Decent-DP enables transparent experimentation with various topologies and the AWC communication-computation pattern. Collectively, this work bridges the gap between systems-level optimization and learning-theoretic robustness, establishing decentralized learning as a potential component for resilient distributed training systems.
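
Decent-DP's actual interface is not reproduced here; as a hedged sketch, the following shows the generic PyTorch pattern such a library builds on, using only standard torch.distributed point-to-point calls (topology and names are illustrative).

    # Generic neighbor-averaging pattern in plain torch.distributed;
    # NOT the Decent-DP API, only the primitive such a library wraps.
    # Launch with: torchrun --nproc_per_node=4 this_script.py
    import torch
    import torch.distributed as dist

    @torch.no_grad()
    def ring_gossip(params, rank, world):
        # One gossip round on a ring: exchange parameters with both
        # neighbors and keep the three-point average (assumes world >= 3).
        left, right = (rank - 1) % world, (rank + 1) % world
        for p in params:
            recv_l, recv_r = torch.empty_like(p), torch.empty_like(p)
            reqs = [dist.isend(p, dst=left), dist.isend(p, dst=right),
                    dist.irecv(recv_l, src=left), dist.irecv(recv_r, src=right)]
            for req in reqs:
                req.wait()
            p.copy_((p + recv_l + recv_r) / 3.0)

    if __name__ == "__main__":
        dist.init_process_group("gloo")   # torchrun supplies rank/world size
        rank, world = dist.get_rank(), dist.get_world_size()
        model = torch.nn.Linear(10, 1)
        # ... local forward/backward and optimizer step would go here ...
        ring_gossip(list(model.parameters()), rank, world)
        dist.destroy_process_group()

Per the paragraph above, the point of a library like Decent-DP is to hide this plumbing and make topology experimentation a configuration choice rather than hand-written communication code.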

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2026, p. xix, 83
Series
TRITA-EECS-AVL ; 2026:13
Keywords [en]
Decentralized Learning, Machine Learning, Distributed Optimization, Parallel Computing
National Category
Artificial Intelligence; Algorithms; Networked, Parallel and Distributed Computing
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-376439
ISBN: 978-91-8106-520-6 (print)
OAI: oai:DiVA.org:kth-376439
DiVA, id: diva2:2035863
Presentation
2026-03-05, D3, Floor 3, Lindstedtsvägen 5, Stockholm, 15:00 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20260205

Available from: 2026-02-08. Created: 2026-02-05. Last updated: 2026-02-16. Bibliographically approved.

Open Access in DiVA

fulltext (3184 kB)
File information
File name: FULLTEXT01.pdf
File size: 3184 kB
Checksum (SHA-512): e70cad7a032ddf08ac23b33b32173ec3cd6dc792ca5024d5c9651102db1f78e33db61eec22173443e4ce07350705ad1037ccfe9550cf62793e3c76e20745881b
Type: fulltext
Mimetype: application/pdf

Authority records

Wang, Zesen
