Reliable and Efficient Distributed Machine Learning
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Information Science and Engineering. ORCID iD: 0000-0001-7579-822X
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

With the ever-increasing penetration and proliferation of various smart Internet of Things (IoT) applications, machine learning (ML) is envisioned to be a key technique for big-data-driven modelling and analysis. Since the massive data generated by these IoT devices are commonly collected and stored in a distributed manner, ML over networks, e.g., distributed machine learning (DML), has emerged as a promising paradigm, especially for large-scale model training. In this thesis, we explore the optimization and design of DML algorithms under different network conditions. Our main research on DML falls into the following four aspects/papers, detailed below.

In the first part of the thesis, we explore fully decentralized ML utilizing the alternating direction method of multipliers (ADMM). Specifically, to address the two main critical challenges in DML systems, i.e., the communication bottleneck and stragglers (nodes/devices with slow responses), an error-control-coding-based stochastic incremental ADMM (csI-ADMM) is proposed. Given an appropriate mini-batch size, it is proved that the proposed csI-ADMM method has an $O(1/\sqrt{k})$ convergence rate and $O(1/\mu^2)$ communication cost, where $k$ denotes the number of iterations and $\mu$ is the target accuracy. In addition, the tradeoff between the convergence rate and the number of stragglers, as well as the relationship between mini-batch size and the number of stragglers, is analyzed both theoretically and experimentally.
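To make the coded, straggler-tolerant idea concrete, the following is a minimal sketch under stated assumptions, not the thesis's csI-ADMM: a mini-batch stochastic consensus ADMM for decentralized least squares in which each round simply drops a fixed number of slow nodes, whereas csI-ADMM handles stragglers with error-control coding and activates nodes incrementally. All names and constants are illustrative.

```python
# Illustrative sketch -- NOT the thesis's csI-ADMM. Mini-batch stochastic
# consensus ADMM for decentralized least squares; each round tolerates up
# to `max_stragglers` slow nodes by skipping them (the real method uses
# error-control coding instead). All constants are assumed.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_samples, dim = 5, 200, 10
x_true = rng.normal(size=dim)
# Each node holds a private data shard (A_i, b_i).
A = [rng.normal(size=(n_samples, dim)) for _ in range(n_nodes)]
b = [A_i @ x_true + 0.01 * rng.normal(size=n_samples) for A_i in A]

rho, eta, batch, max_stragglers = 1.0, 5.0, 32, 1
x = [np.zeros(dim) for _ in range(n_nodes)]  # local primal variables
u = [np.zeros(dim) for _ in range(n_nodes)]  # scaled dual variables
z = np.zeros(dim)                            # consensus variable

def minibatch_grad(i, xi):
    """Mini-batch stochastic gradient of node i's least-squares loss."""
    idx = rng.choice(n_samples, size=batch, replace=False)
    Ai, bi = A[i][idx], b[i][idx]
    return Ai.T @ (Ai @ xi - bi) / batch

for k in range(300):
    # Model stragglers: the slowest nodes this round do not respond.
    responders = rng.permutation(n_nodes)[: n_nodes - max_stragglers]
    for i in responders:
        g = minibatch_grad(i, x[i])
        # Linearized proximal x-update around the current local iterate.
        x[i] = (eta * x[i] + rho * (z - u[i]) - g) / (eta + rho)
    # Consensus update uses only the nodes that responded in time.
    z = np.mean([x[i] + u[i] for i in responders], axis=0)
    for i in responders:
        u[i] += x[i] - z

print("distance from consensus model to ground truth:",
      np.linalg.norm(z - x_true))
```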

In the second part of the thesis, we investigate an asynchronous approach to fully decentralized federated learning (FL). Specifically, an asynchronous parallel incremental block-coordinate descent (API-BCD) algorithm is proposed, in which multiple nodes/devices are active asynchronously to accelerate convergence. The convergence of API-BCD is proved theoretically, and simulation results demonstrate its superior performance in terms of both running speed and communication cost compared with state-of-the-art algorithms.
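As a rough illustration of the asynchronous block-coordinate idea (a sketch with an assumed objective, step size, and threading model, not the thesis's API-BCD), the worker threads below update disjoint coordinate blocks of a shared iterate, each stepping from a possibly stale snapshot of the full vector.

```python
# Toy asynchronous parallel block-coordinate descent -- an assumed setup,
# NOT the thesis's API-BCD. Each worker thread owns one coordinate block
# of the shared iterate and updates it from possibly stale reads.
import threading
import numpy as np

rng = np.random.default_rng(1)
dim, n_workers, steps = 12, 4, 2000
M = rng.normal(size=(dim, dim))
Q = M.T @ M + np.eye(dim)      # minimize 0.5 x^T Q x - c^T x
c = rng.normal(size=dim)
x = np.zeros(dim)              # shared iterate, updated in place
blocks = np.array_split(np.arange(dim), n_workers)
lr = 0.5 / np.linalg.eigvalsh(Q).max()  # safe step for this quadratic

def worker(block):
    for _ in range(steps):
        snapshot = x.copy()                 # possibly stale read
        g = Q[block] @ snapshot - c[block]  # gradient of this block
        x[block] -= lr * g                  # asynchronous block write

threads = [threading.Thread(target=worker, args=(blk,)) for blk in blocks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("distance to optimum:", np.linalg.norm(x - np.linalg.solve(Q, c)))
```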

The third part of the thesis is devoted to jointly optimizing communication efficiency and wireless resources for FL over wireless networks. An overall optimization problem is formulated and, for tractability, divided into two sub-problems: the client scheduling problem and the resource allocation problem. To reduce communication costs, a communication-efficient client scheduling policy is proposed that limits communication exchanges and reuses stale local models (a sketch of this scheduling idea follows below). To optimize resource allocation at each communication round of FL training, an optimal solution based on a linear search method is derived. The proposed communication-efficient FL (CEFL) algorithm is evaluated both analytically and by simulation.

The final part of the thesis is a case study of implementing FL in low-Earth-orbit (LEO) satellite communication networks. We investigate four possible architectures for combining ML with LEO-based computing networks. The learning performance of the proposed strategies is evaluated by simulation, and the results validate that FL-based computing networks can significantly reduce communication overhead and latency.
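The client scheduling idea from the third part can be illustrated with a small sketch. The thresholded-drift policy below is an assumption for illustration, not the exact CEFL scheduler: a client uploads only when its local model has drifted beyond a threshold from the copy the server already holds, and the server otherwise reuses the stale copy.

```python
# An assumed thresholded-drift scheduler, NOT the paper's exact CEFL
# policy: a client uploads a fresh model only when it has drifted beyond
# `tau` from the server's stale copy, saving the uplink otherwise.
import numpy as np

rng = np.random.default_rng(2)
n_clients, dim, rounds, tau = 8, 5, 20, 0.05
global_model = np.zeros(dim)
server_copies = [np.zeros(dim) for _ in range(n_clients)]  # stale models

def local_training(w, i):
    """Stand-in for local SGD: pull w toward a client-specific optimum."""
    target = np.full(dim, float(i % 3))
    return w + 0.3 * (target - w)

for r in range(rounds):
    uploads = 0
    for i in range(n_clients):
        w_local = local_training(global_model, i)
        # Communicate only if the local model drifted past the threshold.
        if np.linalg.norm(w_local - server_copies[i]) > tau:
            server_copies[i] = w_local
            uploads += 1
    global_model = np.mean(server_copies, axis=0)
    print(f"round {r}: {uploads}/{n_clients} clients uploaded")
```

In this toy run, the number of uploads per round shrinks as clients' models stabilize, which is the communication saving that such a scheduling policy targets.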

Abstract [sv]

With the growing penetration and spread of various smart Internet of Things (IoT) applications, machine learning (ML) is expected to become a key technique for modelling and analysis of large volumes of data. Since data from these IoT devices are usually stored locally, ML at the network level, e.g., distributed machine learning (DML), has become a promising new paradigm, especially for large-scale model training. In this thesis, we explore the optimization and design of DML algorithms under different network conditions. Our main research on DML is divided across four papers, described below.

In the first part of this thesis, we look at fully decentralized ML through the use of the alternating direction method of multipliers (ADMM). More specifically, we propose an error-control-coding-based stochastic incremental ADMM (csI-ADMM) to tackle the two most critical challenges in DML systems, i.e., communication bottlenecks and stragglers (nodes/devices with slow responses). Given a suitable mini-batch size, we show that the proposed csI-ADMM method converges at a rate of $O(1/\sqrt{k})$ with a communication cost of $O(1/\mu^2)$, where $k$ is the number of iterations and $\mu$ is the target accuracy. We also provide a theoretical and experimental analysis of the relationship between convergence rate and the number of stragglers, as well as between mini-batch size and the number of stragglers.

In the second part of the thesis, we examine the asynchronous treatment of fully decentralized federated learning (FL). Specifically, we propose an asynchronous parallel incremental block-coordinate descent (API-BCD) algorithm, in which several devices/nodes are asynchronously active to increase the convergence speed. We give a theoretical proof of the convergence of the API-BCD solution and present simulations demonstrating its superior performance in terms of both speed and communication costs compared with state-of-the-art algorithms.

The third part of the thesis is a study of simultaneously optimizing communication efficiency and the management of wireless resources for FL over wireless networks. An overall optimization problem is formulated and divided into two sub-problems: client scheduling and resource allocation. To reduce communication costs, a communication-efficient client scheduling policy is proposed that limits communication and reuses local models that have become less relevant over time. To optimize resource allocation in each communication round of FL training, an optimal solution based on a linear search method is derived. The proposed communication-efficient FL (CEFL) algorithm is evaluated both analytically and by simulation.

The final part of the thesis is a case study in which FL is implemented in satellite communication networks in low Earth orbit (LEO). We investigate four possible architectures for combining ML in satellite-based computing networks. The performance of the proposed strategies is evaluated via simulations, and the results show that FL-based computing networks can markedly reduce communication overhead and latency.

Place, publisher, year, edition, pages
Kungliga Tekniska högskolan, 2022, p. 70
Series
TRITA-EECS-AVL ; 2022:18
Keywords [en]
Distributed machine learning, federated learning, communication efficiency, decentralized optimization
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-310374
ISBN: 978-91-8040-179-1 (print)
OAI: oai:DiVA.org:kth-310374
DiVA, id: diva2:1649035
Public defence
2022-04-28, F3, Lindstedtsvägen 26, Stockholm, 13:30 (English)
Note

QC 20220404

Available from: 2022-04-04 Created: 2022-04-01 Last updated: 2022-06-25. Bibliographically approved
List of papers
1. Coded Stochastic ADMM for Decentralized Consensus Optimization With Edge Computing
2021 (English) In: IEEE Internet of Things Journal, ISSN 2327-4662, Vol. 8, no. 7, p. 5360-5373. Article in journal (Refereed). Published.
Abstract [en]

Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles. Due to limitations on communication costs and security requirements, it is of paramount importance to analyze information in a decentralized manner instead of aggregating data at a fusion center. To train large-scale machine learning models, edge/fog computing is often leveraged as an alternative to centralized learning. We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes. A class of mini-batch stochastic alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model. To address two main critical challenges in distributed learning systems, i.e., the communication bottleneck and straggler nodes (nodes with slow responses), error-control-coding-based stochastic incremental ADMM is investigated. Given an appropriate mini-batch size, we show that the mini-batch stochastic ADMM-based method converges at a rate of $O(1/\sqrt{k})$, where $k$ denotes the number of iterations. Numerical experiments reveal that the proposed algorithm is communication-efficient, fast-responding, and robust in the presence of straggler nodes compared with state-of-the-art algorithms.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Alternating direction method of multipliers (ADMMs), coded edge computing, consensus optimization, decentralized learning
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-293401 (URN)
10.1109/JIOT.2021.3058116 (DOI)
000633436600025 ()
2-s2.0-85101455750 (Scopus ID)
Note

QC 20210426

Available from: 2021-04-26 Created: 2021-04-26 Last updated: 2022-06-25. Bibliographically approved
2. Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning
(English) Manuscript (preprint) (Other academic)
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-310372 (URN)
Note

QC 20220401

Available from: 2022-03-29 Created: 2022-03-29 Last updated: 2022-06-25. Bibliographically approved
3. Federated Learning over Wireless IoT Networks with Optimized Communication and Resources
2022 (English) In: IEEE Internet of Things Journal, E-ISSN 2327-4662, Vol. 9, no. 17, p. 16592-16605. Article in journal (Refereed). Published.
Abstract [en]

To leverage massive distributed data and computation resources, machine learning at the network edge is considered a promising technique, especially for large-scale model training. Federated learning (FL), as a paradigm of collaborative learning techniques, has attracted increasing research attention for its communication efficiency and improved data privacy. Due to lossy communication channels and limited communication resources (e.g., bandwidth and power), it is of interest to investigate fast-responding and accurate FL schemes over wireless systems. Hence, we investigate the problem of jointly optimizing communication efficiency and resources for FL over wireless Internet of Things (IoT) networks. To reduce complexity, we divide the overall optimization problem into two sub-problems, i.e., the client scheduling problem and the resource allocation problem. To reduce the communication costs of FL in wireless IoT networks, a new client scheduling policy is proposed that reuses stale local model parameters. To maximize successful information exchange over the network, a Lagrange multiplier method is first leveraged to decouple the variables, including power, bandwidth, and transmission indicators. A linear-search-based power and bandwidth allocation method is then developed. Given appropriate hyper-parameters, we show that the proposed communication-efficient federated learning (CEFL) framework converges at a strong linear rate. Extensive experiments reveal that, compared with a baseline FL approach with uniform resource allocation, the proposed CEFL framework substantially boosts communication efficiency as well as learning performance in terms of both training loss and test accuracy.
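As a hedged sketch of the decoupling-plus-linear-search step described above, under an assumed rate model and constants rather than the paper's derivation: a multiplier on the total-bandwidth constraint decouples the per-client choices, and a linear search over the multiplier finds the point where the budget binds.

```python
# Illustrative decouple-then-search allocation -- the Shannon rate model,
# constants, and grids are assumptions, NOT the paper's derivation. A
# multiplier `lam` on the bandwidth budget decouples per-client choices;
# a linear search over `lam` enforces the budget.
import numpy as np

rng = np.random.default_rng(3)
n_clients, B_total, N0, R_target = 6, 10e6, 1e-20, 1e6  # Hz, W/Hz, bit/s
h = rng.exponential(1e-8, size=n_clients)  # assumed channel gains
b_grid = np.linspace(0.2e6, 5e6, 200)      # candidate bandwidths

def min_power(b, hi):
    """Smallest power meeting R_target on bandwidth b (Shannon rate)."""
    return (2.0 ** (R_target / b) - 1.0) * N0 * b / hi

def allocate(lam):
    """Per-client bandwidth choice, decoupled by the multiplier lam."""
    b = np.empty(n_clients)
    for i in range(n_clients):
        cost = np.array([min_power(bi, h[i]) + lam * bi for bi in b_grid])
        b[i] = b_grid[np.argmin(cost)]
    return b

# Linear search over the multiplier until the bandwidth budget binds.
for lam in np.logspace(-16, -8, 400):
    b = allocate(lam)
    if b.sum() <= B_total:
        break

p = np.array([min_power(bi, hi) for bi, hi in zip(b, h)])
print(f"bandwidth used: {b.sum():.2e} Hz, total power: {p.sum():.3e} W")
```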

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-310370 (URN)
10.1109/jiot.2022.3151193 (DOI)
000846738200091 ()
2-s2.0-85124846600 (Scopus ID)
Note

QC 20231108

Available from: 2022-03-29 Created: 2022-03-29 Last updated: 2023-11-08. Bibliographically approved
4. Satellite-Based Computing Networks with Federated Learning
(English) Article in journal (Refereed). In press.
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-310371 (URN)
Note

QC 20220404

Available from: 2022-03-29 Created: 2022-03-29 Last updated: 2022-06-25. Bibliographically approved

Open Access in DiVA

fulltext (2340 kB), 899 downloads

Authority records

Chen, Hao

