Publications (10 of 12)
Khirirat, S., Wang, X., Magnusson, S. & Johansson, M. (2023). Improved Step-Size Schedules for Proximal Noisy Gradient Methods. IEEE Transactions on Signal Processing, 71, 189-201
Improved Step-Size Schedules for Proximal Noisy Gradient Methods
2023 (English). In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 71, p. 189-201. Article in journal (Refereed). Published.
Abstract [en]

Noisy gradient algorithms have emerged as one of the most popular classes of algorithms for distributed optimization with massive data. Choosing a proper step-size schedule is an important tuning task for achieving good performance. For the algorithms to attain fast convergence and high accuracy, it is intuitive to use large step-sizes in the initial iterations, when the gradient noise is typically small compared to the algorithm steps, and to reduce the step-sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient descent, but similar results are lacking for other methods that use approximate gradients. This paper shows that diminishing step-size strategies can indeed be applied to a broad class of noisy gradient algorithms. Our analysis framework is based on two classes of systems that characterize the impact of the step-sizes on the convergence performance of many algorithms. Our results show that such step-size schedules enable these algorithms to enjoy the optimal rate. We exemplify our results on stochastic compression algorithms, and our experiments validate the fast convergence of these algorithms under step-decay schedules.
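
As an illustration of the step-size intuition in this abstract, the sketch below runs a proximal gradient method with noisy gradients on a small l1-regularized least-squares problem and halves the step-size every 100 iterations. It is only a minimal sketch written for this page: the problem, the noise model and the decay factor are assumptions chosen for illustration, not the schedules analyzed in the paper.

import numpy as np

# Illustrative problem: min_x 0.5*||Ax - b||^2 + lam*||x||_1  (assumed, for demonstration only)
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
b = rng.standard_normal(200)
lam = 0.1

def soft_threshold(z, t):
    # Proximal operator of t*||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

x = np.zeros(50)
gamma = 1.0 / np.linalg.norm(A, 2) ** 2      # initial step-size ~ 1/L for this quadratic loss
for k in range(1, 501):
    grad = A.T @ (A @ x - b)
    noisy_grad = grad + 0.1 * rng.standard_normal(50)   # additive gradient noise (assumed model)
    x = soft_threshold(x - gamma * noisy_grad, gamma * lam)
    if k % 100 == 0:
        gamma *= 0.5                          # step decay: large steps early, smaller steps later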

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Gradient methods, compression algorithms, convex functions, machine learning, neural networks
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-326076 (URN), 10.1109/TSP.2023.3237392 (DOI), 000940083800001, 2-s2.0-85148442912 (Scopus ID)
Note

QC 20230425

Available from: 2023-04-25. Created: 2023-04-25. Last updated: 2023-04-25. Bibliographically approved.
Khirirat, S., Magnusson, S. & Johansson, M. (2022). Eco-Fedsplit: Federated learning with error-compensated compression. In: 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP). Paper presented at 47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), MAY 22-27, 2022, Singapore, SINGAPORE (pp. 5952-5956). Institute of Electrical and Electronics Engineers (IEEE)
Eco-Fedsplit: Federated learning with error-compensated compression
2022 (English). In: 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 5952-5956. Conference paper, Published paper (Refereed).
Abstract [en]

Federated learning is an emerging framework for collaborative machine learning on devices that do not want to share local data. State-of-the-art methods in federated learning reduce the communication frequency, but are not guaranteed to converge to the optimal model parameters. These methods also experience a communication bottleneck, especially when the devices are power-constrained and communicate over a shared medium. This paper presents ECO-FedSplit, an algorithm that increases the communication efficiency of federated learning without sacrificing solution accuracy. The key is to compress inter-device communication and to compensate for information losses in a theoretically justified manner. We prove strong convergence properties of ECO-FedSplit on strongly convex optimization problems and show that the algorithm yields a highly accurate solution with dramatically reduced communication. Extensive numerical experiments validate our theoretical results on real data sets.
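
The error-compensation idea described above can be sketched as follows: each device compresses its update plus a memory of previously discarded information and carries the residual to the next round. This is a generic sketch under assumed choices (top-k sparsification, local quadratic losses, plain gradient steps); it is not the operator-splitting update that defines ECO-FedSplit in the paper.

import numpy as np

rng = np.random.default_rng(1)
d, n_devices = 20, 5
# Hypothetical local losses f_i(x) = 0.5*||A_i x - b_i||^2, one per device
A = [rng.standard_normal((30, d)) for _ in range(n_devices)]
b = [rng.standard_normal(30) for _ in range(n_devices)]

def top_k(v, k=4):
    # Keep only the k largest-magnitude entries (a common sparsifying compressor)
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

x = np.zeros(d)
memory = [np.zeros(d) for _ in range(n_devices)]               # per-device compression-error memory
gamma = 1.0 / max(np.linalg.norm(Ai, 2) ** 2 for Ai in A)      # conservative step-size
for _ in range(300):
    messages = []
    for i in range(n_devices):
        g = A[i].T @ (A[i] @ x - b[i])                         # local gradient
        c = top_k(g + memory[i])                               # compress gradient plus carried-over error
        memory[i] = g + memory[i] - c                          # remember what the compressor dropped
        messages.append(c)
    x -= gamma * np.mean(messages, axis=0)                     # server applies the averaged sparse messages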

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
International Conference on Acoustics Speech and Signal Processing ICASSP, ISSN 1520-6149
Keywords
Optimization methods, operator splitting schemes, quantization, federated learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-323012 (URN), 10.1109/ICASSP43922.2022.9747809 (DOI), 000864187906048, 2-s2.0-85134043185 (Scopus ID)
Conference
47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), MAY 22-27, 2022, Singapore, SINGAPORE
Note

Part of proceedings: ISBN 978-1-6654-0540-9

QC 20230112

Available from: 2023-01-12. Created: 2023-01-12. Last updated: 2023-01-12. Bibliographically approved.
Khirirat, S. (2022). First-Order Algorithms for Communication Efficient Distributed Learning. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
First-Order Algorithms for Communication Efficient Distributed Learning
2022 (English). Doctoral thesis, monograph (Other academic).
Abstract [en]

Innovations in numerical optimization, statistics and high performance computing have enabled tremendous advances in machine learning algorithms, fuelling applications from natural language processing to autonomous driving. To deal with increasing data volumes, and to keep the training times of increasingly complex machine learning models reasonable, modern optimization algorithms distribute both data and computations over a large number of machines. However, these algorithms face challenges, from hardware and data heterogeneity (as different machines have different processors and data) to privacy and security issues (where data can be extracted from the transmitted parameters). Among these challenges, communication overhead constitutes a major performance bottleneck of the algorithms. Communicating millions of problem parameters between machines has been reported to consume up to 80% of the training time. To alleviate the communication bottleneck, we develop theory and strategies in this thesis to design communication-efficient optimization algorithms.

In the first part, we provide unified analysis frameworks for first-order algorithms using direct or error-compensated compression. We first identify key characteristics of the compression errors induced by many of the most popular compression strategies in the literature. We then perform a unified analysis of the convergence properties of first-order algorithms for general families of deterministic and stochastic gradient compression algorithms. Our results give explicit expressions for how compression accuracy and the amount of asynchrony affect the step-sizes and guaranteed convergence times. We next turn our attention to error-compensated compression algorithms. We develop theoretical explanations for why error-compensated compression algorithms attain solutions with arbitrarily higher accuracy than direct compression algorithms. Our results provide strong convergence guarantees of error-compensated compression algorithms for distributed and federated learning problems.
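
As one concrete instance of the compressor properties referred to here, the snippet below implements a rescaled random-k sparsifier and checks empirically that it is unbiased with compression error variance (d/k - 1)*||v||^2. This particular compressor and check are illustrative assumptions; the thesis covers a broader family of deterministic and stochastic compressors.

import numpy as np

def rand_k(v, k, rng):
    # Unbiased random sparsifier: keep k random coordinates and rescale them by d/k
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = (d / k) * v[idx]
    return out

rng = np.random.default_rng(2)
d, k = 100, 10
v = rng.standard_normal(d)
samples = np.array([rand_k(v, k, rng) for _ in range(20000)])
print(np.max(np.abs(samples.mean(axis=0) - v)))     # should be close to 0 (unbiasedness)
print(np.mean(np.sum((samples - v) ** 2, axis=1)),  # empirical compression error ...
      (d / k - 1) * np.sum(v ** 2))                 # ... versus (d/k - 1)*||v||^2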

In the second part, we provide flexible tuning frameworks to optimize convergence performance of compression algorithms for a variety of system architectures. We start by analyzing data-dependent complexity that explains why direct compression algorithms are more communication-efficient than full-precision algorithms in practice. This complexity leads to automatic tuning strategies that enable popular compression algorithms on different communication networks to maximize both the convergence progress towards the solution and the communication efficiency. We then turn our attention to diminishing step-size schedules to maximize the convergence speed of the algorithms using noisy gradients. Our analysis framework is based on two classes of systems that characterize the impact of the step-sizes on the speed of noisy gradient algorithms. Our results show that such step-size schedules enable these algorithms to enjoy the optimal rate.

Applications of the algorithms in the thesis to central machine learning problems on benchmark data validate our theoretical results.

Abstract [sv]

Tremendous advances in machine learning algorithms are improving central applications of artificial intelligence, from natural language processing to autonomous driving, thanks to innovations in numerical optimization and high-performance computing systems. These optimization-based algorithms use billions of machines to solve large-scale problems in a coordinated fashion within a desired training time. However, they pose challenges, from hardware and data heterogeneity (as different devices have different computing power and data) to privacy and security issues (where data can be extracted from the transmitted parameters). Among these challenges, communication overhead constitutes a major part of the performance bottleneck of the algorithms. Communicating millions of problem parameters between machines has been reported to consume up to 80% of the training time. To alleviate the communication bottleneck, we develop theory and strategies in this thesis to design communication-efficient optimization algorithms. In the first part, we provide unified analysis frameworks for the performance of first-order optimization algorithms with direct or error-compensated compression, on a single machine and across multiple machines. We first outline definitions of compression techniques that cover many compressors of practical interest. We then analyze the convergence of first-order algorithms that use either deterministic or stochastic compression. Our results demonstrate the explicit effect of compression accuracy and asynchronous delays on step-sizes, convergence rates, and solution accuracy of direct compression algorithms. We then turn our attention to error-compensated compression algorithms. We develop theoretical explanations for why error-compensated compression algorithms attain solutions with arbitrarily higher accuracy than direct compression algorithms. Our results show strong convergence guarantees for error-compensated compression algorithms for distributed and federated learning problems. In the second part, we provide flexible tuning frameworks to optimize the convergence performance of compression algorithms for a variety of system architectures. We begin by analyzing a data-dependent complexity that explains why direct compression algorithms are more communication-efficient than full-precision algorithms in practice. This complexity leads to automatic tuning strategies that enable popular compression algorithms on different communication networks to maximize both the convergence progress towards the solution and the communication efficiency. We then turn our attention to diminishing step-size schedules to maximize the convergence speed of algorithms that use stochastic gradients. Our analysis framework is based on two classes of systems that characterize the impact of the step-sizes on the speed of stochastic gradient algorithms. Our results show that such step-size schedules enable these algorithms to enjoy the optimal rate. Simulations of the algorithms in this thesis on real-world problems with benchmark data sets validate our theoretical results.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022. p. 233
Series
TRITA-EECS-AVL ; 2022:9
Keywords
Communication efficient learning, Optimization algorithms, Quantization, Error compensation, First-order algorithms, Stochastic gradient descent
National Category
Control Engineering
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-308978 (URN), 978-91-8040-134-0 (ISBN)
Public defence
2022-03-09, https://kth-se.zoom.us/j/61018037220, U1, Brinellvägen 26, floor 6, Stockholm, 15:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Not duplicate with Diva 1369374, QC 20220221

Available from: 2022-02-21. Created: 2022-02-17. Last updated: 2022-06-25. Bibliographically approved.
Berglund, E., Khirirat, S. & Wang, X. (2022). Zeroth-order randomized subspace Newton methods. In: 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP). Paper presented at 47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), MAY 22-27, 2022, Singapore, SINGAPORE (pp. 6002-6006). Institute of Electrical and Electronics Engineers (IEEE)
Zeroth-order randomized subspace Newton methods
2022 (English). In: 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 6002-6006. Conference paper, Published paper (Refereed).
Abstract [en]

Zeroth-order methods have become important tools for solving problems where we have access only to function evaluations. However, zeroth-order methods that use only gradient approximations are n times slower than classical first-order methods for solving n-dimensional problems. To accelerate the convergence rate, this paper proposes the zeroth-order randomized subspace Newton (ZO-RSN) method, which estimates projections of the gradient and Hessian by random sketching and finite differences. This allows us to compute the Newton step in a lower-dimensional subspace, at small computational cost. We prove that ZO-RSN can attain lower iteration complexity than existing zeroth-order methods for strongly convex problems. Our numerical experiments show that ZO-RSN can perform black-box attacks under a more restrictive limit on the number of function queries than the state-of-the-art Hessian-aware zeroth-order method.
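
A rough sketch of the idea described in this abstract (an illustrative reading, not the authors' implementation): draw a random sketch matrix S, estimate the projections of the gradient and Hessian onto the sketched subspace from function values by finite differences, and take a Newton step restricted to that subspace. Sketch size, difference parameter, damping and the regularization term below are assumed values.

import numpy as np

def zo_subspace_newton_step(f, x, m=5, mu=1e-4, eta=1.0, reg=1e-6, rng=None):
    # One zeroth-order step in a random m-dimensional subspace (illustrative sketch)
    if rng is None:
        rng = np.random.default_rng()
    d = x.size
    S = rng.standard_normal((d, m)) / np.sqrt(m)           # random sketch matrix
    f0 = f(x)
    fp = np.array([f(x + mu * S[:, i]) for i in range(m)])
    g = (fp - f0) / mu                                     # finite-difference estimate of S^T grad f(x)
    H = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            fij = f(x + mu * (S[:, i] + S[:, j]))
            H[i, j] = (fij - fp[i] - fp[j] + f0) / mu**2   # estimate of S^T Hess f(x) S
    H = 0.5 * (H + H.T) + reg * np.eye(m)                  # symmetrize and regularize
    return x - eta * S @ np.linalg.solve(H, g)             # Newton step inside the subspace

# Toy usage on a strongly convex quadratic, using function evaluations only
rng = np.random.default_rng(3)
Q = rng.standard_normal((20, 20))
Q = Q @ Q.T + np.eye(20)
f = lambda z: 0.5 * z @ Q @ z
x = rng.standard_normal(20)
for _ in range(100):
    x = zo_subspace_newton_step(f, x, rng=rng)
print(f(x))    # should be close to the minimum value 0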

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
International Conference on Acoustics Speech and Signal Processing ICASSP, ISSN 1520-6149
Keywords
Zeroth-order optimization, sketching techniques, Newton-type method, adversarial black-box attacks, convolutional neural network
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-323013 (URN), 10.1109/ICASSP43922.2022.9746483 (DOI), 000864187906058, 2-s2.0-85131246499 (Scopus ID)
Conference
47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), MAY 22-27, 2022, Singapore, SINGAPORE
Note

Part of proceedings: ISBN 978-1-6654-0540-9

QC 20230112

Available from: 2023-01-12. Created: 2023-01-12. Last updated: 2023-01-12. Bibliographically approved.
Khirirat, S., Magnusson, S., Aytekin, A. & Johansson, M. (2021). A Flexible Framework for Communication-Efficient Machine Learning. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence. Paper presented at 35th AAAI Conference on Artificial Intelligence / 33rd Conference on Innovative Applications of Artificial Intelligence / 11th Symposium on Educational Advances in Artificial Intelligence, FEB 02-09, 2021, ELECTR NETWORK (pp. 8101-8109). Association for the Advancement of Artificial Intelligence, 35
A Flexible Framework for Communication-Efficient Machine Learning
2021 (English). In: Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence, Association for the Advancement of Artificial Intelligence, 2021, Vol. 35, p. 8101-8109. Conference paper, Published paper (Refereed).
Abstract [en]

With the increasing scale of machine learning tasks, it has become essential to reduce the communication between computing nodes. Early work on gradient compression focused on the bottleneck between CPUs and GPUs, but communication-efficiency is now needed in a variety of different system architectures, from high-performance clusters to energy-constrained IoT devices. In current practice, compression levels are typically chosen before training, and settings that work well for one task may be vastly sub-optimal for another dataset on another architecture. In this paper, we propose a flexible framework which adapts the compression level to the true gradient at each iteration, maximizing the improvement in the objective function that is achieved per communicated bit. Our framework is easy to adapt from one technology to the next by modeling how the communication cost depends on the compression level for the specific technology. Theoretical results and practical experiments indicate that the automatic tuning strategies significantly increase communication efficiency on several state-of-the-art compression schemes.
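
The per-bit adaptation idea can be illustrated with a sparsifying compressor: under standard smoothness assumptions, a gradient step along the k largest entries decreases the objective by roughly (gamma/2) times the sum of the k largest squared entries, while the message costs roughly k*(value bits + index bits). The snippet below picks the k with the best ratio; this descent model and the bit costs are simplifying assumptions made for illustration, not the communication models developed in the paper.

import numpy as np

def best_sparsification_level(grad, gamma, bits_value=32):
    # Choose k to maximize (modeled objective improvement) / (bits communicated)
    d = grad.size
    bits_index = int(np.ceil(np.log2(d)))                 # cost of sending each coordinate index
    sq = np.sort(grad ** 2)[::-1]                         # squared entries, largest first
    gain = (gamma / 2) * np.cumsum(sq)                    # modeled decrease when keeping the top k
    ks = np.arange(1, d + 1)
    cost = ks * (bits_value + bits_index)                 # modeled message size in bits
    return int(ks[np.argmax(gain / cost)])

rng = np.random.default_rng(4)
g = rng.standard_normal(1000) * np.logspace(0, -3, 1000)  # a gradient with decaying magnitudes
print("chosen sparsification level:", best_sparsification_level(g, gamma=0.1))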

Place, publisher, year, edition, pages
Association for the Advancement of Artificial Intelligence, 2021
Series
AAAI Conference on Artificial Intelligence, ISSN 2159-5399
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-301798 (URN), 000680423508025, 2-s2.0-85121616102 (Scopus ID)
Conference
35th AAAI Conference on Artificial Intelligence / 33rd Conference on Innovative Applications of Artificial Intelligence / 11th Symposium on Educational Advances in Artificial Intelligence, FEB 02-09, 2021, ELECTR NETWORK
Note

Part of proceedings: ISBN 978-1-57735-866-4, QC 20230117

Available from: 2021-09-14. Created: 2021-09-14. Last updated: 2023-01-17. Bibliographically approved.
Khirirat, S., Magnusson, S. & Johansson, M. (2021). Compressed Gradient Methods With Hessian-Aided Error Compensation. IEEE Transactions on Signal Processing, 69, 998-1011
Compressed Gradient Methods With Hessian-Aided Error Compensation
2021 (English). In: IEEE Transactions on Signal Processing, ISSN 1053-587X, E-ISSN 1941-0476, Vol. 69, p. 998-1011. Article in journal (Refereed). Published.
Abstract [en]

The emergence of big data has caused a dramatic shift in the operating regime for optimization algorithms. The performance bottleneck, which used to be computations, is now often communications. Several gradient compression techniques have been proposed to reduce the communication load at the price of a loss in solution accuracy. Recently, it has been shown how compression errors can be compensated for in the optimization algorithm to improve the solution accuracy. Even though convergence guarantees for error-compensated algorithms have been established, there is very limited theoretical support for quantifying the observed improvements in solution accuracy. In this paper, we show that Hessian-aided error compensation, unlike other existing schemes, avoids accumulation of compression errors on quadratic problems. We also present strong convergence guarantees of Hessian-based error compensation for stochastic gradient descent. Our numerical experiments highlight the benefits of Hessian-based error compensation, and demonstrate that similar convergence improvements are attained when only a diagonal Hessian approximation is used.
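
The error-compensation template the paper builds on can be sketched as follows: compress the gradient plus a memory term, apply the compressed vector, and carry the residual forward, here weighted by (1 - gamma*H) with a diagonal Hessian approximation. The weighting, the top-k compressor and the quadratic test problem are assumptions made for this illustration; the exact Hessian-aided update analyzed in the paper may differ.

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((100, 30))
b = rng.standard_normal(100)
H_diag = np.sum(A ** 2, axis=0)              # diagonal of the Hessian A^T A of 0.5*||Ax - b||^2

def top_k(v, k=5):
    # Keep only the k largest-magnitude entries
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

x, e = np.zeros(30), np.zeros(30)
gamma = 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(1000):
    g = A.T @ (A @ x - b)
    v = g + e                                # gradient plus compensation memory
    c = top_k(v)                             # compressed message actually applied
    x -= gamma * c
    e = (1.0 - gamma * H_diag) * (v - c)     # Hessian-weighted residual (assumed form)
print(0.5 * np.linalg.norm(A @ x - b) ** 2)  # should approach the least-squares optimum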

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Signal processing algorithms, Optimization, Error compensation, Compressors, Approximation algorithms, Machine learning algorithms, Convergence, Quantization, gradient methods, stochastic gradient descent
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-291985 (URN), 10.1109/TSP.2020.3048229 (DOI), 000617753900001, 2-s2.0-85099112848 (Scopus ID)
Note

QC 20210329

Available from: 2021-03-29. Created: 2021-03-29. Last updated: 2022-06-25. Bibliographically approved.
Khirirat, S., Wang, X., Magnússon, S. & Johansson, M. (2021). Improved Step-Size Schedules for Noisy Gradient Methods. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Paper presented at ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 6-11 June 2021 • Toronto, Ontario, Canada. Toronto, ON, Canada: Institute of Electrical and Electronics Engineers (IEEE)
Improved Step-Size Schedules for Noisy Gradient Methods
2021 (English). In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada: Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper, Published paper (Refereed).
Abstract [en]

Noise is inherent in many optimization methods, such as stochastic gradient methods, zeroth-order methods and compressed gradient methods. For such methods to converge toward a global optimum, it is intuitive to use large step-sizes in the initial iterations, when the noise is typically small compared to the algorithm steps, and to reduce the step-sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient methods, but similar results are lacking for other methods using approximate gradients. This paper shows that diminishing step-size strategies can indeed be applied to a broad class of noisy gradient methods. Unlike previous works, our analysis framework shows that such step-size schedules enable these methods to enjoy an optimal O(1/k) rate. We exemplify our results on zeroth-order methods and stochastic compression methods. Our experiments validate the fast convergence of these methods with step decay schedules.

Place, publisher, year, edition, pages
Toronto, ON, Canada: Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Optimization, machine learning, distributed algorithms, zeroth-order algorithms, quantization
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-295604 (URN), 10.1109/ICASSP39728.2021.9414419 (DOI), 000704288403182, 2-s2.0-85115103197 (Scopus ID)
Conference
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 6-11 June 2021 • Toronto, Ontario, Canada
Note

Part of proceedings: ISBN 978-1-7281-7605-5, QC 20230117

Available from: 2021-05-24. Created: 2021-05-24. Last updated: 2023-01-17. Bibliographically approved.
Khirirat, S., Magnusson, S. & Johansson, M. (2019). Convergence Bounds for Compressed Gradient Methods with Memory Based Error Compensation. In: 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP). Paper presented at 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), MAY 12-17, 2019, Brighton, ENGLAND (pp. 2857-2861). IEEE
Convergence Bounds for Compressed Gradient Methods with Memory Based Error Compensation
2019 (English). In: 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 2019, p. 2857-2861. Conference paper, Published paper (Refereed).
Abstract [en]

The veritable scale of modern data necessitates information compression in parallel/distributed big-data optimization. Compression schemes using memory-based error compensation have displayed superior performance in practice; however, to date there are no theoretical explanations for these observed advantages. This paper provides the first theoretical support for why such compression schemes yield higher-accuracy solutions in optimization. Our results cover both gradient and incremental gradient algorithms for quadratic optimization. Unlike previous works, our theoretical results explicitly quantify the accuracy gains from error compensation, especially for ill-conditioned problems. Finally, numerical results on linear least-squares problems validate the benefit of error compensation and demonstrate the tightness of our convergence guarantees.

Place, publisher, year, edition, pages
IEEE, 2019
Series
International Conference on Acoustics Speech and Signal Processing ICASSP, ISSN 1520-6149
Keywords
Quadratic optimization, quantization, gradient descent, incremental gradient methods
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-261056 (URN), 10.1109/ICASSP.2019.8682931 (DOI), 000482554003018, 2-s2.0-85068974918 (Scopus ID), 978-1-4799-8131-1 (ISBN)
Conference
44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), MAY 12-17, 2019, Brighton, ENGLAND
Note

QC 20191001

Available from: 2019-10-01. Created: 2019-10-01. Last updated: 2022-09-06. Bibliographically approved.
Khirirat, S. (2019). First-Order Algorithms for Communication Efficient Distributed Learning. (Licentiate dissertation). Stockholm: KTH Royal Institute of Technology
First-Order Algorithms for Communication Efficient Distributed Learning
2019 (English). Licentiate thesis, monograph (Other academic).
Abstract [en]

Technological developments in devices and storage have made large volumes of data collections more accessible than ever. This transformation leads to optimization problems with massive data in both volume and dimension. In response to this trend, the popularity of optimization on high performance computing architectures has increased unprecedentedly. These scalable optimization solvers can achieve high efficiency by splitting computational loads among multiple machines. However, these methods also incur large communication overhead. To solve optimization problems with millions of parameters, communication between machines has been reported to consume up to 80% of the training time. To alleviate this communication bottleneck, many optimization algorithms with data compression techniques have been studied. In practice, they have been reported to significantly reduce communication costs while exhibiting convergence almost comparable to that of the full-precision algorithms. To explain this behaviour, we develop theory and techniques in this thesis to design communication-efficient optimization algorithms.

In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we outline definitions of compression techniques which cover many compressors of practical interest. Then, we provide the unified analysis framework of optimization algorithms with compressors which can be either deterministic or randomized. In particular, we show how the tuning parameters of compressed optimization algorithms must be chosen to guarantee performance. Our results show explicit dependency on compression accuracy and delay effect due to asynchrony of algorithms. This allows us to characterize the trade-off between iteration and communication complexity under gradient compression.

In the second part, we study how error compensation schemes can improve the performance of compressed optimization algorithms. Even though convergence guarantees of optimization algorithms with error compensation have been established, there is very limited theoretical support which guarantees improved solution accuracy. We therefore develop theoretical explanations, which show that error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors, thus improving solution accuracy especially for ill-conditioned problems. We also provide strong convergence analysis of error compensation on parallel stochastic gradient descent across multiple machines. In particular, the error-compensated algorithms, unlike direct compression, result in significant reduction in the compression error.

Applications of the algorithms in this thesis to real-world problems with benchmark data sets validate our theoretical results.

Abstract [sv]

Developments in communication technology and data storage have made large volumes of data collections more accessible than ever. This transformation leads to numerical optimization problems with data of large scale in both volume and dimension. In response to this trend, the popularity of high-performance computing architectures has increased more than ever before. Scalable optimization solvers can achieve high efficiency by distributing the computational load among multiple machines. In practice, however, they come at the price of significant communication overhead. This causes the performance bottleneck to shift from computation to communication. When real-world optimization problems are solved with a large number of parameters, communication between machines dominates almost 80% of the training time. To reduce the communication load, several compression techniques have been proposed in the literature. Although optimization algorithms that use these compressors are reported to be as competitive as their full-precision counterparts, they suffer a loss of accuracy. To shed light on this, we develop theory and techniques in this thesis to design communication-efficient optimization algorithms that use only low-precision information. In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we give an overview of compression techniques that cover many compressors of practical interest. Then, we present a unified analysis framework for optimization algorithms with compressors, which can be either deterministic or randomized. In particular, we show how the parameters of compressed optimization algorithms, determined by the compressor's parameters, must be chosen to guarantee convergence. Our convergence guarantees show the dependence on the compressor's accuracy and on delay effects due to the asynchrony of the algorithms. This allows us to characterize the trade-off between iteration and communication complexity when compression is used. In the second part, we study the performance gains of error compensation schemes for compressed optimization algorithms. Although convergence guarantees with error compensation have been established, there is very limited theoretical support for competitive convergence guarantees with error compensation. We therefore develop theoretical explanations showing that the use of error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors, thereby improving solution accuracy especially for ill-conditioned quadratic optimization problems. We also present a strong convergence analysis of error compensation applied to stochastic gradient methods over a communication network with multiple machines. The error-compensated algorithms, unlike direct compression, result in a significant reduction of the compression error. Simulations of the algorithms in this thesis on real-world problems with benchmark data sets validate our theoretical results.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2019. p. 106
Series
TRITA-EECS-AVL ; 2019:81
Keywords
Communication efficient learning, Optimization algorithms, Quantization, Error compensation, First-order algorithms, Stochastic gradient descent
National Category
Control Engineering
Research subject
Applied and Computational Mathematics; Applied and Computational Mathematics, Optimization and Systems Theory; Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-263738 (URN), 978-91-7873-359-0 (ISBN)
Presentation
2019-12-06, D3, Kungliga Tekniska Högskolan, Lindstedtsvägen 5, KTH, Stockholm, 13:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20191120

Available from: 2019-11-20. Created: 2019-11-11. Last updated: 2022-09-06. Bibliographically approved.
Khirirat, S., Johansson, M. & Alistarh, D. (2018). Gradient compression for communication-limited convex optimization. In: 2018 IEEE Conference on Decision and Control (CDC). Paper presented at 57th IEEE Conference on Decision and Control, CDC 2018; Centre of the Fontainebleau in Miami Beach Miami; United States; 17 December 2018 through 19 December 2018 (pp. 166-171). Institute of Electrical and Electronics Engineers (IEEE), Article ID 8619625.
Gradient compression for communication-limited convex optimization
2018 (English). In: 2018 IEEE Conference on Decision and Control (CDC), Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 166-171, article id 8619625. Conference paper, Published paper (Refereed).
Abstract [en]

Data-rich applications in machine learning and control have motivated intense research on large-scale optimization. Novel algorithms have been proposed and shown to have optimal convergence rates in terms of iteration counts. However, their practical performance is severely degraded by the cost of exchanging high-dimensional gradient vectors between computing nodes. Several gradient compression heuristics have recently been proposed to reduce communications, but few theoretical results exist that quantify how they impact algorithm convergence. This paper establishes and strengthens the convergence guarantees for gradient descent under a family of gradient compression techniques. For convex optimization problems, we derive admissible step sizes and quantify both the number of iterations and the number of bits that need to be exchanged to reach a target accuracy. Finally, we validate the performance of different gradient compression techniques in simulations. The numerical results highlight the properties of different gradient compression algorithms and confirm that fast convergence with limited information exchange is possible.
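
As a sketch of the type of guarantee studied here (with assumed constants, not the paper's bounds): if the compressed direction Q(g) satisfies ||Q(g) - g|| <= alpha*||g|| with alpha < 1, a simple smoothness argument shows that gradient descent still decreases the objective with the conservative step-size gamma = (1 - alpha) / (L*(1 + alpha)^2). The classical sign compressor scaled by ||g||_1/d used below satisfies this with alpha = sqrt(1 - 1/d); the step-size rule and the test problem are illustrative only.

import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((200, 40))
b = rng.standard_normal(200)
L = np.linalg.norm(A, 2) ** 2                    # smoothness constant of 0.5*||Ax - b||^2
d = 40

def sign_compress(g):
    # Deterministic compressor: sign pattern rescaled by the mean absolute value
    return (np.linalg.norm(g, 1) / g.size) * np.sign(g)

alpha = np.sqrt(1.0 - 1.0 / d)                   # relative error bound of this compressor
gamma = (1 - alpha) / (L * (1 + alpha) ** 2)     # conservative admissible step-size (assumed rule)
x = np.zeros(d)
for _ in range(5000):
    x -= gamma * sign_compress(A.T @ (A @ x - b))
print(0.5 * np.linalg.norm(A @ x - b) ** 2)      # decreases monotonically toward the optimum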

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
Series
IEEE Conference on Decision and Control, ISSN 0743-1546
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-245102 (URN), 10.1109/CDC.2018.8619625 (DOI), 000458114800023, 2-s2.0-85062195554 (Scopus ID), 978-1-5386-1395-5 (ISBN)
Conference
57th IEEE Conference on Decision and Control, CDC 2018; Centre of the Fontainebleau in Miami Beach Miami; United States; 17 December 2018 through 19 December 2018
Note

QC 20190307

Available from: 2019-03-07. Created: 2019-03-07. Last updated: 2024-03-15. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0003-4473-2011
