kth.se Publications
Publications (10 of 13)
Mahmoudi, A., Zaher, M. & Björnson, E. (2025). Low-Latency and Energy-Efficient Federated Learning Over Cell-Free Networks: A Trade-Off Analysis. IEEE Open Journal of the Communications Society, 6, 2274-2292
Low-Latency and Energy-Efficient Federated Learning Over Cell-Free Networks: A Trade-Off Analysis
2025 (English). In: IEEE Open Journal of the Communications Society, E-ISSN 2644-125X, Vol. 6, p. 2274-2292. Article in journal (Refereed), Published
Abstract [en]

Federated learning (FL) enables distributed model training by exchanging models rather than raw data, preserving privacy and reducing communication overhead. However, as the number of FL users grows, traditional wireless networks with orthogonal access face increasing latency due to limited scalability. Cell-free massive multiple-input multiple-output (CFmMIMO) networks offer a promising solution by allowing many users to share the same time-frequency resources. While CFmMIMO enhances energy efficiency through spatial multiplexing and collaborative beamforming, it remains crucial to adapt its physical layer operation to meticulously allocate uplink transmission powers to the FL users. To this aim, we study the problem of uplink power allocation to maximize the number of global FL iterations while jointly optimizing uplink energy and latency. The key challenge lies in balancing the opposing effects of transmission power: increasing power reduces latency but increases energy consumption, and vice versa. Therefore, we propose two power allocation schemes: one minimizes a weighted sum of uplink energy and latency to manage the trade-off, while the other maximizes the achievable number of FL iterations within given energy and latency constraints. We solve these problems using a combination of Brent's method, coordinate gradient descent, the bisection method, and Sequential Quadratic Programming (SQP) with BFGS updates. Numerical results demonstrate that our proposed approaches outperform state-of-the-art power allocation schemes, increasing the number of achievable FL iterations by up to 62%, 93%, and 142% compared to Dinkelbach, max-sum rate, and joint communication and computation optimization methods, respectively.
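The trade-off at the heart of this abstract, namely that raising transmit power shortens uplink latency but raises energy consumption, can be sketched numerically. The toy single-user model below is illustrative only: the channel constants, the Shannon-rate latency model, and the unimodality assumption that justifies the ternary search are hypothetical stand-ins for the paper's Brent's-method/SQP machinery.

```python
import math

def uplink_latency(p, gain=1e-6, bandwidth=1e6, noise=1e-13, bits=1e6):
    """Time (s) to upload `bits` at the Shannon rate with transmit power p (W)."""
    rate = bandwidth * math.log2(1 + gain * p / noise)
    return bits / rate

def uplink_energy(p):
    """Uplink energy (J) = transmit power * transmission time."""
    return p * uplink_latency(p)

def optimize_power(weight, p_min=1e-3, p_max=1.0, iters=200):
    """Ternary search for the power minimizing weight*E(p) + (1-weight)*T(p).
    Valid only under the assumption that the weighted objective is unimodal
    in p on [p_min, p_max], which holds for this toy channel model."""
    f = lambda p: weight * uplink_energy(p) + (1 - weight) * uplink_latency(p)
    lo, hi = p_min, p_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2      # minimum lies left of m2
        else:
            lo = m1      # minimum lies right of m1
    return (lo + hi) / 2
```

With `weight=0` the objective is pure latency, so the search pushes power to the top of the range; with `weight=1` it is pure energy, which for this model is minimized at low power, reproducing the opposing effects the abstract describes.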

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
cell-free massive MIMO, energy efficiency, Federated learning, latency, power allocation
National Category
Telecommunications; Communication Systems; Signal Processing
Identifiers
urn:nbn:se:kth:diva-363118 (URN) | 10.1109/OJCOMS.2025.3553593 (DOI) | 001463481300008 () | 2-s2.0-105003088680 (Scopus ID)
Note

QC 20250507

Available from: 2025-05-06. Created: 2025-05-06. Last updated: 2025-05-28. Bibliographically approved
Mahmoudi Benhangi, A. (2025). Toward Efficient Federated Learning over Wireless Networks: Novel Frontiers in Resource Optimization. (Doctoral dissertation). Stockholm: Kungliga Tekniska högskolan
Toward Efficient Federated Learning over Wireless Networks: Novel Frontiers in Resource Optimization
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

With the rise of the Internet of Things (IoT) and 5G networks, edge computing addresses critical limitations in cloud computing's quality of service. Machine learning (ML) has become essential in processing IoT-generated data at the edge, primarily through distributed optimization algorithms that support predictive tasks. However, state-of-the-art ML models demand substantial computational and communication resources, often exceeding the capabilities of wireless devices. Moreover, training these models typically requires centralized access to datasets, but transmitting such data to the cloud introduces significant communication overhead, posing a critical challenge to resource-constrained systems. Federated Learning (FL) is a promising iterative approach that reduces communication costs through local computation on devices, where only model parameters are shared with a central server. Accordingly, every communication iteration of FL incurs costs such as computation, latency, bandwidth, and energy. Although FL enables distributed learning across multiple devices without exchanging raw data, its success is often hindered by the limitations of wireless communication, including traffic congestion and device resource constraints. To address these challenges, this thesis presents cost-effective methods for making FL training more efficient in resource-constrained wireless environments. Initially, we investigate challenges in distributed training over wireless networks, addressing background traffic and latency that impede communication iterations. We introduce the cost-aware causal FL algorithm (FedCau), which balances training performance with communication and computation costs through a novel iteration-termination method, removing the need for future information. A multi-objective optimization problem is formulated, integrating FL loss and iteration costs, with communication managed via slotted-ALOHA, CSMA/CA, and OFDMA protocols.
The framework is extended to include both convex and non-convex loss functions, and results are compared with established communication-efficient methods, including the Lazily Aggregated Quantized Gradient (LAQ) method. Additionally, we develop A-LAQ (Adaptive LAQ), which conserves energy while maintaining high test accuracy by dynamically adjusting bit allocation for local model updates during iterations. Next, we leverage cell-free massive multiple-input multiple-output (CFmMIMO) networks to address the high latency in large-scale FL deployments. This architecture allows for simultaneous service to many users on the same time/frequency resources, mitigating the latency bottleneck through spatial multiplexing. Accordingly, we propose optimized uplink power allocation schemes that minimize the trade-off between energy consumption and latency, enabling more iterations under given energy and latency constraints and leading to substantial gains in FL test accuracy. In this regard, we present three approaches, beginning with a method that jointly minimizes the users' uplink energy and FL training latency. This approach optimizes the trade-off between each user's uplink latency and energy consumption, factoring in how individual transmit power impacts the energy and latency of other users to jointly reduce overall uplink energy consumption and FL training latency.

Furthermore, to address the straggler effect, we propose an adaptive mixed-resolution quantization scheme for local gradient updates, which reserves high resolution for essential entries only and utilizes dynamic power control. Finally, we introduce EFCAQ, an energy-efficient FL scheme for CFmMIMO networks with adaptive quantization, which co-optimizes the straggler effect and the overall user energy consumption while minimizing the FL loss function through an adaptive number of local iterations per user. Through extensive theoretical analysis and experimental validation, this thesis demonstrates that the proposed methods outperform state-of-the-art algorithms across various FL setups and datasets. These contributions pave the way for energy-efficient and low-latency FL systems, making them more practical for use in real-world wireless networks.

Abstract [sv]

Framväxten av sakernas Internet (IoT, Internet of Things) och 5G-nät begränsas av tjänstekvaliteten i molnet, men kantberäkningar kan adressera dessa problem. Maskininlärning (ML) kommer att bli avgörande för att bearbeta IoT-genererad data vid kanten av nätet, främst genom att använda distribuerade optimeringsalgoritmer för prediktion. Dagens ML-modeller kräver dock stora beräknings- och kommunikationsresurser som ofta överstiger kapaciteten hos enskilda trådlösa enheter. Dessutom kräver träningen av dessa modeller vanligtvis centraliserad åtkomst till stora datamängder, men överföringen av denna data till molnet har betydande kommunikationskostnader, vilket är en kritisk utmaning för att driva resursbegränsade system. Federerad inlärning (FL) är en lovande iterativ ML-metod som minskar kommunikationskostnaderna genom att genomföra lokala beräkningar på lokalt tillgänglig data på enheterna och endast dela modellparametrar med en central server. Varje iteration i FL har vissa kostnader när det gäller beräkningar, latens, bandbredd och energi. Även om FL möjliggör distribuerad inlärning över flera enheter utan att utbyta rådata, begränsas metoden i praktiken av den trådlösa kommunikationstekniken, t.ex. trafikstockningar i nätet och energibegränsningar i enheterna. För att adressera dessa problem presenterar denna avhandling kostnadseffektiva metoder för att göra FL-träning mer effektiv i resursbegränsade trådlösa miljöer.

Inledningsvis löser vi forskningsproblem relaterade till distribuerad inlärning över trådlösa nätverk med fokus på hur annan datatrafik och kommunikationslatensen begränsar FL-iterationerna. Vi introducerar den kostnadsmedvetna kausala FL-algoritmen FedCau som balanserar träningsprestanda mot kommunikations- och beräkningskostnader. En viktig del av lösningen är en ny termineringsmetod som tar bort det tidigare behovet av att ha information om framtida beräkningar vid termineringen. Ett flermålsoptimeringsproblem formuleras för att integrera FL-kostnader med kommunikation som genomförs med ALOHA-, CSMA/CA- eller OFDMA-protokollen. Ramverket omfattar både konvexa och icke-konvexa förlustfunktioner och resultaten jämförs med etablerade kommunikationseffektiva metoder, inklusive Lazily Aggregated Quantized Gradient (LAQ). Dessutom utvecklar vi A-LAQ (adaptiv LAQ) som sparar energi samtidigt som hög ML-noggrannhet bibehålls genom att dynamiskt justera bitallokeringen för de lokala modelluppdateringarna under FL-iterationerna.

Därefter analyserar vi hur cellfri massiv multiple-input multiple-output (CFmMIMO) teknik kan användas för att hantera den höga kommunikationslatensen som annars uppstår när storskaliga modeller tränas genom FL. Denna nya nätarkitektur består av många samarbetande basstationer vilket möjliggör att många användare kan skicka modelluppdateringar samtidigt på samma frekvenser genom rumslig multiplexering, vilket drastiskt minskar latensen. Vi föreslår nya upplänkseffektregleringsscheman som optimerar avvägningen mellan energiförbrukning och latens. Denna lösning möjliggör fler FL-iterationer under givna energi- och latensbegränsningar och leder till betydande vinster i FL-testnoggrannheten. Vi presenterar tre tillvägagångssätt varav det första är en metod som minimerar en matematisk avvägning mellan varje användares upplänkslatens och energiförbrukning. Metoden tar hänsyn till hur de individuella sändningseffekterna påverkar andra användares energi och latens för att gemensamt minska den totala energiförbrukningen och FL-träningsfördröjningen. Vårt andra bidrag är en metod för att hantera eftersläpningseffekter genom ett adaptivt kvantiseringsschema med blandad upplösning för de lokala gradientuppdateringarna. I detta schema används hög kvantiseringsupplösning endast för viktiga variabler och vi använder även dynamisk effektreglering. Slutligen introducerar vi EFCAQ som är en energieffektiv FL-metod för CFmMIMO-nätverk. EFCAQ kombinerar ett nytt adaptivt kvantiseringsschema med att samoptimera eftersläpningseffekten och användarens totala energiförbrukning så att FL-förlustfunktionen minimeras genom att använda ett adaptivt antal lokala iterationer hos varje användare.

Genom omfattande teoretisk analys och experimentell validering visar denna avhandling att de föreslagna metoderna överträffar tidigare kända algoritmer i olika FL-scenarier och för olika datauppsättningar. Våra bidrag banar väg för energieffektiva FL-system med låg latens, vilket gör dem mer praktiska för användning i verkliga trådlösa nätverk.

Place, publisher, year, edition, pages
Stockholm: Kungliga Tekniska högskolan, 2025. p. xv, 123
Series
TRITA-EECS-AVL ; 2025:13
Keywords
Federated Learning, Optimization, Cell-free massive MIMO, Resource allocation, Energy, Latency, Federerad inlärning, Optimering, Cell-fri massiv MIMO, Resursallokering, Energieffektivitet, Latens
National Category
Communication Systems
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-358334 (URN) | 978-91-8106-165-9 (ISBN)
Public defence
2025-02-10, https://kth-se.zoom.us/j/69502080036, Ka-sal C, Kistagången 16, Stockholm, 13:00 (English)
Note

QC 20250115

Available from: 2025-01-15. Created: 2025-01-15. Last updated: 2025-02-18. Bibliographically approved
Mahmoudi, A. & Björnson, E. (2024). Adaptive Quantization Resolution and Power Control for Federated Learning over Cell-free Networks. Paper presented at the IEEE Global Communications Conference, 8–12 December 2024, Cape Town, South Africa.
Adaptive Quantization Resolution and Power Control for Federated Learning over Cell-free Networks
2024 (English). Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Federated learning (FL) is a distributed learning framework where users train a global model by exchanging local model updates with a server instead of raw datasets, preserving data privacy and reducing communication overhead. However, the latency grows with the number of users and the model size, impeding successful FL over traditional wireless networks with orthogonal access. Cell-free massive multiple-input multiple-output (CFmMIMO) is a promising solution to serve numerous users on the same time/frequency resource with similar rates. This architecture greatly reduces uplink latency through spatial multiplexing but does not take application characteristics into account. In this paper, we co-optimize the physical layer with the FL application to mitigate the straggler effect. We introduce a novel adaptive mixed-resolution quantization scheme of the local gradient vector updates, where only the most essential entries are given high resolution. Thereafter, we propose a dynamic uplink power control scheme to manage the varying user rates and mitigate the straggler effect. The numerical results demonstrate that the proposed method achieves test accuracy comparable to classic FL while reducing communication overhead by at least 93% on the CIFAR-10, CIFAR-100, and Fashion-MNIST datasets. We compare our methods against AQUILA, Top-q, and LAQ, using the max-sum rate and Dinkelbach power control schemes. Our approach reduces the communication overhead by 75% and achieves 10% higher test accuracy than these benchmarks within a constrained total latency budget.
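The mixed-resolution idea, high resolution only for the most essential gradient entries and coarse resolution elsewhere, can be sketched as follows. The function name `mixed_resolution_quantize` and the parameters `top_frac`, `hi_bits`, and `lo_bits` are hypothetical illustrations, not the paper's actual scheme, and the index-transmission overhead is deliberately ignored.

```python
def mixed_resolution_quantize(grad, top_frac=0.05, hi_bits=8, lo_bits=2):
    """Sketch: quantize the k largest-magnitude entries of `grad` with
    hi_bits of resolution and the rest with lo_bits (uniform symmetric grid)."""
    n = len(grad)
    k = max(1, int(top_frac * n))
    # indices of the k largest-magnitude entries, kept at high resolution
    top = set(sorted(range(n), key=lambda i: abs(grad[i]))[-k:])
    scale = max((abs(g) for g in grad), default=1.0) or 1.0

    def quant(x, bits):
        levels = 2 ** (bits - 1) - 1        # symmetric uniform quantizer
        return round(x / scale * levels) / levels * scale

    q = [quant(g, hi_bits if i in top else lo_bits) for i, g in enumerate(grad)]
    payload_bits = k * hi_bits + (n - k) * lo_bits  # ignores index overhead
    return q, payload_bits
```

Compared with uniformly spending `hi_bits` on every entry, the payload shrinks by roughly a factor `hi_bits / lo_bits` for small `top_frac`, which is the source of the communication savings the abstract reports.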

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-358331 (URN)
Conference
IEEE Global Communications Conference, 8–12 December 2024, Cape Town, South Africa
Note

QC 20250115

Available from: 2025-01-15. Created: 2025-01-15. Last updated: 2025-01-15. Bibliographically approved
Mahmoudi, A., Ghadikolaei, H. S., Da Silva Jr, J. M. & Fischione, C. (2024). FedCau: A Proactive Stop Policy for Communication and Computation Efficient Federated Learning. IEEE Transactions on Wireless Communications, 23(9), 11076-11093
FedCau: A Proactive Stop Policy for Communication and Computation Efficient Federated Learning
2024 (English). In: IEEE Transactions on Wireless Communications, ISSN 1536-1276, E-ISSN 1558-2248, Vol. 23, no 9, p. 11076-11093. Article in journal (Refereed), Published
Abstract [en]

This paper investigates efficient distributed training of a Federated Learning (FL) model over a network of wireless devices. The communication iterations of the distributed training algorithm may be substantially deteriorated or even blocked by the effects of the devices' background traffic, packet losses, congestion, or latency. We abstract the communication-computation impacts as an 'iteration cost' and propose a cost-aware causal FL algorithm (FedCau) to tackle this problem. We propose an iteration-termination method that trades off the training performance and networking costs. We apply our approach when workers use the slotted-ALOHA, carrier-sense multiple access with collision avoidance (CSMA/CA), and orthogonal frequency-division multiple access (OFDMA) protocols. We show that, given a total cost budget, the training performance degrades as either the background communication traffic or the dimension of the training problem increases. Our results demonstrate the importance of proactively designing optimal cost-efficient stopping criteria to avoid unnecessary communication-computation costs for a marginal FL training improvement. We validate our method by training and testing FL over the MNIST and CIFAR-10 datasets. Finally, we apply our approach to existing communication-efficient FL methods from the literature, achieving further efficiency. We conclude that cost-efficient stopping criteria are essential for the success of practical FL over wireless networks.
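A causal stop policy of the kind the abstract describes uses only past observations to decide whether another iteration is worth its cost. The rule below is a hypothetical sketch in the spirit of FedCau, not the paper's actual criterion: `should_stop` and the threshold `min_gain_per_cost` are illustrative names and the decision compares only the latest loss drop against the latest iteration cost.

```python
def should_stop(loss_history, cost_history, budget, min_gain_per_cost=1e-3):
    """Causal stopping sketch: decide from past information alone whether to
    run another FL iteration. Stop when the cost budget is exhausted or the
    latest loss improvement per unit iteration cost is negligible."""
    spent = sum(cost_history)
    if spent >= budget:
        return True                      # total cost budget exhausted
    if len(loss_history) < 2:
        return False                     # too little history to judge
    gain = loss_history[-2] - loss_history[-1]   # latest loss improvement
    return gain / cost_history[-1] < min_gain_per_cost
```

Because the rule never peeks at future losses or costs, it can be evaluated online at the server after every round, which is exactly the proactive, budget-aware behavior the abstract argues for.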

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Costs, Training, Wireless networks, Protocols, Optimization, Machine learning algorithms, Resource management, Federated learning, communication protocols, cost-efficient algorithm, latency, unfolding federated learning
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-354333 (URN) | 10.1109/TWC.2024.3378351 (DOI) | 001312963400083 () | 2-s2.0-85189318899 (Scopus ID)
Note

QC 20241004

Available from: 2024-10-04. Created: 2024-10-04. Last updated: 2025-01-15. Bibliographically approved
Mahmoudi, A., Zaher, M. & Björnson, E. (2024). Joint Energy and Latency Optimization in Federated Learning over Cell-Free Massive MIMO Networks. In: 2024 IEEE Wireless Communications and Networking Conference, WCNC 2024 - Proceedings: . Paper presented at 25th IEEE Wireless Communications and Networking Conference, WCNC 2024, Dubai, United Arab Emirates, Apr 21 2024 - Apr 24 2024. Institute of Electrical and Electronics Engineers (IEEE)
Joint Energy and Latency Optimization in Federated Learning over Cell-Free Massive MIMO Networks
2024 (English). In: 2024 IEEE Wireless Communications and Networking Conference, WCNC 2024 - Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Federated learning (FL) is a distributed learning paradigm wherein users exchange FL models with a server instead of raw datasets, thereby preserving data privacy and reducing communication overhead. However, an increasing number of FL users may hinder the completion of large-scale FL over wireless networks due to the high imposed latency. Cell-free massive multiple-input multiple-output (CFmMIMO) is a promising architecture for implementing FL because it serves many users on the same time/frequency resources. While CFmMIMO enhances energy efficiency through spatial multiplexing and collaborative beamforming, it remains crucial to meticulously allocate uplink transmission powers to the FL users. In this paper, we propose an uplink power allocation scheme in FL over CFmMIMO by considering the effect of each user's power on the energy and latency of other users to jointly minimize the users' uplink energy and the latency of FL training. The proposed solution algorithm is based on the coordinate gradient descent method. Numerical results show that our proposed method outperforms the well-known max-sum rate method by up to 27% and the max-min energy efficiency of the Dinkelbach method by up to 21% in terms of test accuracy, under a limited uplink energy and latency budget for FL over CFmMIMO.
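The coordinate-wise structure of the proposed solver, that is, optimizing one user's power at a time while the others are held fixed, can be illustrated generically. The helper below is a hypothetical sketch: it uses a 1-D grid line search per coordinate instead of the paper's gradient steps, and the quadratic objective in the usage note is only a stand-in for the joint energy-latency cost.

```python
def coordinate_descent(f, p0, lo, hi, sweeps=50, grid=64):
    """Generic coordinate-wise minimization sketch: repeatedly cycle through
    the coordinates (users' powers) and, for each one, pick the best value on
    a uniform grid over [lo, hi] with all other coordinates fixed."""
    p = list(p0)
    candidates = [lo + (hi - lo) * j / (grid - 1) for j in range(grid)]
    for _ in range(sweeps):
        for i in range(len(p)):
            # 1-D line search over coordinate i, others held fixed
            p[i] = min(candidates, key=lambda x: f(p[:i] + [x] + p[i + 1:]))
    return p
```

For a separable or well-conditioned objective this converges quickly; the coupling the abstract emphasizes (each user's power affecting the others' energy and latency) enters through `f` evaluating all coordinates jointly.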

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Cell-free massive MIMO, Energy, Federated learning, Latency, Power allocation
National Category
Telecommunications; Communication Systems
Identifiers
urn:nbn:se:kth:diva-350991 (URN) | 10.1109/WCNC57260.2024.10571236 (DOI) | 001268569304063 () | 2-s2.0-85198830377 (Scopus ID)
Conference
25th IEEE Wireless Communications and Networking Conference, WCNC 2024, Dubai, United Arab Emirates, Apr 21 2024 - Apr 24 2024
Note

Part of ISBN 9798350303582

QC 20240725

Available from: 2024-07-24. Created: 2024-07-24. Last updated: 2024-10-07. Bibliographically approved
Mahmoudi, A. (2023). Communication-Computation Efficient Federated Learning over Wireless Networks. (Licentiate dissertation). KTH Royal Institute of Technology
Communication-Computation Efficient Federated Learning over Wireless Networks
2023 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

With the introduction of the Internet of Things (IoT) and 5G cellular networks, edge computing will substantially alleviate the quality-of-service shortcomings of cloud computing. With the advancements in edge computing, machine learning (ML) has played a significant role in analyzing the data produced by IoT devices. Such advancements have mainly enabled the proliferation of ML in distributed optimization algorithms. These algorithms aim to improve training and testing performance for prediction and inference tasks, such as image classification. However, state-of-the-art ML algorithms demand massive communication and computation resources that are not readily available on wireless devices. Accordingly, there is a significant need to extend ML algorithms to wireless communication scenarios to cope with the resource limitations of the devices and the networks.

Federated learning (FL) is one of the most prominent algorithms with data distributed across devices. FL reduces communication overhead by avoiding data exchange between wireless devices and the server. Instead, each wireless device executes some local computations and communicates the local parameters to the server using wireless communications. Accordingly, every communication iteration of FL incurs costs such as computation, latency, communication resource utilization, bandwidth, and energy. Since the devices' communication and computation resources are limited, resource shortages may hinder the completion of FL training. The main goal of this thesis is to develop cost-efficient approaches to alleviate the resource constraints of devices in FL training.

In the first chapter of the thesis, we overview ML and discuss the relevant communication and computation efficient works for training FL models. Next, a comprehensive literature review of cost efficient FL methods is conducted, and the limitations of existing literature in this area are identified. We then present the central focus of our research, which is a causal approach that eliminates the need for future FL information in the design of communication and computation efficient FL. Finally, we summarize the key contributions of each paper within the thesis.

In the second chapter, the thesis presents the articles on which it is based in their original format of publication or submission. A multi-objective optimization problem, incorporating FL loss and iteration cost functions, is proposed where communication between devices and the server is regulated by the slotted-ALOHA wireless protocol. The effect of contention level in the CSMA/CA on the causal solution of the proposed optimization is also investigated. Furthermore, the multi-objective optimization problem is extended to cover general scenarios in wireless communication, including convex and non-convex loss functions. Novel results are compared with well-known communication-efficient methods, such as the lazily aggregated quantized gradients (LAQ), to further improve the communication efficiency in FL over wireless networks.

Abstract [sv]

Med introduktionen av Internet of Things (IoT) och 5G cellulära nätverk, kommer edge computing avsevärt att lindra bristerna på tjänstekvaliteten hos molnberäkningar. Med framstegen inom edge computing har maskininlärning (ML) spelat en betydande roll i att analysera data som produceras av IoT-enheter. Sådana framsteg har huvudsakligen möjliggjort ML-proliferation i distribuerade optimeringsalgoritmer. Dessa algoritmer syftar till att förbättra tränings- och testprestanda för förutsägelse- och slutledningsuppgifter, såsom bildklassificering. Men de senaste ML-algoritmerna kräver enorma kommunikations- och beräkningsresurser som inte är lätt tillgängliga på trådlösa enheter. Följaktligen finns det ett betydande behov av att utöka ML-algoritmer till scenarier för trådlös kommunikation för att klara av resursbegränsningarna hos enheterna och nätverken.

Federerad inlärning (FL) är en av de mest framträdande algoritmerna med data fördelade över enheter. FL minskar kommunikationskostnader genom att undvika datautbyte mellan trådlösa enheter och servern. Istället utför varje trådlös enhet några lokala beräkningar och kommunicerar de lokala parametrarna till servern med hjälp av trådlös kommunikation. Följaktligen upplever varje kommunikationsiteration av FL kostnader som beräkning, latens, kommunikationsresursanvändning, bandbredd och energi. Eftersom enheternas kommunikations- och beräkningsresurser är begränsade kan resursbristen hindra att FL-träningen fullföljs. Huvudmålet med denna avhandling är att utveckla kostnadseffektiva metoder för att lindra resursbegränsningarna för enheter i FL-träning.

I det första kapitlet av avhandlingen överblickar vi ML och diskuterar relevanta kommunikations- och beräkningseffektiva arbeten för att träna FL-modeller. Därefter genomförs en omfattande litteraturgenomgång av kostnadseffektiva FL-metoder, och begränsningarna för befintlig litteratur inom detta område identifieras. Vi presenterar sedan det centrala fokuset i vår forskning, vilket är ett kausalt synsätt som eliminerar behovet av framtida FL-information vid utformning av kommunikations- och beräkningseffektiv FL. Slutligen sammanfattar vi de viktigaste bidragen från varje artikel i avhandlingen.

I det andra kapitlet presenterar avhandlingen de artiklar som den bygger på i deras ursprungliga publicerings- eller inlämningsformat. Ett multi-objektiv optimeringsproblem, som inkluderar FL-förlust- och iterationskostnadsfunktioner, föreslås där det trådlösa ALOHA-protokollet med slitsar reglerar kommunikationen mellan enheter och servern. Effekten av konfliktnivån i CSMA/CA på den kausala lösningen av den föreslagna optimeringen undersöks också. Dessutom utökas problemet med optimering av flera mål till att täcka allmänna scenarier inom trådlös kommunikation, inklusive konvexa och icke-konvexa förlustfunktioner. Nya resultat jämförs med välkända kommunikationseffektiva metoder som LAQ för att ytterligare förbättra kommunikationseffektiviteten i FL över trådlösa nätverk.


Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2023. p. 41
Series
TRITA-EECS-AVL ; 2023:19
National Category
Telecommunications
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-324549 (URN) | 978-91-8040-498-3 (ISBN)
Presentation
2023-04-21, Sten Velander, Teknikringen 33, floor 4, Stockholm, 13:00 (English)
Note

QC 20230310

Available from: 2023-03-10. Created: 2023-03-06. Last updated: 2023-04-24. Bibliographically approved
Mahmoudi, A., Barros da Silva Jr., J. M., Ghadikolaei, H. S. & Fischione, C. (2022). A-LAQ: Adaptive Lazily Aggregated Quantized Gradient. In: 2022 IEEE GLOBECOM Workshops, GC Wkshps 2022: Proceedings. Paper presented at 2022 IEEE GLOBECOM Workshops, GC Wkshps 2022, Virtual, Online, Brazil, Dec 4 2022 - Dec 8 2022 (pp. 1828-1833). Institute of Electrical and Electronics Engineers (IEEE)
A-LAQ: Adaptive Lazily Aggregated Quantized Gradient
2022 (English). In: 2022 IEEE GLOBECOM Workshops, GC Wkshps 2022: Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 1828-1833. Conference paper, Published paper (Refereed)
Abstract [en]

Federated Learning (FL) plays a prominent role in solving machine learning problems with data distributed across clients. In FL, to reduce the communication overhead of data between clients and the server, each client communicates the local FL parameters instead of the local data. However, when a wireless network connects clients and the server, the communication resource limitations of the clients may prevent the FL training iterations from completing. Therefore, communication-efficient variants of FL have been widely investigated. Lazily Aggregated Quantized Gradient (LAQ) is one of the promising communication-efficient approaches to lower resource usage in FL. However, LAQ assigns a fixed number of bits for all iterations, which may be communication-inefficient when the number of iterations is medium to high or convergence is approaching. This paper proposes Adaptive Lazily Aggregated Quantized Gradient (A-LAQ), a method that significantly extends LAQ by assigning an adaptive number of communication bits during the FL iterations. We train FL under an energy-constrained condition and provide a convergence analysis for A-LAQ. The experimental results highlight that A-LAQ outperforms LAQ with up to a 50% reduction in spent communication energy and an 11% increase in test accuracy.
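The contrast with fixed-bit LAQ can be made concrete with a bit schedule that shrinks as convergence approaches. This is only an illustrative stand-in: A-LAQ adapts the bit budget to the state of training, whereas the hypothetical `adaptive_bits` below merely decays linearly with the iteration index to show where the energy savings come from.

```python
def adaptive_bits(iteration, total_iters, b_max=8, b_min=2):
    """Illustrative decreasing bit schedule: spend b_max bits on the first
    iteration and taper linearly down to b_min on the last one. (A-LAQ's
    actual rule adapts to the gradient innovation, not the iteration index.)"""
    frac = iteration / max(1, total_iters - 1)
    return round(b_max - frac * (b_max - b_min))
```

Summing the schedule over a training run shows the total communicated bits (hence transmit energy, at a fixed energy-per-bit) falling well below the `total_iters * b_max` cost of a fixed-bit baseline.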

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
adaptive transmission, communication bits, edge learning, Federated learning, LAQ
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-333440 (URN)
10.1109/GCWkshps56602.2022.10008580 (DOI)
2-s2.0-85146892229 (Scopus ID)
Conference
2022 IEEE GLOBECOM Workshops, GC Wkshps 2022, Virtual, Online, Brazil, Dec 4 2022 - Dec 8 2022
Note

Part of ISBN 9781665459754

QC 20230802

Available from: 2023-08-02 Created: 2023-08-02 Last updated: 2025-01-15. Bibliographically approved
Mahmoudi, A., Shokri-Ghadikolaei, H. & Fischione, C. (2020). Cost-efficient Distributed Optimization In Machine Learning Over Wireless Networks. Paper presented at IEEE International Conference on Communications (IEEE ICC) / Workshop on NOMA for 5G and Beyond, JUN 07-11, 2020, ELECTR NETWORK.
Cost-efficient Distributed Optimization In Machine Learning Over Wireless Networks
2020 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper addresses the problem of distributed training of a machine learning model over the nodes of a wireless communication network. Existing distributed training methods are not explicitly designed for these networks, which usually have physical limitations on bandwidth, delay, or computation, thus hindering or even blocking the training tasks. To address such a problem, we consider a general class of algorithms where the training is performed by iterative distributed computations across the nodes. We assume that the nodes have some background traffic and communicate using the slotted-ALOHA protocol. We propose an iteration-termination criterion to investigate the trade-off between achievable training performance and the overall cost of running the algorithms. We show that, given a total running budget, the training performance becomes worse as either the background communication traffic or the dimension of the training problem increases. We conclude that a co-design of distributed optimization algorithms and communication protocols is essential for the success of machine learning over wireless networks and edge computing.
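The trade-off the abstract describes — fewer completed training iterations as background traffic grows, given a fixed running budget — can be illustrated with a toy slotted-ALOHA simulation. All parameters here (`p_tx`, `p_bg`, the budget in slots) are hypothetical, and the success model (exactly one transmission per slot) is the textbook one; the paper's analysis is more general.

```python
import random

def slotted_aloha_slots(n_nodes, p_tx, p_bg, rng):
    """Slots until the tagged node's update gets through a shared
    slotted-ALOHA channel: success needs the tagged node to transmit
    while all background nodes stay silent."""
    slots = 0
    while True:
        slots += 1
        tagged = rng.random() < p_tx
        others = sum(rng.random() < p_bg for _ in range(n_nodes - 1))
        if tagged and others == 0:
            return slots

def iterations_within_budget(budget_slots, n_nodes, p_tx, p_bg, seed=0):
    """Count how many training iterations fit in a total slot budget."""
    rng = random.Random(seed)
    used, iters = 0, 0
    while True:
        cost = slotted_aloha_slots(n_nodes, p_tx, p_bg, rng)
        if used + cost > budget_slots:
            return iters
        used += cost
        iters += 1
```

Raising the background-traffic probability `p_bg` sharply reduces the number of iterations that fit in the same budget, matching the qualitative conclusion of the paper.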

Series
IEEE International Conference on Communications, ISSN 1550-3607
Keywords
Distributed optimization, efficient algorithm, latency, convergence, machine learning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-277809 (URN)
10.1109/ICC40277.2020.9149216 (DOI)
000606970303133 ()
2-s2.0-85089439680 (Scopus ID)
Conference
IEEE International conference on communications (IEEE ICC)/ Workshop on NOMA for 5G and Beyond, JUN 07-11, 2020, ELECTR NETWORK
Note

QC 20200702

Available from: 2020-06-29 Created: 2020-06-29 Last updated: 2023-03-06. Bibliographically approved
Mahmoudi, A., Ghadikolaei, H. S. & Fischione, C. (2020). Machine Learning over Networks: Co-design of Distributed Optimization and Communications. In: Proceedings of the 21st IEEE International Workshop on Signal Processing Advances in Wireless Communications, SPAWC 2020. Paper presented at 21st IEEE International Workshop on Signal Processing Advances in Wireless Communications, SPAWC 2020; Atlanta; United States; 26 May 2020 through 29 May 2020. Institute of Electrical and Electronics Engineers (IEEE)
Machine Learning over Networks: Co-design of Distributed Optimization and Communications
2020 (English). In: Proceedings of the 21st IEEE International Workshop on Signal Processing Advances in Wireless Communications, SPAWC 2020, Institute of Electrical and Electronics Engineers (IEEE), 2020. Conference paper, Published paper (Refereed)
Abstract [en]

This paper considers a general class of iterative algorithms performing a distributed training task over a network where the nodes have background traffic and communicate through a shared wireless channel. Focusing on the carrier-sense multiple access with collision avoidance (CSMA/CA) as the main communication protocol, we investigate the mini-batch size and convergence of the training algorithm as a function of the communication protocol and network settings. We show that, given a total latency budget to run the algorithm, the training performance becomes worse as either the background traffic or the dimension of the training problem increases. We then propose a lightweight algorithm to regulate the network congestion at every node, based on local queue size with no explicit signaling with other nodes, and demonstrate the performance improvement due to this algorithm. We conclude that a co-design of distributed optimization algorithms and communication protocols is essential for the success of machine learning over wireless networks and edge computing.
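The signaling-free idea — each node regulating congestion from its own queue length, with no coordination messages — can be sketched with a hypothetical AIMD-style rule. The thresholds and gains below are illustrative assumptions; the paper's lightweight algorithm differs in detail.

```python
def regulate(queue_len, rate=1.0, q_target=8, gain=0.1, min_rate=0.1):
    """Adjust this node's update-injection rate using only its local
    queue length -- no explicit signaling with other nodes.
    Hypothetical AIMD-style rule for illustration."""
    if queue_len > q_target:
        # Long local queue suggests congestion: multiplicative decrease.
        return max(min_rate, rate * 0.5)
    # Short queue: additive increase back toward full rate.
    return min(1.0, rate + gain)
```

Because the decision uses only state the node already has (its queue), the rule adds no communication overhead of its own, which is the point of the co-design argument.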

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2020
Series
IEEE International Workshop on Signal Processing Advances in Wireless Communications, ISSN 2325-3789
Keywords
Distributed optimization, machine learning, efficient algorithm, latency, CSMA/CA
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-292378 (URN)
10.1109/SPAWC48557.2020.9154264 (DOI)
000620337500062 ()
2-s2.0-85090398486 (Scopus ID)
Conference
21st IEEE International Workshop on Signal Processing Advances in Wireless Communications, SPAWC 2020; Atlanta; United States; 26 May 2020 through 29 May 2020
Note

QC 20230307

Available from: 2021-04-14 Created: 2021-04-14 Last updated: 2025-01-15. Bibliographically approved
Mahmoudi, A., Xiao, M. & Björnson, E. Accelerating Energy-Efficient Federated Learning in Cell-Free Networks with Adaptive Quantization.
Accelerating Energy-Efficient Federated Learning in Cell-Free Networks with Adaptive Quantization
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Federated Learning (FL) enables clients to share learning parameters instead of local data, reducing communication overhead. Traditional wireless networks face latency challenges with FL. In contrast, Cell-Free Massive MIMO (CFmMIMO) can serve multiple clients on shared resources, boosting spectral efficiency and reducing latency for large-scale FL. However, clients' communication resource limitations can hinder the completion of the FL training. To address this challenge, we propose an energy-efficient, low-latency FL framework featuring optimized uplink power allocation for seamless client-server collaboration. Our framework employs an adaptive quantization scheme, dynamically adjusting bit allocation for local gradient updates to reduce communication costs. We formulate a joint optimization problem covering FL model updates, local iterations, and power allocation, solved using sequential quadratic programming (SQP) to balance energy and latency. Additionally, clients use the AdaDelta method for local FL model updates, enhancing local model convergence compared to standard SGD, and we provide a comprehensive analysis of FL convergence with AdaDelta local updates. Numerical results show that, within the same energy and latency budgets, our power allocation scheme outperforms the Dinkelbach and max-sum rate methods by increasing the test accuracy by up to 7% and 19%, respectively. Moreover, for the three power allocation methods, our proposed quantization scheme outperforms AQUILA and LAQ by increasing test accuracy by up to 36% and 35%, respectively.
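The clients' local iterations use standard AdaDelta, which can be sketched in a few lines. This is the textbook AdaDelta update only; the paper's quantization, power allocation, and convergence machinery around it are omitted.

```python
import numpy as np

def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One AdaDelta update: the step size adapts from running averages of
    squared gradients and squared past updates, with no global learning rate."""
    Eg2, Edx2 = state
    Eg2 = rho * Eg2 + (1 - rho) * grad ** 2
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2
    return x + dx, (Eg2, Edx2)
```

The per-coordinate scaling is what makes local convergence less sensitive to the step-size choice than plain SGD, which is the property the framework exploits.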

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-358333 (URN)
10.48550/arXiv.2412.20785 (DOI)
Note

QC 20250115

Available from: 2025-01-15 Created: 2025-01-15 Last updated: 2025-01-15. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-8826-2088