We study federated learning (FL) in the presence of stragglers, i.e., devices that are only intermittently connected to the central server. Under the recently developed semi-decentralized federated learning (SFL) framework, gradient coding (GC) can mitigate stragglers by letting them relay their locally computed gradients to the central server via non-stragglers, but the communication burden of GC in SFL is very heavy. To overcome this drawback, and motivated by the communication-optimal exact consensus algorithm (CECA) proposed in the literature, we propose a new communication-efficient semi-decentralized method (COFFEE) for SFL. In each round of COFFEE, the devices take a certain number of steps towards consensus in a decentralized and communication-efficient manner, so that each device acquires the average of its own gradient and the gradients of its previous neighbors. The non-straggler devices then send the resulting averages to the server, which aggregates the received vectors to yield the global model update. We characterize the learning performance of the proposed method through convergence analysis. Finally, simulations show the superiority of COFFEE over the baseline method, i.e., GC in SFL.
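As a rough illustration of the round structure described above, the following minimal sketch computes, for each device, the average of its own gradient and those of its previous neighbors, and then lets only the non-stragglers report to the server. It is a simplification under stated assumptions and not the paper's algorithm: the ring topology, the window size `k`, the straggler mask `is_straggler`, and the function names are all hypothetical, and the communication-efficient consensus steps themselves are replaced by a direct computation of the windowed average.

```python
def window_average(grads, i, k):
    """Average of device i's gradient and those of its k previous neighbors.

    Assumption: devices sit on a ring, so the "previous neighbors" of
    device i are devices i-1, ..., i-k (indices taken modulo n).
    """
    n = len(grads)
    dim = len(grads[0])
    members = [(i - j) % n for j in range(k + 1)]
    return [sum(grads[m][d] for m in members) / (k + 1) for d in range(dim)]


def server_aggregate(grads, k, is_straggler):
    """Server-side update: mean of the vectors sent by non-stragglers.

    Returns None when every device is a straggler in this round.
    """
    n = len(grads)
    received = [
        window_average(grads, i, k) for i in range(n) if not is_straggler[i]
    ]
    if not received:
        return None
    dim = len(grads[0])
    return [sum(v[d] for v in received) / len(received) for d in range(dim)]


if __name__ == "__main__":
    # Four devices with scalar gradients; devices 1 and 3 are stragglers.
    grads = [[1.0], [2.0], [3.0], [4.0]]
    stragglers = [False, True, False, True]
    print(server_aggregate(grads, k=1, is_straggler=stragglers))
```

In this toy setting, the gradients of the two stragglers still reach the server because each one is folded into the average held by a non-straggler. With `k = n - 1` every device holds the exact average of all gradients, so any single non-straggler suffices.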
Part of ISBN 9798350348934