In this paper, we consider the problem of federated learning (FL) with devices that have intermittent connectivity to the central server. For this problem, the concept of semi-decentralized FL has been proposed in the literature. This paradigm allows non-straggler devices to relay the gradients computed by the stragglers to the server, and enables the realization of gradient coding (GC) to mitigate the negative impact of stragglers that fail to communicate directly with the central server. However, for GC in semi-decentralized FL, the communication overhead incurred by information exchange among the devices is significant. To overcome this shortcoming, inspired by the existing communication-optimal exact consensus algorithm (CECA), we propose a new communication-efficient semi-decentralized FL method (COFFEE). In each round, the devices exchange information by taking a certain number of steps towards communication-optimal exact consensus, ensuring that each device obtains the average of the gradients computed by its previous neighbors and itself. Afterwards, the non-stragglers transmit their local average results to the server for global aggregation to update the global model. We analytically characterize the convergence performance and the communication overhead of COFFEE. Building on this analysis, to further enhance learning performance under a given communication overhead, we propose an enhanced version of COFFEE with an adaptive aggregation rule at the central server, referred to as A-COFFEE, which adapts to the straggler pattern of the devices across training rounds. Experimental results verify that the proposed methods outperform the baseline methods.
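To make the per-round procedure concrete, the following is a minimal, illustrative sketch of one COFFEE-style round under simplifying assumptions: local gradients are mixed over the device graph for a fixed number of consensus steps, the non-stragglers upload their local averages, and the server aggregates them by plain averaging. The function name `coffee_round`, the uniform row-stochastic mixing weights, and the simple mean at the server are assumptions made for illustration only; they are not the paper's CECA-based update or the adaptive aggregation rule of A-COFFEE.

```python
import numpy as np

def coffee_round(grads, adj, non_stragglers, consensus_steps, model, lr):
    """Illustrative sketch of one round: consensus steps over the device
    graph, upload by non-stragglers, and averaging at the server."""
    x = np.stack(grads)                    # row i holds device i's local gradient
    # Row-stochastic mixing matrix from the connectivity graph (adjacency with
    # self-loops); uniform weights are a placeholder for the CECA-based weights.
    W = adj / adj.sum(axis=1, keepdims=True)
    for _ in range(consensus_steps):       # a fixed number of consensus steps
        x = W @ x                          # each device averages with its neighbors
    uploaded = x[non_stragglers]           # only non-stragglers reach the server
    global_update = uploaded.mean(axis=0)  # server-side aggregation (plain mean)
    return model - lr * global_update      # gradient step on the global model
```

Even in this simplified form, the sketch reflects the trade-off discussed above: more consensus steps spread the stragglers' gradients to more non-stragglers at the cost of additional device-to-device communication.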