Communication-Efficient Resource Allocation for Wireless Federated Learning Systems

Abstract: Training machine learning (ML) models usually requires massive amounts of data. The ever-increasing number of connected user devices has benefited the development of ML algorithms by providing large sets of data that can be utilized for model training. However, as privacy concerns become increasingly important in our society, using private data from user devices to train ML models is problematic. Federated learning (FL) with on-device information processing has therefore been proposed for its advantages in preserving data privacy. FL is a collaborative ML framework in which multiple devices participate in training a common global model based on locally available data. Unlike centralized ML architectures, wherein the entire set of training data needs to be stored centrally, an FL system shares only model parameters between user devices and a parameter server.

Federated Averaging (FedAvg), one of the most representative baseline FL algorithms, follows an iterative process of model broadcasting, local training, and model aggregation. In every iteration, model aggregation can start only after all devices have finished local training, so the duration of one iteration is limited by the slowest device; this is known as the straggler issue. To resolve this commonly observed issue in synchronous FL methods, the literature has explored making the procedure asynchronous, so that the server need not wait for all devices to finish local training before aggregating updates. However, to avoid the high communication cost and implementation complexity introduced by existing asynchronous FL methods, we propose a new asynchronous FL framework with periodic aggregation. Since the FL process involves information exchange over a wireless medium, allowing only partial participation of devices in transmitting model updates is a common approach to avoiding the communication bottleneck. We therefore further develop channel-aware, data-importance-based scheduling policies, theoretically motivated by the convergence analysis of the proposed FL system. In addition, an age-aware aggregation weighting design is proposed to handle the model-update asynchrony among scheduled devices. The proposed scheme is empirically shown to alleviate the straggler effect and achieve better learning performance than several state-of-the-art methods.
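To make the periodic-aggregation idea concrete, the following is a minimal Python sketch of such an asynchronous round. Everything here is a hypothetical illustration rather than the thesis's exact algorithm: the toy linear-regression task, the 0.6 per-period completion probability standing in for heterogeneous compute and channel delay, and the 1/(1+age) staleness weight are all assumed placeholders. The point it shows is that the server aggregates whatever updates have arrived at each fixed period, down-weighting updates computed from older model versions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup (hypothetical): a linear-regression task split across devices.
    K, D, PERIODS = 8, 5, 30                     # devices, model dimension, aggregation periods
    w_true = rng.normal(size=D)
    datasets = []
    for _ in range(K):
        X = rng.normal(size=(40, D))
        datasets.append((X, X @ w_true + 0.1 * rng.normal(size=40)))

    def local_sgd(w, X, y, steps=5, lr=0.05):
        # plain local SGD on the device's own data (squared loss)
        for _ in range(steps):
            i = rng.integers(len(y))
            w = w - lr * (X[i] @ w - y[i]) * X[i]
        return w

    w_global = np.zeros(D)
    base = [w_global.copy() for _ in range(K)]   # model version each device trains from
    age = np.zeros(K, dtype=int)                 # staleness of that version, in periods

    for t in range(PERIODS):
        # Each device finishes this period only with some probability,
        # a crude stand-in for heterogeneous compute and channel delays.
        done = rng.random(K) < 0.6
        deltas, weights = [], []
        for k in np.flatnonzero(done):
            w_k = local_sgd(base[k], *datasets[k])
            deltas.append(w_k - base[k])
            weights.append(1.0 / (1 + age[k]))   # age-aware weight: staler update, smaller weight
        if deltas:
            weights = np.asarray(weights) / np.sum(weights)
            w_global = w_global + sum(a * d for a, d in zip(weights, deltas))
        for k in range(K):
            if done[k]:                          # finished devices fetch the fresh model
                base[k], age[k] = w_global.copy(), 0
            else:                                # stragglers keep their stale version
                age[k] += 1
        loss = np.mean([np.mean((X @ w_global - y) ** 2) for X, y in datasets])
        print(f"period {t:2d}  global loss {loss:.4f}")

The key design point is that the server never blocks on stragglers: a slow device's update still contributes in a later period, but with a weight that decays with its age.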
From the perspective of jointly optimizing system efficiency and learning performance, the rest of the thesis considers a Federated Edge Learning (FEEL) scenario where, in addition to the heterogeneity of data and wireless channels, heterogeneous computation capabilities and energy availability are also taken into account in the scheduling design. Moreover, instead of assuming that all local data are available at the beginning of the training process, we consider a more practical scenario in which training data may be generated randomly over time. Considering time-varying local training data, wireless link conditions, and computing capabilities, we formulate a stochastic network optimization problem and propose a dynamic scheduling algorithm that optimizes learning performance subject to a per-round latency requirement and long-term energy constraints. The effectiveness of the proposed design is validated by numerical simulations, showing gains in learning performance and system efficiency compared to alternative methods.
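As a rough illustration of how such a dynamic scheduler can be realized, the sketch below uses a Lyapunov drift-plus-penalty rule with per-device virtual energy queues. This is an assumption about the general shape of the method, not the thesis's actual formulation, and every quantity (the data-importance proxy, the channel and latency models, the tradeoff parameter V, the budgets) is a hypothetical placeholder.

    import numpy as np

    rng = np.random.default_rng(1)

    K, ROUNDS = 10, 50       # devices, communication rounds
    V = 5.0                  # importance-vs-energy tradeoff (larger V favors learning)
    E_BUDGET = 1.0           # long-term average energy budget per device per round
    LATENCY_MAX = 1.0        # per-round latency requirement
    Q = np.zeros(K)          # virtual energy queues, one per device

    for t in range(ROUNDS):
        # Per-round random state: hypothetical stand-ins for data importance
        # (e.g. local loss), channel gain, and compute/transmit costs.
        importance = rng.exponential(1.0, K)
        channel = rng.rayleigh(1.0, K)
        latency = 0.2 + rng.exponential(0.3, K) / channel   # worse channel -> slower upload
        energy = 0.5 / channel + 0.2 * rng.random(K)        # transmit + compute energy

        # Drift-plus-penalty rule: schedule a device when its learning benefit
        # outweighs its queue-weighted energy cost and it can meet the deadline.
        score = V * importance - Q * energy
        scheduled = (latency <= LATENCY_MAX) & (score > 0)

        # Virtual-queue update: backlog grows when a device overspends its budget.
        Q = np.maximum(Q + np.where(scheduled, energy, 0.0) - E_BUDGET, 0.0)
        print(f"round {t:2d}: {scheduled.sum()} devices scheduled, max queue {Q.max():.2f}")

The virtual queue grows whenever a device spends above its average energy budget, which automatically makes energy-hungry devices less likely to be scheduled in later rounds; the long-term constraint is thus enforced without a hard per-round energy limit.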
