Learning Methods for Antenna Tilt Optimization

Abstract: The increasing complexity of modern mobile networks poses unprecedented challenges to Mobile Network Operators (MNOs). MNOs need to utilize network resources optimally to reliably satisfy the growing demand of network users. To this end, algorithms for self-optimization of network parameters are an essential tool to increase network efficiency and reduce capital and operational expenses. In particular, control of the antenna tilt angle in mobile networks provides an effective method for improving network coverage and capacity. In this thesis, we study Remote Electrical Tilt (RET) optimization using learning-based methods. In these methods, the objective is to learn an optimal control policy that adjusts the vertical tilt of base station antennas to jointly maximize network coverage and capacity. Existing learning-based RET optimization methods mainly rely on trial-and-error learning paradigms that inevitably degrade network performance during exploration phases, or may require an excessively large number of samples to converge. We address RET optimization in the Contextual Bandit (CB) setting, a powerful sequential decision-making framework in which the RET optimization problem can be modeled and solved efficiently. Specifically, we focus on two distinct CB settings that tackle the above-mentioned problems: (i) the offline off-policy learning setting, and (ii) the Best Policy Identification (BPI) setting. In offline off-policy learning, the goal is to learn an improved policy solely from offline data previously collected by a logging policy. Based on these data, a target policy is derived by minimizing the off-policy estimated risk of the learning policy. In RET optimization, the agent can leverage the vast amount of real-world network data collected by MNOs during network operations. This entails a significant advantage over online learning methods in terms of operational safety and performance reliability of the learned policy.
We train and evaluate several target policies on real-world network data, showing that the off-policy approach can safely learn improved tilt update policies while providing a higher degree of reliability. In BPI, the goal is to identify an optimal policy using as few data samples as possible. We study BPI in Linear Contextual Bandits (LCBs), in which the reward has a convenient linear structure. We devise algorithms that learn optimal tilt update policies from existing data (passive learning) or from data actively generated by the algorithms (active learning). For both the active and passive learning settings, we derive information-theoretic lower bounds on the number of data samples required by any algorithm that returns an approximately optimal policy with a given level of certainty, and we devise algorithms that achieve these fundamental limits. We then show how to effectively model RET optimization in LCBs and demonstrate that our algorithms can produce optimal tilt update policies using far fewer data samples than naive or existing rule-based learning algorithms. With the results obtained in this thesis, we argue that significant improvements in sample complexity and operational safety can be achieved when learning RET optimization policies in CBs, providing potential for real-world network deployment of learning-based RET policies.
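
As a rough illustration of the offline off-policy idea summarized above, the following minimal sketch estimates the value of a candidate tilt update policy purely from logged data. The abstract does not name a specific estimator, so inverse propensity scoring (IPS) is used here as one common choice, not as the method of the thesis. Everything in the sketch is a hypothetical assumption made for illustration: the three tilt updates in `ACTIONS`, the scalar context, the toy reward in `best_action`, the uniform logging policy, and all function names. The off-policy estimated risk would simply be the negative of the estimated value.

```python
import random

# Hypothetical tilt updates in degrees: down-tilt, keep, up-tilt.
ACTIONS = (-1, 0, 1)


def best_action(context):
    """Toy ground truth (an assumption): the context decides the best update."""
    if context < -0.3:
        return -1
    if context > 0.3:
        return 1
    return 0


def logging_policy(context, action):
    """Uniform logging policy: every tilt update is equally likely."""
    return 1.0 / len(ACTIONS)


def target_policy(context, action):
    """Deterministic candidate policy that always picks the toy-optimal update."""
    return 1.0 if action == best_action(context) else 0.0


def collect_logs(n, seed=0):
    """Simulate n logged interactions as (context, action, reward, logging_prob)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.uniform(-1.0, 1.0)
        action = rng.choice(ACTIONS)
        reward = 1.0 if action == best_action(context) else 0.0
        logs.append((context, action, reward, logging_policy(context, action)))
    return logs


def ips_value(logs, policy):
    """IPS estimate of the average reward the policy would obtain on the logged contexts."""
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the evaluated policy is to take the logged action.
        total += policy(context, action) / logging_prob * reward
    return total / len(logs)


logs = collect_logs(20000)
print("logging policy value:", ips_value(logs, logging_policy))
print("target policy value: ", ips_value(logs, target_policy))
```

In this toy setup the true values are 1 for the target policy and 1/3 for the uniform logging policy, so the two estimates should land close to those numbers. The point of the sketch is that both are computed without ever deploying the target policy, which is the source of the operational safety argued for above.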