Online Learning-based Adaptive Beam Switching for 6G Networks: Enhancing Efficiency and Resilience

Natanzi, Seyed Bagher Hashemi; Zhu, Zhicong; Tang, Bo

arXiv.org Artificial Intelligence 

Adaptive beam switching is essential for mission-critical military and commercial 6G networks but faces major challenges from high carrier frequencies, user mobility, and frequent blockages. Existing machine learning (ML) solutions often focus on maximizing instantaneous throughput, which can lead to unstable policies with high signaling overhead. This paper presents an online Deep Reinforcement Learning (DRL) framework designed to learn an operationally stable policy. By equipping the DRL agent with an enhanced state representation that includes blockage history and with a stability-centric reward function, we enable it to prioritize long-term link quality over transient gains. Validated in a challenging 100-user scenario using the Sionna library, our agent achieves throughput comparable to a reactive Multi-Armed Bandit (MAB) baseline. The proposed framework also improves link stability by approximately 43% compared to a vanilla DRL approach, achieving operational reliability competitive with MAB while maintaining high data rates. This work demonstrates that by reframing the optimization goal towards operational stability, DRL can deliver efficient, reliable, and real-time beam management solutions for next-generation mission-critical networks.
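To make the two design ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a per-user state made of the current beam index, the latest SNR, and a fixed-length window of blockage flags, and a reward that subtracts fixed penalties for beam switches and blocked links from the throughput. The names `BeamState` and `stability_reward`, the window length, and the penalty weights are all hypothetical; the abstract does not specify the actual state features or reward form.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BeamState:
    """Hypothetical per-user state: beam index, latest SNR, recent blockage flags."""
    history_len: int = 4            # assumed window length, not from the paper
    current_beam: int = 0
    snr_db: float = 0.0
    blockage_history: deque = field(default_factory=deque)

    def observe(self, beam: int, snr_db: float, blocked: bool) -> None:
        """Record the latest measurement and keep only the most recent flags."""
        self.current_beam = beam
        self.snr_db = snr_db
        self.blockage_history.append(1.0 if blocked else 0.0)
        while len(self.blockage_history) > self.history_len:
            self.blockage_history.popleft()

    def as_vector(self) -> list:
        """Flatten to a fixed-length feature vector, zero-padding early history."""
        pad = [0.0] * (self.history_len - len(self.blockage_history))
        return [float(self.current_beam), self.snr_db] + pad + list(self.blockage_history)


def stability_reward(throughput: float, prev_beam: int, new_beam: int, blocked: bool,
                     switch_penalty: float = 0.2, blockage_penalty: float = 0.5) -> float:
    """Assumed reward shape: throughput minus penalties for switching and blockage."""
    reward = throughput
    if new_beam != prev_beam:
        reward -= switch_penalty    # discourage unnecessary beam switches
    if blocked:
        reward -= blockage_penalty  # discourage beams that end up blocked
    return reward


if __name__ == "__main__":
    state = BeamState(history_len=3)
    state.observe(beam=2, snr_db=10.0, blocked=False)
    state.observe(beam=2, snr_db=8.0, blocked=True)
    print(state.as_vector())
    print(stability_reward(1.0, prev_beam=2, new_beam=3, blocked=False))
```

Under these assumptions, a policy that keeps a slightly weaker but unblocked beam can score higher over time than one that chases each instantaneous throughput peak, which is the trade-off the stability-centric framing targets.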