Federated Multi-Agent Reinforcement Learning for Privacy-Preserving and Energy-Aware Resource Management in 6G Edge Networks

Andong, Francisco Javier Esono Nkulu, Min, Qi

arXiv.org Artificial Intelligence 

Abstract--As sixth-generation (6G) networks move toward ultra-dense, intelligent edge environments, efficient resource management under stringent privacy, mobility, and energy constraints becomes critical. This paper introduces a novel Federated Multi-Agent Reinforcement Learning (Fed-MARL) framework that incorporates cross-layer orchestration of both the MAC layer and application layer for energy-efficient, privacy-preserving, and real-time resource management across heterogeneous edge devices. Each agent uses a Deep Recurrent Q-Network (DRQN) to learn decentralized policies for task offloading, spectrum access, and CPU energy adaptation based on local observations (e.g., queue length, energy, CPU usage, and mobility). To protect privacy, we introduce a secure aggregation protocol based on elliptic-curve Diffie-Hellman key exchange, which ensures accurate model updates without exposing raw data to semi-honest adversaries. We formulate the resource management problem as a partially observable multi-agent Markov decision process (POMMDP) with a multi-objective reward function that jointly optimizes latency, energy efficiency, spectral efficiency, fairness, and reliability under 6G-specific service requirements such as URLLC, eMBB, and mMTC. Simulation results demonstrate that Fed-MARL outperforms centralized MARL and heuristic baselines in task success rate, latency, energy efficiency, and fairness, while ensuring robust privacy protection and scalability in dynamic, resource-constrained 6G edge networks.

I. INTRODUCTION

Sixth-generation (6G) wireless networks are poised to transform communication systems by enabling ultra-dense connectivity, low-latency services, and intelligent edge processing capabilities [1]. These advances are critical for emerging applications such as autonomous driving, augmented reality, and massive Internet of Things (IoT) deployments, each imposing diverse and stringent quality-of-service (QoS) requirements [2], [3].
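The secure aggregation idea mentioned in the abstract relies on pairwise Diffie-Hellman secrets to derive masks that cancel when the server sums all agents' updates. The following is a minimal sketch of that masking principle only; it uses finite-field Diffie-Hellman via Python's built-in modular `pow` (rather than the elliptic-curve variant the paper uses) to stay dependency-free, and all group parameters, dimensions, and helper names are illustrative assumptions, not the paper's protocol.

```python
import hashlib
import random

# Toy Diffie-Hellman group (illustrative only; real deployments use ECDH
# over a standardized curve with authenticated key exchange).
P = 2**127 - 1
G = 3
MOD = 2**32  # model updates assumed quantized to 32-bit integers


def mask_stream(shared_secret: int, dim: int) -> list:
    """Expand a shared secret into a deterministic pseudorandom mask vector."""
    seed = shared_secret.to_bytes(16, "big")
    return [
        int.from_bytes(hashlib.sha256(seed + k.to_bytes(4, "big")).digest()[:8], "big") % MOD
        for k in range(dim)
    ]


n_agents, dim = 4, 5
priv = [random.randrange(2, P - 1) for _ in range(n_agents)]     # private DH keys
pub = [pow(G, x, P) for x in priv]                               # public DH keys
updates = [[random.randrange(1000) for _ in range(dim)] for _ in range(n_agents)]


def masked_update(i: int) -> list:
    """Agent i adds (+) or subtracts (-) each pairwise mask by index order."""
    y = list(updates[i])
    for j in range(n_agents):
        if j == i:
            continue
        s = pow(pub[j], priv[i], P)          # shared secret: s_ij == s_ji
        m = mask_stream(s, dim)
        sign = 1 if i < j else -1            # opposite signs cancel in the sum
        y = [(v + sign * mk) % MOD for v, mk in zip(y, m)]
    return y


# The server only sees masked updates; summing them cancels every pairwise mask.
agg = [0] * dim
for i in range(n_agents):
    agg = [(a + v) % MOD for a, v in zip(agg, masked_update(i))]

true_sum = [sum(u[k] for u in updates) % MOD for k in range(dim)]
assert agg == true_sum  # aggregate is exact, yet no individual update was revealed
```

Because each mask appears once with a plus sign and once with a minus sign, the server recovers the exact sum of updates while individual masked vectors remain pseudorandom to a semi-honest aggregator.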
Efficiently meeting these demands requires decentralized, real-time resource management frameworks capable of operating in highly dynamic, interference-prone, and energy-constrained environments under strict privacy conditions. Traditional centralized resource management architectures, which depend on global network knowledge for task offloading, spectrum allocation, and computational scheduling, face significant limitations in 6G contexts [4], [5]. These include scalability bottlenecks, high latency, communication overhead, and privacy risks, particularly when raw user data must be aggregated centrally [6].