A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization