Regularized Softmax Deep Multi-Agent Q-Learning

Oct-9-2024, 11:32:54 GMT–Neural Information Processing Systems

Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from more severe overestimation in practice than previously acknowledged, and that this overestimation is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline, and we demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm.
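The two ingredients named in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: the softmax operator is taken to be a Boltzmann-weighted average of action-values with inverse temperature `beta`, and the regularizer is taken to be a squared penalty, weighted by `lam`, on the gap between the joint action-value and a `baseline`. All function and parameter names here are hypothetical, and the sketch enumerates every joint action directly, so it does not reproduce the efficient multi-agent approximation the abstract refers to.

```python
import numpy as np


def softmax_value(q_values, beta=5.0):
    """Boltzmann-weighted average of q_values (assumed form of the softmax operator)."""
    q = np.asarray(q_values, dtype=float)
    z = beta * (q - q.max())  # shift by the max for numerical stability
    w = np.exp(z)
    w /= w.sum()
    return float(np.dot(w, q))


def regularized_td_loss(q_joint, td_target, baseline, lam=0.1):
    """Squared TD error plus an assumed squared penalty on deviation from a baseline."""
    return (q_joint - td_target) ** 2 + lam * (q_joint - baseline) ** 2


# Noisy estimates of joint action-values whose true values are all zero.
rng = np.random.default_rng(0)
noisy_q = rng.normal(0.0, 1.0, size=8)

max_backup = float(noisy_q.max())                # hard max picks up the largest noise term
soft_backup = softmax_value(noisy_q, beta=1.0)   # weighted average, never above the max
```

Because the softmax backup is a convex combination of the estimates, it can never exceed the hard max, which is why it is a natural candidate for damping overestimation. As `beta` grows it approaches the max, so `beta` trades off between the two.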

algorithm, overestimation, regularized softmax deep multi-agent q-learning


