Log-normality and Skewness of Estimated State/Action Values in Reinforcement Learning

Neural Information Processing Systems

Under/overestimation of state/action values is harmful for reinforcement learning agents. In this paper, we show that a state/action value estimated using the Bellman equation can be decomposed into a weighted sum of path-wise values that follow log-normal distributions. Since log-normal distributions are skewed, the distribution of estimated state/action values can also be skewed, leading to an imbalanced likelihood of under/overestimation. The degree of this imbalance can vary greatly among actions and policies within a single problem instance, making the agent prone to selecting actions/policies that have inferior expected return and a higher likelihood of overestimation. We present a comprehensive analysis of this skewness, examine its factors and impacts through both theoretical and empirical results, and discuss possible ways to reduce its undesirable effects.
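
To make the skewness concrete, here is a minimal simulation sketch (our illustration, not the paper's code; the path weights and log-normal parameters are assumed values) that forms a value estimate as a weighted sum of log-normal path-wise draws and checks how often the estimate exceeds its own expectation:

```python
# Illustrative only: a value estimate formed as a weighted sum of
# log-normal path-wise values, mirroring the paper's decomposition.
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_samples = 5, 100_000

weights = rng.dirichlet(np.ones(n_paths))       # assumed path weights
mu = rng.normal(0.0, 0.3, size=n_paths)         # assumed log-means
sigma = rng.uniform(0.2, 0.8, size=n_paths)     # assumed log-stddevs

# Each sample of the estimate is a weighted sum of log-normal draws.
paths = rng.lognormal(mean=mu, sigma=sigma, size=(n_samples, n_paths))
v_hat = paths @ weights

v_true = weights @ np.exp(mu + sigma**2 / 2)    # E[lognormal] = exp(mu + sigma^2/2)
skew = np.mean((v_hat - v_hat.mean())**3) / v_hat.std()**3

print(f"P(overestimate) = {np.mean(v_hat > v_true):.3f}")  # below 0.5 under right skew
print(f"sample skewness = {skew:.3f}")                      # positive (right-skewed)
```

Because the resulting distribution is right-skewed, underestimates are more frequent while overestimates are larger in magnitude; this is the imbalance the abstract refers to.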





Appendix

Neural Information Processing Systems

Fitting T1-mGPLVM to the binned spike data, we found that the inferred latent state was highly correlated with the true head direction (Figure 5b). Here we make this connection more explicit. As described in the main text, the Lie algebra $\mathfrak{g}$ of a group $G$ is a vector space tangent to $G$ at its identity element. However, because the Lie algebra is isomorphic to $\mathbb{R}^n$, we have found it convenient in both our exposition and our implementation to work directly with the pair $(\mathbb{R}^n, \mathrm{Exp}_G)$ instead of $(\mathfrak{g}, \exp_G)$. We begin by noting that $S^n$ is not a Lie group unless $n = 1$ or $n = 3$; thus we can only apply the ReLie framework to $S^1$ and $S^3$.
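
As a concrete illustration of working with the pair $(\mathbb{R}^n, \mathrm{Exp}_G)$, the sketch below (our example, not the paper's implementation) writes out exponential maps for the two sphere groups: $S^1$ via complex units and $S^3$ via unit quaternions:

```python
# Illustrative exponential maps Exp_G : R^n -> G for the two spheres
# that are Lie groups: S^1 (n = 1) and S^3 (n = 3).
import numpy as np

def exp_s1(theta: float) -> complex:
    """Map a tangent coordinate theta in R to the unit circle S^1."""
    return np.exp(1j * theta)

def exp_s3(v: np.ndarray) -> np.ndarray:
    """Map v in R^3 to a unit quaternion (w, x, y, z) on S^3."""
    angle = np.linalg.norm(v)
    if angle < 1e-12:                       # Exp of 0 is the identity
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = v / angle
    return np.concatenate(([np.cos(angle)], np.sin(angle) * axis))

print(exp_s1(np.pi / 2))                    # approximately 1j
print(exp_s3(np.array([np.pi, 0.0, 0.0])))  # (-1, 0, 0, 0), antipode of identity
```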


Boosting Verification of Deep Reinforcement Learning via Piece-Wise Linear Decision Neural Networks

Neural Information Processing Systems

Formally verifying deep reinforcement learning (DRL) systems suffers from both inaccurate verification results and limited scalability. The major obstacle lies in the large overestimation introduced inherently when the inexplicable decision-making models, i.e., deep neural networks (DNNs), are first trained and then transformed into easy-to-verify models. In this paper, we propose an inverse transform-then-train approach, which first encodes a DNN into an equivalent set of efficiently and tightly verifiable linear control policies and then optimizes them via reinforcement learning. We accompany our inverse approach with a novel neural network model called piece-wise linear decision neural networks (PLDNNs), which are compatible with most existing DRL training algorithms and achieve comparable performance to conventional DNNs. Our extensive experiments show that, compared to DNN-based DRL systems, PLDNN-based systems can be verified more efficiently and tightly, with up to $438$ times speedup and a significant reduction in overestimation. In particular, even a complex $12$-dimensional DRL system can be efficiently verified over computation horizons up to $7$ times deeper.
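
The abstract's "equivalent set of ... verifiable linear control policies" suggests a region-gated piece-wise linear structure. Below is a minimal sketch of that idea; the class name, the region-selection rule, and all parameters are our assumptions, not the paper's PLDNN definition:

```python
# Hypothetical sketch of a region-gated piece-wise linear policy.
import numpy as np

class PiecewiseLinearPolicy:
    def __init__(self, region_planes, gains, biases):
        self.region_planes = region_planes  # (k, d) hyperplane normals over state space
        self.gains = gains                  # (k, a, d) one linear gain matrix per region
        self.biases = biases                # (k, a) affine terms per region

    def region(self, state: np.ndarray) -> int:
        # Pick the active linear piece; here simply the plane with the
        # largest activation (an assumed, simplified region test).
        return int(np.argmax(self.region_planes @ state))

    def act(self, state: np.ndarray) -> np.ndarray:
        r = self.region(state)
        return self.gains[r] @ state + self.biases[r]
```

Within each region the action is exactly linear in the state, which is what lets an off-the-shelf linear-arithmetic verifier bound the closed-loop behavior region by region.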


Regularized Softmax Deep Multi-Agent Q-Learning

Neural Information Processing Systems

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular $Q$-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from more severe overestimation in practice than previously acknowledged, and that this is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline, and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent $Q$-Learning, is general and can be applied to any $Q$-learning-based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
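
A rough sketch of the two ingredients named in the abstract, in our own notation (the penalty form, the baseline, and the hyperparameters are assumptions, not the authors' exact formulation):

```python
# Illustrative only: softmax (Boltzmann) backup plus a baseline-deviation
# penalty, the two mechanisms the RES abstract describes.
import torch

def softmax_operator(q: torch.Tensor, beta: float = 5.0) -> torch.Tensor:
    """sum_a softmax(beta * Q)_a * Q_a, a soft alternative to max_a Q_a."""
    w = torch.softmax(beta * q, dim=-1)
    return (w * q).sum(dim=-1)

def res_loss(q_pred, reward, next_q, baseline, gamma=0.99, lam=0.1, beta=5.0):
    # Softmax backup in place of the max reduces the usual max-induced bias.
    target = reward + gamma * softmax_operator(next_q, beta)
    td_loss = (q_pred - target.detach()).pow(2).mean()
    # Penalize joint action-values that rise above a baseline estimate
    # (e.g., one built from observed returns -- an assumption here).
    reg = (q_pred - baseline.detach()).clamp(min=0).pow(2).mean()
    return td_loss + lam * reg
```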




Reviewer #1: 2

Neural Information Processing Systems

We thank the reviewers for their insightful reviews and valuable suggestions. We address the comments as follows. On providing the proof of Lemma 1: the proof proceeds by induction; we will add it to the supplementary material. Yes, as defined in Section 2.1.