Softmax Deep Double Deterministic Policy Gradients
A widely used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect performance. Although the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm mitigates the overestimation issue, it can introduce a large underestimation bias. In this paper, we propose to use the Boltzmann softmax operator for value function estimation in continuous control. We first theoretically analyze the softmax operator in continuous action spaces. Then, we uncover an important property of the softmax operator in actor-critic algorithms: it helps to smooth the optimization landscape, which sheds new light on the benefits of the operator. We also design two new algorithms, Softmax Deep Deterministic Policy Gradients (SD2) and Softmax Deep Double Deterministic Policy Gradients (SD3), by building the softmax operator upon single and double estimators, which effectively reduces the overestimation and underestimation biases, respectively. We conduct extensive experiments on challenging continuous control tasks, and the results show that SD3 outperforms state-of-the-art methods.
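For continuous action spaces, the softmax operator referred to above is softmax_\beta(Q)(s) = \int_A exp(\beta Q(s,a)) Q(s,a) da / \int_A exp(\beta Q(s,a)) da. Below is a minimal NumPy sketch of one way such a value can be estimated, assuming importance sampling with a Gaussian proposal centered at the policy's action; the names q_func, noise_std, and num_samples are illustrative, not from the paper.

```python
import numpy as np

def softmax_value_estimate(q_func, state, mean_action, beta=1.0,
                           noise_std=0.2, num_samples=50):
    """Importance-sampling estimate of
    softmax_beta(Q)(s) = int exp(beta*Q(s,a)) Q(s,a) da / int exp(beta*Q(s,a)) da,
    with actions drawn from a Gaussian proposal centered at mean_action."""
    # Sample candidate actions around the policy's action.
    noise = noise_std * np.random.randn(num_samples, mean_action.shape[-1])
    actions = mean_action + noise
    q_values = np.array([q_func(state, a) for a in actions])  # shape (num_samples,)
    # Unnormalized Gaussian log-density of each sample; the normalizing
    # constant cancels between numerator and denominator of the ratio.
    log_p = -0.5 * np.sum((noise / noise_std) ** 2, axis=-1)
    # Importance-corrected Boltzmann weights, stabilized against overflow.
    logits = beta * q_values - log_p
    weights = np.exp(logits - logits.max())
    return np.sum(weights * q_values) / np.sum(weights)
```

Because the proposal density appears in both the numerator and the denominator, only its unnormalized log-density is needed, and subtracting the maximum logit leaves the ratio unchanged while preventing overflow.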
Review for NeurIPS paper: Softmax Deep Double Deterministic Policy Gradients
Additional Feedback: I have the following questions for the authors to clarify and respond to.
1. For the bias definition in Theorems 3 and 4, is E[T(s')] also dependent on \theta_{true}? If so, would this be a reasonable assumption?
2. The authors show that the proposed estimator can simultaneously reduce over- and underestimation bias. Such results, however, clearly depend on the choice of \beta. Could you elaborate on how to choose this parameter?
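On the role of \beta: the Boltzmann softmax interpolates between the mean operator (\beta -> 0) and the max operator (\beta -> infinity), which is the trade-off underlying the second question. A minimal sketch illustrating the two limits over a fixed set of Q-values:

```python
import numpy as np

def boltzmann_softmax(q, beta):
    # Stabilized Boltzmann-weighted average of the entries of q.
    w = np.exp(beta * q - np.max(beta * q))
    return np.sum(w * q) / np.sum(w)

q = np.array([1.0, 2.0, 3.0])
print(boltzmann_softmax(q, 1e-6))   # ~2.0: beta -> 0 recovers the mean
print(boltzmann_softmax(q, 100.0))  # ~3.0: beta -> inf recovers the max
```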
Review for NeurIPS paper: Softmax Deep Double Deterministic Policy Gradients
The reviewers appreciate the simple idea brought up in the paper, the experiments designed to understand its effect, and the theoretical justification. Some reviewers did express concerns regarding the significance of the theoretical results, and those concerns remain after the rebuttal. Please try to incorporate this feedback in your final draft.