Review for NeurIPS paper: Softmax Deep Double Deterministic Policy Gradients

Jan-26-2025, 09:14:54 GMT–Neural Information Processing Systems

Additional Feedback: I have the following questions for the authors to clarify and respond to.
1. For the bias definition in Theorems 3 and 4, is E[T(s')] also dependent on \theta_{true}? If so, is this a reasonable assumption?
2. The authors show that the proposed estimator can simultaneously reduce over- and underestimation bias. Such results, however, clearly depend on the choice of \beta. Could you elaborate on how to choose this parameter?
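To illustrate why the bias behaviour in question 2 hinges on \beta, here is a minimal sketch of a Boltzmann softmax operator, assuming a finite set of sampled action values. The paper itself works with continuous actions and does not necessarily compute the operator this way; the function name `boltzmann_softmax` is hypothetical. With \beta = 0 the operator returns the plain mean of the values, and as \beta grows it approaches the max, so any over- or underestimation result must depend on where \beta sits between these extremes.

```python
import math
from typing import Sequence


def boltzmann_softmax(q_values: Sequence[float], beta: float) -> float:
    """Hypothetical helper: sum_a w(a) * Q(a) with w(a) proportional to exp(beta * Q(a)).

    Assumes a finite list of sampled action values and beta >= 0.
    """
    if not q_values:
        raise ValueError("q_values must be non-empty")
    m = max(q_values)  # shift by the max so exp() never overflows for beta >= 0
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


if __name__ == "__main__":
    qs = [1.0, 2.0, 3.0]
    # beta = 0 gives the mean; larger beta moves the estimate toward the max.
    for beta in (0.0, 1.0, 10.0, 100.0):
        print(beta, boltzmann_softmax(qs, beta))
```

For \beta >= 0 the result always lies between the minimum and maximum of the sampled values, which is the interpolation the authors' choice of \beta controls.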
