Review for NeurIPS paper: Softmax Deep Double Deterministic Policy Gradients
–Neural Information Processing Systems
Additional Feedback: I have the following questions for the authors to clarify and respond to. 1. For the bias definition in Theorems 3 and 4, is E[T(s')] also dependent on \theta_{true}? If so, is this a reasonable assumption? 2. The authors show that the proposed estimator can reduce both over- and under-estimation bias. These results, however, depend on the choice of \beta. Could you elaborate on how to choose this parameter?
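To make the second question concrete, here is a minimal sketch of why the choice of \beta matters. It assumes a Boltzmann softmax over a small finite set of action values, which is my simplification for illustration and may not match the paper's operator (for instance, the paper works with continuous actions). The function name `boltzmann_softmax` and the sample values are hypothetical.

```python
import math


def boltzmann_softmax(q_values, beta):
    """Boltzmann-weighted average of q_values with inverse temperature beta.

    Illustrative assumption only: a finite list of action values stands in
    for whatever set of actions the actual estimator uses.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


# Hypothetical action values for a single next state.
q = [1.0, 2.0, 4.0]
for beta in (0.0, 1.0, 10.0, 100.0):
    print(beta, boltzmann_softmax(q, beta))
```

With \beta = 0 the estimate equals the plain mean of the values, and as \beta grows it approaches the maximum. The estimate therefore moves between a mean-like target and a max-like target depending on \beta, which is why guidance on selecting it would be valuable.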
Jan-26-2025, 09:14:54 GMT