Review for NeurIPS paper: Softmax Deep Double Deterministic Policy Gradients

Neural Information Processing Systems 

Additional Feedback: I have the following questions for the authors to clarify and respond to.
1. For the bias definition in Theorems 3 and 4, does E[T(s')] also depend on \theta_{true}? If so, is this a reasonable assumption?
2. The authors show that the proposed estimator can simultaneously reduce over- and underestimation bias. These results, however, clearly depend on the choice of \beta. Could you elaborate on how to choose this parameter?
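To make the dependence on \beta concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes a finite set of sampled action values and uses the standard Boltzmann softmax operator, sum_a w_a Q(a) with w = softmax(\beta Q). The function names and the all-zero true Q-values are hypothetical choices made only for illustration. With \beta = 0 the operator returns the mean of the noisy estimates, so the bias is close to zero. As \beta grows it approaches the max and inherits the max operator's overestimation. The toy covers only the overestimation side, not the underestimation the paper attributes to the double estimator.

```python
import numpy as np

def softmax_operator(q_values, beta):
    """Boltzmann softmax operator applied along the last axis (hypothetical helper)."""
    q = np.asarray(q_values, dtype=float)
    logits = beta * q
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w = w / w.sum(axis=-1, keepdims=True)  # softmax weights over actions
    return (w * q).sum(axis=-1)

def estimated_bias(beta, n_actions=10, noise_std=1.0, n_trials=10_000, seed=0):
    """Monte Carlo bias of the softmax operator on noisy Q estimates (toy setup, assumed)."""
    rng = np.random.default_rng(seed)
    true_q = np.zeros(n_actions)  # all actions equally good, so the true max is 0
    noisy_q = true_q + noise_std * rng.standard_normal((n_trials, n_actions))
    return float(softmax_operator(noisy_q, beta).mean() - true_q.max())

for beta in (0.0, 1.0, 5.0, 50.0):
    print(f"beta={beta:5.1f}  bias={estimated_bias(beta):.3f}")
```

Because the softmax operator is nondecreasing in \beta for any fixed set of estimates, the estimated bias in this toy rises from roughly zero toward the max operator's bias as \beta increases. This is why a principled rule or sensitivity analysis for \beta matters.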