Review for NeurIPS paper: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss

Jan-27-2025, 14:50:12 GMT–Neural Information Processing Systems

Weaknesses: (W1): As such the high-level outline of the proof strategy follows previous procedures for drift analysis in (Yu et al. 2017) and MDP analysis in (Neu et al. 2012 and Rosenberg et al. 2019). Lemma B.2 is very similar to Lemma 4 in Neu et al. 2012 and Lemma B.2 in Rosenberg et al. 2019. Lemma 5.2 mirrors Lemma 8 in Yu et al. 2017. Technical lemmas for stochastic analysis are also from the previous paper: (Lemma B.6 and B.7 are Lemma 5 and 9 in Yu et al. 2017). The main lemma, Lemma 5.3, has the same goal as Lemma 7 in Yu et al. 2017, which is to show Q_t satisfies the drift condition stated in Lemma 5 in Yu et al. 2017. Lemma 5.6 is also exact as Lemma 3 in Yu et al. 2017.

budget violation, lemma 5, upper confidence primal-dual reinforcement learning, (12 more...)

Neural Information Processing Systems

Jan-27-2025, 14:50:12 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)