Review for NeurIPS paper: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
–Neural Information Processing Systems
Additional Feedback: The regret bound in Lemma 1 is also straightforward. The algorithmic idea is to minimize the Bellman error, so the algorithmic novelty is modest but acceptable. The key contribution of the paper is the regret analysis of the two algorithms, which is largely built on a Taylor-series approximation of the value function. Theorems 1 and 2 present the regret bounds of the two algorithms, following a UCB-style analysis based on a Taylor expansion of the (nonlinear) Bellman equation.
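To make the Taylor-expansion step concrete: in risk-sensitive RL with exponential utility, the nonlinearity in the Bellman equation comes from the entropic risk measure (1/β)·log E[exp(βX)], which for small risk parameter β is approximated to second order by mean + (β/2)·variance. The sketch below is only a numerical illustration of that approximation, not the paper's algorithm; the function names and the Gaussian test distribution are my own choices.

```python
import math
import random

def entropic_risk(samples, beta):
    # Entropic risk measure: (1/beta) * log E[exp(beta * X)],
    # the nonlinear objective that enters the risk-sensitive Bellman equation.
    n = len(samples)
    return math.log(sum(math.exp(beta * x) for x in samples) / n) / beta

def taylor_approx(samples, beta):
    # Second-order Taylor expansion around beta = 0:
    # entropic risk ≈ mean + (beta / 2) * variance.
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean + 0.5 * beta * var

random.seed(0)
xs = [random.gauss(1.0, 0.5) for _ in range(10000)]  # hypothetical return samples
beta = 0.1  # small risk-sensitivity parameter, where the expansion is accurate
print(abs(entropic_risk(xs, beta) - taylor_approx(xs, beta)))  # small gap
```

For small β the gap between the exact risk and its quadratic approximation is negligible, which is what lets the analysis reduce the nonlinear Bellman equation to a tractable, UCB-style argument.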
near-optimal risk-sample tradeoff, neurips paper, risk-sensitive reinforcement learning
Feb-12-2025, 02:39:58 GMT