Review for NeurIPS paper: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret


Additional Feedback: The regret bound in Lemma 1 is also straightforward. The algorithmic idea is to minimize the Bellman error, so the algorithmic novelty is acceptable. The key contribution of the paper is the regret analysis of the two algorithms, which is largely built on a Taylor series approximation of the value function. Theorems 1 and 2 present the regret bounds of the two algorithms, following a UCB-style analysis that draws on a Taylor expansion of the (nonlinear) Bellman equation.
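
To make the role of the expansion concrete, here is a minimal sketch of the kind of approximation the analysis appears to rely on, assuming the standard exponential-utility (entropic risk) formulation with risk parameter $\beta$; the notation is illustrative and may not match the paper's exactly. The risk-sensitive Bellman equation is nonlinear in the value function,
$$
V_h(s) \;=\; \max_a \frac{1}{\beta}\,\log \mathbb{E}\!\left[\exp\!\big(\beta\,(r_h(s,a) + V_{h+1}(s'))\big)\right],
$$
and for a bounded random variable $X$ the entropic risk admits the Taylor expansion
$$
\frac{1}{\beta}\,\log \mathbb{E}\!\left[e^{\beta X}\right] \;=\; \mathbb{E}[X] \;+\; \frac{\beta}{2}\,\mathrm{Var}[X] \;+\; O(\beta^2),
$$
so for small $|\beta|$ the nonlinear Bellman equation is close to the risk-neutral one plus a variance correction. This is what makes a UCB-style regret argument applicable, with the approximation error contributing the risk-dependent factor in the bound.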