Reviews: Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
Neural Information Processing Systems
The paper addresses the important problem of designing optimal algorithms for the exploration-exploitation trade-off, that is, algorithms whose regret upper bound matches the lower bound. The paper is neither well organized nor well written. It is difficult to abstract from the mathematical formulation and grasp the key ideas behind the improvement of the regret bound. As far as I understood, the first important component in improving the bound is the use of variance-dependent confidence intervals (i.e., Bernstein-type). Together with the knowledge of H, this allows designing a tighter optimism (Eq.
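To make the point about variance-dependent confidence intervals concrete, here is a minimal sketch comparing a Hoeffding width with an empirical Bernstein width for samples bounded in an interval of known length. It uses the standard textbook forms of both bounds (the empirical Bernstein one in the Maurer-Pontil form), not the specific bonus defined in the paper under review. The function names and the `value_range` parameter are hypothetical; `value_range` merely stands in for a known bound such as the H mentioned above.

```python
import math


def hoeffding_width(n, value_range, delta):
    """One-sided Hoeffding width for n samples in an interval of length value_range."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def empirical_bernstein_width(samples, value_range, delta):
    """One-sided empirical Bernstein width (Maurer-Pontil form, rescaled by value_range).

    Assumption: samples lie in an interval of length value_range and len(samples) >= 2.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Unbiased sample variance.
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    log_term = math.log(2.0 / delta)
    variance_part = math.sqrt(2.0 * var * log_term / n)
    range_part = 7.0 * value_range * log_term / (3.0 * (n - 1))
    return variance_part + range_part


if __name__ == "__main__":
    # Low-variance samples inside [0, 10]: values alternate between 4.99 and 5.01.
    samples = [5.0 + 0.01 * (1 if i % 2 else -1) for i in range(1000)]
    print("Hoeffding:", hoeffding_width(len(samples), 10.0, 0.05))
    print("Empirical Bernstein:", empirical_bernstein_width(samples, 10.0, 0.05))
```

When the empirical variance is small relative to the range, the Bernstein width is dominated by its 1/n term and is much narrower than the Hoeffding width, which scales as the range over the square root of n regardless of the variance. With high-variance samples the advantage disappears and the Bernstein width can be the larger of the two.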
Jan-26-2025, 02:11:46 GMT