Reviews: Explicit Explore-Exploit Algorithms in Continuous State Spaces
–Neural Information Processing Systems
I have read the authors' feedback and the other reviews. I'll keep my original score. The agent is able to collect data while running the algorithm, and the goal is to find a near-optimal policy. The use of ranks of error matrices to represent complexity, and some of the proof techniques, are related to [18] and [41]. The paper is technically sound and clearly written. On the theoretical side, the authors prove a polynomial sample complexity bound in terms of A, H, and the rank of the model misfit matrix, thus avoiding the dependence on S.
Jan-21-2025, 06:53:15 GMT