Review for NeurIPS paper: Agnostic Q-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity
–Neural Information Processing Systems
Weaknesses: First, the proof, as the authors themselves note, depends on the optimality-gap assumption. The relationship between the approximation error and this gap is crucial: a larger approximation error requires a larger gap to retain the favorable guarantees, and it is not entirely clear whether the resulting bounds are meaningful in practice. Second, the algorithm for the general case requires an oracle that returns the most uncertain action at a given state for the approximation family F. While the authors argue that a similar oracle is used in previous work, it is not clear that this oracle is more realistic than the one in work they dismiss in the related-work section (the "Knows-What-It-Knows" oracle of Li et al., 2011). Third, the proof applies only to deterministic systems, which significantly restricts its applicability.
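To make the first weakness concrete, the gap-error relationship can be sketched as follows (this is the reviewer's paraphrase under assumed notation, not the paper's exact statement: here δ denotes the best-in-class approximation error of F and ρ the minimal optimality gap; the precise constants and dimension-dependent factors are those in the paper):

```latex
\delta \;=\; \min_{f \in \mathcal{F}} \|f - Q^{*}\|_{\infty},
\qquad
\rho \;=\; \min_{s,\; a \neq \pi^{*}(s)} \bigl( V^{*}(s) - Q^{*}(s,a) \bigr),
```

with the favorable guarantees requiring, up to dimension-dependent factors, a condition of the form δ ≲ ρ. In other words, as the function class fits Q* less well, the gap between optimal and suboptimal actions must grow proportionally, which is a strong requirement for practical problems.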
Feb-8-2025, 16:23:24 GMT