Review for NeurIPS paper: Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Neural Information Processing Systems
Additional Feedback:
Post-rebuttal: The authors addressed some of my concerns. Since the authors will redesign some of the experiments in the revision, I am raising my score to 6.
Comments and questions: 1. Are there any lower-bound results on the sample complexity of planning? Are there any particular reasons, and what is the high-level idea of this algorithm? If I understand correctly, this rule is what yields the gap-dependent sample complexity. What if we instead used the simple greedy policy for the first action; what would go wrong in the proof?
Jan-21-2025, 13:04:00 GMT