Reviews: Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
Neural Information Processing Systems
This is an excellent theoretical contribution. The analysis is quite heavy and has many subtleties. I did not have enough time to read the appended proofs, and the subject of the paper is outside my area of research. The comments below are based on the impression I formed after carefully reading the first 8 pages of the paper and glancing through the rest in the supplementary file.

Summary: This paper is about reinforcement learning in weakly communicating MDPs under the average-reward criterion.
Oct-7-2024, 08:33:21 GMT