Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
Neural Information Processing Systems
When designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable).
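To make "not reachable by any policy" concrete, here is a minimal sketch that assumes a small tabular MDP with fully known transition probabilities. The function `reachable_states` and the four-state `toy_mdp` are hypothetical illustrations, not taken from the paper. A state is reachable under some policy exactly when it can be reached through the union of all actions' positive-probability transitions, so a breadth-first search over that union is enough.

```python
from collections import deque


def reachable_states(transitions, start):
    """Return the set of states reachable from `start` under at least one policy.

    `transitions[s][a]` is a dict mapping each next state to its probability.
    This layout is an assumption made for the sketch.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        # Union over all actions: any action may be chosen by some policy.
        for next_dist in transitions.get(s, {}).values():
            for s_next, prob in next_dist.items():
                if prob > 0 and s_next not in seen:
                    seen.add(s_next)
                    queue.append(s_next)
    return seen


# Hypothetical toy MDP: state 3 has outgoing transitions but no incoming ones,
# so no policy started in state 0 can ever visit it.
toy_mdp = {
    0: {"stay": {0: 1.0}, "go": {1: 0.5, 2: 0.5}},
    1: {"stay": {1: 1.0}, "go": {0: 1.0}},
    2: {"stay": {2: 1.0}, "go": {0: 1.0}},
    3: {"stay": {3: 1.0}, "go": {0: 1.0}},
}

reachable = reachable_states(toy_mdp, start=0)
unreachable = set(toy_mdp) - reachable
```

In a learning setting the transition probabilities are unknown, so this check cannot be run directly; the sketch only illustrates what the set of unreachable states is once the dynamics are given.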
May-24-2025, 00:24:04 GMT