Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

Neural Information Processing Systems 

While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable).