Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes

Neural Information Processing Systems 

One key difference between reinforcement learning in simulated vs. real-world environments is that, in most simulated environments, the agent can fully observe