Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes