Efficient Exploration and Value Function Generalization in Deterministic Systems

Zheng Wen

Neural Information Processing Systems 

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and, as a solution, propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.