Efficient Exploration and Value Function Generalization in Deterministic Systems
Zheng Wen
Neural Information Processing Systems
We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and, as a solution, propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.
Mar-13-2024, 20:00:22 GMT