Efficient Exploration and Value Function Generalization in Deterministic Systems
Zheng Wen
Neural Information Processing Systems
We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and, as a solution, propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.
Mar-13-2024, 20:00:22 GMT
- Country:
- North America > United States
- California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East
- Jordan (0.04)