Beyond dynamic programming
arXiv.org Artificial Intelligence
In contrast with classical dynamic programming-based methods, our method can search over non-stationary policy functions and can directly compute optimal infinite-horizon action sequences from a given state. The central idea is the construction of a mapping between infinite-horizon action sequences and real numbers in a bounded interval. This construction lets us formulate an optimization problem that computes optimal infinite-horizon action sequences directly, without requiring a policy function. We demonstrate the effectiveness of our approach by applying it to nonlinear optimal control problems. Overall, our contributions provide a novel theoretical framework for formulating and solving reinforcement learning problems.
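One way to picture a mapping between action sequences and a bounded interval is a base-k expansion: with k discrete actions, the sequence a_0, a_1, ... is read as the digits of a number in [0, 1]. The sketch below is only an illustration under that assumption and is not taken from the paper, whose actual construction may differ. The function names, the finite truncation horizon, the toy dynamics, and the grid search are all hypothetical choices made here to keep the example small. Note also that base-k expansions are not unique for every real number, which a careful construction would have to handle.

```python
from fractions import Fraction
from typing import List, Tuple

NUM_ACTIONS = 3  # hypothetical action set {0, 1, 2}


def sequence_to_real(actions: List[int], num_actions: int) -> Fraction:
    """Read a truncated action sequence as base-`num_actions` digits of a number in [0, 1)."""
    x = Fraction(0)
    scale = Fraction(1, num_actions)
    for a in actions:
        if not 0 <= a < num_actions:
            raise ValueError("action index out of range")
        x += a * scale
        scale /= num_actions
    return x


def real_to_sequence(x: Fraction, num_actions: int, horizon: int) -> List[int]:
    """Recover the first `horizon` actions encoded by a number in [0, 1)."""
    if not 0 <= x < 1:
        raise ValueError("x must lie in [0, 1)")
    actions = []
    for _ in range(horizon):
        x *= num_actions
        a = int(x)  # integer part is the next digit, i.e. the next action
        actions.append(a)
        x -= a
    return actions


def discounted_return(actions: List[int], start_state: int, gamma: Fraction) -> Fraction:
    """Toy dynamics (an assumption): action 0/1/2 shifts the state by -1/0/+1; reward is -|state|."""
    state, total, weight = start_state, Fraction(0), Fraction(1)
    for a in actions:
        state += a - 1
        total += weight * -abs(state)
        weight *= gamma
    return total


def grid_search(start_state: int, horizon: int, gamma: Fraction) -> Tuple[Fraction, List[int]]:
    """Maximize the return over the scalar code x instead of over a policy function."""
    grid = NUM_ACTIONS ** horizon  # one grid point per truncated sequence
    best_x = max(
        (Fraction(k, grid) for k in range(grid)),
        key=lambda x: discounted_return(
            real_to_sequence(x, NUM_ACTIONS, horizon), start_state, gamma
        ),
    )
    return best_x, real_to_sequence(best_x, NUM_ACTIONS, horizon)
```

Calling `grid_search(2, 4, Fraction(1, 2))` searches the 81 grid points in [0, 1) and decodes the best one back into an action sequence. The exhaustive grid stands in for whatever optimizer one would actually use; the point is only that the decision variable is a single bounded real number rather than a policy.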
Jun-26-2023
- Country:
- North America > Canada > Ontario > Toronto (0.14)
- Genre:
- Workflow (0.80)
- Research Report (0.50)
- Industry:
- Education (0.34)
- Technology: