Difference of Convex Functions Programming for Reinforcement Learning
Bilal Piot, Matthieu Geist, Olivier Pietquin
Neural Information Processing Systems
Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming.
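For readers unfamiliar with DC programming, the following is a minimal sketch of the generic DC Algorithm (DCA) on a one-dimensional toy problem. It is not the paper's objective or method: the function f(x) = x^4 - 2x^2, its split into g(x) = x^4 and h(x) = 2x^2, and all names in the code are assumptions chosen purely for illustration. DCA replaces the concave part -h by its linearization at the current iterate and minimizes the resulting convex surrogate, which here has a closed-form solution.

```python
import math


def f(x):
    """Toy non-convex objective f = g - h (illustrative only)."""
    return x ** 4 - 2.0 * x ** 2


def h_grad(x):
    """Gradient of the convex part h(x) = 2 x^2."""
    return 4.0 * x


def dca_step(x_k):
    """One DCA step: minimize g(x) - h_grad(x_k) * x with g(x) = x^4.

    The surrogate is convex; setting its derivative to zero gives
    4 x^3 = h_grad(x_k), i.e. x = cbrt(h_grad(x_k) / 4).
    """
    y = h_grad(x_k) / 4.0
    # Real cube root that also handles negative arguments.
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


def dca(x0, n_iters=100, tol=1e-12):
    """Iterate DCA steps from x0 until successive iterates stop moving."""
    x = x0
    for _ in range(n_iters):
        x_next = dca_step(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


if __name__ == "__main__":
    for start in (0.5, -0.5):
        x_star = dca(start)
        print(f"start={start:+.1f} -> x={x_star:+.6f}, f(x)={f(x_star):+.6f}")
```

Each step never increases f, which is the usual descent property of DCA, but the limit depends on the starting point: positive starts approach x = 1 and negative starts approach x = -1. In general DCA only guarantees convergence to a critical point, not a global minimum.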
Feb-9-2025, 03:00:55 GMT
- Country:
  - Europe > France
    - Hauts-de-France > Nord > Lille (0.04)
  - North America > United States
    - Massachusetts > Middlesex County > Belmont (0.04)