Difference of Convex Functions Programming for Reinforcement Learning

Mar-13-2024, 08:30:29 GMT–Neural Information Processing Systems

Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming.
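The abstract names DC programming without showing what a DC program looks like. The following is a minimal illustrative sketch of the standard DC Algorithm (DCA), not the paper's formulation. DCA writes an objective as f = g - h with g and h convex. At each step it replaces h by its linearization at the current iterate and then minimizes the resulting convex function. The toy objective f(x) = x^4 - 2x^2 and all function names are hypothetical choices made only for this illustration.

```python
"""Minimal DC Algorithm (DCA) sketch on a toy one-dimensional problem.

Assumption (illustration only): f(x) = x**4 - 2*x**2 is split as
g(x) - h(x) with g(x) = x**4 and h(x) = 2*x**2, both convex.
This is NOT the paper's value-function objective.
"""
import math


def g(x: float) -> float:
    """Convex part kept as-is: g(x) = x^4."""
    return x ** 4


def h(x: float) -> float:
    """Convex part that is subtracted: h(x) = 2x^2."""
    return 2 * x ** 2


def grad_h(x: float) -> float:
    """Derivative of h, used to linearize h at the current iterate."""
    return 4 * x


def f(x: float) -> float:
    """The (nonconvex) DC objective f = g - h."""
    return g(x) - h(x)


def dca_step(x_k: float) -> float:
    """Solve the convex subproblem argmin_x g(x) - grad_h(x_k) * x.

    Setting its derivative 4x^3 - grad_h(x_k) to zero gives
    x = cbrt(grad_h(x_k) / 4); the sign is handled explicitly so
    negative inputs get a real cube root.
    """
    y = grad_h(x_k) / 4
    return math.copysign(abs(y) ** (1 / 3), y)


def dca(x0: float, tol: float = 1e-10, max_iter: int = 1000) -> float:
    """Iterate DCA steps until successive iterates stop moving."""
    x = x0
    for _ in range(max_iter):
        x_next = dca_step(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


if __name__ == "__main__":
    x_star = dca(0.5)
    print(x_star, f(x_star))
```

Starting from x0 = 0.5, the iterates increase toward the local minimizer x = 1, and f decreases at every step. This monotone descent is the property that makes DCA attractive. Started at exactly 0, a stationary point of f, the iterate stays put, because DCA only guarantees convergence to critical points.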

Keywords: algorithm, empirical norm, minimization, …

Country:
- North America > United States
  - Massachusetts > Middlesex County > Belmont (0.04)
- Europe > France
  - Hauts-de-France > Nord > Lille (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Statistical Learning (0.93)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.35)