Difference of Convex Functions Programming for Reinforcement Learning

Neural Information Processing Systems 

Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming.