A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees

Apr-6-2023, 15:49:13 GMT–Neural Information Processing Systems

We introduce a new algorithm based on linear programming that approximates the differential value function of an average-cost Markov decision process via a linear combination of pre-selected basis functions. The algorithm carries out a form of cost shaping and minimizes a version of Bellman error. We establish an error bound that scales gracefully with the number of states without imposing the (strong) Lyapunov condition required by its counter- part in [6]. We propose a path-following method that automates selection of important algorithm parameters which represent coun- terparts to the "state-relevance weights" studied in [6]. Over the past few years, there has been a growing interest in linear programming (LP) approaches to approximate dynamic programming (DP).

algorithm, basis function, mdp, (13 more...)

Neural Information Processing Systems

Apr-6-2023, 15:49:13 GMT

Conferences Web Page

Add feedback

Country:
- North America > United States (0.05)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.80)
  - Machine Learning > Reinforcement Learning (0.50)