Robust Value Function Approximation Using Bilinear Programming
Petrik, Marek, Zilberstein, Shlomo
–Neural Information Processing Systems
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation thatprovides strong a priori guarantees. In particular, this approach provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NPhard, but this is unavoidable because the Bellman-residual minimization itself is NPhard. We therefore employ and analyze a common approximate algorithm for bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration.Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on a simple benchmark problem.
Neural Information Processing Systems
Dec-31-2009
- Country:
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- Genre:
- Research Report (0.46)
- Technology: