Accelerating Value Iteration with Anchoring

Jan-19-2025, 18:34:43 GMT–Neural Information Processing Systems

Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at a \mathcal{O}(\gamma^k)-rate. Surprisingly, however, the optimal rate for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an \emph{anchoring} mechanism (distinct from Nesterov's acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits a \mathcal{O}(1/k)-rate for \gamma\approx 1 or even \gamma=1, while standard VI has rate \mathcal{O}(1) for \gamma\ge 1-1/k, where k is the iteration count.
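The anchoring idea can be illustrated with a minimal sketch for the Bellman consistency operator T v = r + \gamma P v of a fixed policy. The abstract does not state the update rule, so everything below is an assumption made for illustration: the update v_k = \beta_k v_0 + (1 - \beta_k) T v_{k-1}, the coefficient schedule \beta_k = 1 / \sum_{i=0}^{k} \gamma^{-2i} (which reduces to the Halpern-style 1/(k+1) at \gamma = 1), the function names, and the two-state example.

```python
def bellman_consistency(v, P, r, gamma):
    """Apply T v = r + gamma * P v for a fixed policy (P is row-stochastic)."""
    n = len(v)
    return [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]


def bellman_error(v, P, r, gamma):
    """Sup-norm Bellman error ||T v - v||."""
    Tv = bellman_consistency(v, P, r, gamma)
    return max(abs(a - b) for a, b in zip(Tv, v))


def standard_vi(v0, P, r, gamma, iters):
    """Plain value iteration: v_k = T v_{k-1}."""
    v = list(v0)
    for _ in range(iters):
        v = bellman_consistency(v, P, r, gamma)
    return v


def anchored_vi(v0, P, r, gamma, iters):
    """Anchored iteration: v_k = beta_k * v0 + (1 - beta_k) * T v_{k-1}.

    ASSUMPTION: beta_k = 1 / sum_{i=0}^{k} gamma^(-2i); this schedule is
    not given in the abstract and is used here only for illustration.
    """
    v = list(v0)
    weight_sum = 1.0  # the i = 0 term of the sum
    for k in range(1, iters + 1):
        weight_sum += gamma ** (-2 * k)
        beta = 1.0 / weight_sum
        Tv = bellman_consistency(v, P, r, gamma)
        # Pull the iterate back toward the starting point (the anchor).
        v = [beta * a + (1.0 - beta) * t for a, t in zip(v0, Tv)]
    return v


# Hypothetical two-state cycle with gamma = 1: state 0 -> 1 -> 0.
P = [[0.0, 1.0], [1.0, 0.0]]
r = [1.0, -1.0]
gamma = 1.0
v0 = [0.0, 0.0]

err_standard = bellman_error(standard_vi(v0, P, r, gamma, 50), P, r, gamma)
err_anchored = bellman_error(anchored_vi(v0, P, r, gamma, 50), P, r, gamma)
print(err_standard, err_anchored)
```

On this undiscounted cycle, plain VI oscillates between two points and its Bellman error never shrinks, whereas the anchored iterate's error decays roughly like 1/k, consistent with the rates quoted above.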

accelerating value iteration, anchoring, mechanism, (3 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.43)