Accelerating Value Iteration with Anchoring
Neural Information Processing Systems
Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at a \mathcal{O}(\gamma^k)-rate. Surprisingly, however, the optimal rate for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an \emph{anchoring} mechanism (distinct from Nesterov's acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits a \mathcal{O}(1/k)-rate for \gamma\approx 1 or even \gamma=1, while standard VI has rate \mathcal{O}(1) for \gamma\ge 1-1/k, where k is the iteration count.
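The contrast between the two rates at \gamma=1 can be illustrated with a small sketch. The abstract does not state the anchoring coefficients, so the code below assumes a Halpern-style update U^k = \beta_k U^0 + (1-\beta_k) T U^{k-1} with \beta_k = 1/(k+1); the paper's exact coefficients may differ. The two-state chain, its rewards, and all function names are hypothetical and chosen only so that a fixed point exists at \gamma=1.

```python
from typing import Callable, List

Vector = List[float]
Operator = Callable[[Vector], Vector]


def bellman_consistency(v: Vector, gamma: float = 1.0) -> Vector:
    """Bellman consistency operator for a toy 2-state deterministic chain.

    Hypothetical example: state 0 moves to state 1 with reward +1,
    state 1 moves to state 0 with reward -1.
    """
    rewards = [1.0, -1.0]
    next_state = [1, 0]
    return [rewards[s] + gamma * v[next_state[s]] for s in range(2)]


def bellman_error(v: Vector, T: Operator) -> float:
    """Sup-norm Bellman error ||T v - v||."""
    return max(abs(a - b) for a, b in zip(T(v), v))


def standard_vi(T: Operator, v0: Vector, iters: int) -> Vector:
    """Plain value iteration: V^k = T V^{k-1}."""
    v = list(v0)
    for _ in range(iters):
        v = T(v)
    return v


def anchored_vi(T: Operator, v0: Vector, iters: int) -> Vector:
    """Anchored iteration: U^k = beta_k U^0 + (1 - beta_k) T U^{k-1}.

    Assumption: beta_k = 1 / (k + 1), a Halpern-style choice for gamma = 1.
    """
    u = list(v0)
    for k in range(1, iters + 1):
        beta = 1.0 / (k + 1)
        tu = T(u)
        # Pull each iterate back toward the starting point (the anchor).
        u = [beta * a + (1.0 - beta) * b for a, b in zip(v0, tu)]
    return u


# Compare Bellman errors after the same number of iterations at gamma = 1.
start = [0.0, 0.0]
for n in (10, 20, 50):
    err_std = bellman_error(standard_vi(bellman_consistency, start, n), bellman_consistency)
    err_anc = bellman_error(anchored_vi(bellman_consistency, start, n), bellman_consistency)
    print(n, err_std, err_anc)
```

On this toy chain, plain VI oscillates between two vectors and its Bellman error does not shrink, while the anchored iterate's error decays with the iteration count. This is only a single hand-picked instance, not evidence for the general rates claimed in the abstract.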
Jan-19-2025, 18:34:43 GMT