On Value Iteration Convergence in Connected MDPs
Mustafin, Arsenii, Olshevsky, Alex, Paschalidis, Ioannis Ch.
arXiv.org Artificial Intelligence
This paper establishes that, for an MDP with a unique optimal policy and an ergodic associated transition matrix, various versions of the Value Iteration algorithm converge at a geometric rate faster than the discount factor γ, under both the discounted and average-reward criteria.
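To make the convergence claim concrete, the following is a minimal sketch of synchronous value iteration on a small randomly generated MDP. It is not the paper's code or experimental setup. The function names (`random_dense_mdp`, `bellman_update`, `run_value_iteration`), the MDP sizes, and the choice of strictly positive transition probabilities (used here as a simple way to make every policy's chain ergodic) are illustrative assumptions. The sketch records the sup-norm and the span (max minus min) of successive value differences, so both can be compared against γ.

```python
import random


def random_dense_mdp(n_states, n_actions, seed):
    """Build a hypothetical MDP whose transition probabilities are all positive.

    Returns P[s][a][t] (probability of moving s -> t under action a)
    and R[s][a] (reward for taking action a in state s).
    """
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            # Adding 0.1 keeps every entry strictly positive (assumption).
            weights = [rng.random() + 0.1 for _ in range(n_states)]
            total = sum(weights)
            P_s.append([w / total for w in weights])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


def bellman_update(V, P, R, gamma):
    """Apply one synchronous Bellman optimality backup to V."""
    n_states = len(V)
    n_actions = len(R[0])
    new_V = []
    for s in range(n_states):
        q_values = [
            R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
            for a in range(n_actions)
        ]
        new_V.append(max(q_values))
    return new_V


def run_value_iteration(P, R, gamma, n_iters):
    """Run value iteration from V = 0 and track successive differences."""
    V = [0.0] * len(P)
    sup_norms, spans = [], []
    for _ in range(n_iters):
        new_V = bellman_update(V, P, R, gamma)
        diff = [b - a for a, b in zip(V, new_V)]
        sup_norms.append(max(abs(d) for d in diff))  # sup-norm of V_{k+1} - V_k
        spans.append(max(diff) - min(diff))          # span of V_{k+1} - V_k
        V = new_V
    return V, sup_norms, spans


if __name__ == "__main__":
    gamma = 0.9
    P, R = random_dense_mdp(n_states=5, n_actions=3, seed=0)
    V, sup_norms, spans = run_value_iteration(P, R, gamma, n_iters=20)
    for k in range(1, 6):
        # Skip the span ratio once the span has shrunk to rounding noise.
        span_ratio = spans[k] / spans[k - 1] if spans[k - 1] > 1e-12 else float("nan")
        print(k, sup_norms[k] / sup_norms[k - 1], span_ratio)
```

The standard contraction argument guarantees that both printed ratios are at most γ. Comparing the span ratio with γ is one informal way to see how mixing in the transition matrix can speed up convergence of the value differences beyond the generic γ bound.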
Jun-13-2024
- Country:
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- Genre:
- Research Report (0.40)
- Technology: