Rank-One Modified Value Iteration

Kolarijani, Arman Sharifi, Ok, Tolga, Esfahani, Peyman Mohajerin, Kolarijani, Mohamad Amin Sharif

May-27-2025–arXiv.org Machine Learning

In this paper, we provide a novel algorithm for solving planning and learning problems of Markov decision processes. The proposed algorithm follows a policy iteration-type update by using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is approximated using the power method. We provide theoretical guarantees for the convergence of the proposed algorithm to the optimal (action-)value function with the same rate and computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through our extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions for both planning and learning problems.
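
To make the two ingredients named in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm: it only illustrates (i) the power method for approximating the stationary distribution d of a fixed row-stochastic matrix P, and (ii) what policy evaluation looks like if P is replaced by the rank-one matrix 1 d^T, in which case v = r + gamma (1 d^T) v has the closed-form solution v = r + gamma/(1 - gamma) (d^T r) 1. The function names, the 2-state matrix, the reward vector, and the discount factor are all hypothetical choices made for illustration.

```python
import numpy as np

def stationary_distribution(P, iters=1000, tol=1e-12):
    """Power method: repeat d <- P^T d (renormalized) until d stops changing.

    Assumes P is row-stochastic with a unique stationary distribution.
    """
    n = P.shape[0]
    d = np.full(n, 1.0 / n)          # start from the uniform distribution
    for _ in range(iters):
        d_next = P.T @ d
        d_next /= d_next.sum()       # guard against round-off drift
        if np.abs(d_next - d).sum() < tol:
            return d_next
        d = d_next
    return d

def rank_one_evaluation(r, d, gamma):
    """Closed-form solution of v = r + gamma * (1 d^T) v.

    This is policy evaluation with P replaced by the rank-one matrix 1 d^T
    (an illustrative simplification, not the paper's update rule).
    """
    return r + gamma / (1.0 - gamma) * (d @ r) * np.ones_like(r)

# Hypothetical 2-state example.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
r = np.array([1.0, 0.0])
gamma = 0.9

d = stationary_distribution(P)
v_rank_one = rank_one_evaluation(r, d, gamma)
v_exact = np.linalg.solve(np.eye(2) - gamma * P, r)  # exact evaluation, for comparison
```

The rank-one evaluation costs only a dot product, whereas the exact evaluation requires a linear solve; the two generally differ, which is why a rank-one step of this kind would sit inside an iterative scheme rather than replace it.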

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > Canada
  - Ontario > Toronto (0.14)
- Europe > Netherlands
  - South Holland > Delft (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.50)

Industry:
- Education > Focused Education > Special Education (0.65)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (0.57)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.66)
