AITopics | policy iteration algorithm

Value Improved Actor Critic Algorithms

Neural Information Processing Systems | Jun-17-2026, 01:37:50 GMT

To learn approximately optimal acting policies for decision problems, modern Actor Critic algorithms rely on deep Neural Networks (DNNs) to parameterize the acting policy and greedification operators to iteratively improve it. The reliance on DNNs suggests an improvement that is gradient based, which is per step much less greedy than the improvement possible by greedier operators such as the greedy update used by Q-learning algorithms. On the other hand, slow changes to the policy can also be beneficial for the stability of the learning process, resulting in a tradeoff between greedification and stability. To better address this tradeoff, we propose to decouple the acting policy from the policy evaluated by the critic. This allows the agent to separately improve the critic's policy (e.g.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country: Europe > Netherlands (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

2dace78f80bc92e6d7493423d729448e-Reviews.html

Neural Information Processing Systems | Oct-3-2025, 08:13:42 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper presents a slight modification of the NAC algorithm, of which the original algorithm is a special case, called forgetful NAC. The authors show that forgetful NAC and optimistic policy iteration are equivalent. The authors also present a non-optimality result for the soft-greedy Gibbs distribution, i.e., the optimal solution is not a fixed point of the policy iteration algorithm. I liked the unified view on both types of algorithms.
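For readers unfamiliar with the soft-greedy Gibbs distribution the review refers to, here is a minimal sketch. The function name `gibbs_policy`, the `temperature` parameter, and the action values are illustrative assumptions and are not taken from the reviewed paper.

```python
import numpy as np

def gibbs_policy(q_values, temperature=1.0):
    """Soft-greedy (Gibbs / softmax) action probabilities for a single state."""
    q = np.asarray(q_values, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability.
    logits = (q - q.max()) / temperature
    weights = np.exp(logits)
    return weights / weights.sum()

q = [1.0, 2.0, 0.5]  # hypothetical action values for one state
probs_soft = gibbs_policy(q, temperature=1.0)    # spreads probability over actions
probs_sharp = gibbs_policy(q, temperature=0.01)  # close to the greedy choice
```

At any positive temperature every action keeps nonzero probability, and the distribution approaches the greedy choice as the temperature goes to zero.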

algorithm, iteration, policy iteration, (12 more...)

Country: North America > United States > Nevada (0.05)

Genre:

Summary/Review (0.48)
Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.35)

Geometric Re-Analysis of Classical MDP Solving Algorithms

Mustafin, Arsenii, Pakharev, Aleksei, Olshevsky, Alex, Paschalidis, Ioannis Ch.

arXiv.org Artificial Intelligence | Mar-6-2025

We extend a recently introduced geometric interpretation of Markov Decision Processes (MDPs) that provides a new perspective on MDP algorithms and their dynamics. Based on this view, we develop a novel analytical framework that simplifies the proofs of existing results and enables us to derive new ones. Specifically, we analyze the behavior of two classical MDP-solving algorithms: Policy Iteration (PI) and Value Iteration (VI). For each algorithm, we first describe its dynamics in geometric terms and then present an analysis along with several convergence results. We begin by introducing an MDP transformation that modifies the discount factor γ and demonstrate how this transformation improves the convergence properties of both algorithms, provided that it can be applied such that the resulting system remains a regular MDP. Second, we present a new analysis of PI in a 2-state MDP case, showing that the number of iterations required for convergence is bounded by the number of state-action pairs. Finally, we reveal an additional convergence factor in the VI algorithm for cases with a connected optimal policy, which is attributed to an extra rotation component in the VI dynamics.
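As background for the two algorithms named in this abstract, the following is a minimal textbook sketch of Policy Iteration and Value Iteration on a small tabular MDP. It does not reproduce the paper's geometric analysis. The 2-state, 2-action transition tensor `P`, reward matrix `R`, and discount `gamma` are made-up values chosen only for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers are illustrative).
# P[a, s, t] is the probability of moving from state s to state t under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
])
R = np.array([[1.0, 0.0],      # R[s, a]: expected reward in state s for action a
              [0.0, 2.0]])
gamma = 0.9
n_states, n_actions = R.shape

def q_from_v(v):
    # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * v[t]
    return R + gamma * np.einsum("ast,t->sa", P, v)

def policy_iteration():
    policy = np.zeros(n_states, dtype=int)
    # A deterministic policy is never revisited, so this bound is sufficient.
    for iteration in range(1, n_actions ** n_states + 2):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(n_states)]
        r_pi = R[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        new_policy = q_from_v(v).argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v, iteration
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")

def value_iteration(tol=1e-10):
    v = np.zeros(n_states)
    sweeps = 0
    while True:
        new_v = q_from_v(v).max(axis=1)  # Bellman optimality backup
        sweeps += 1
        if np.max(np.abs(new_v - v)) < tol:
            return new_v, sweeps
        v = new_v

pi_policy, pi_v, pi_iters = policy_iteration()
vi_v, vi_sweeps = value_iteration()
print("PI:", pi_policy, pi_v, "iterations:", pi_iters)
print("VI:", vi_v, "sweeps:", vi_sweeps)
```

On this toy instance both methods should agree on the optimal values up to the tolerance, with policy iteration stopping after a handful of iterations and value iteration needing many more sweeps because each sweep contracts the error by at most a factor of `gamma`.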

algorithm, iteration, optimal policy, (16 more...)

2503.04203

Country: Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

fd5c905bcd8c3348ad1b35d7231ee2b1-Reviews.html

Neural Information Processing Systems | Mar-14-2024, 01:07:54 GMT

This paper aims to make Learning from Demonstration (LfD) more robust when only a limited number of demonstrations are available. Many of the low-level trajectory-learning LfD approaches suffer from fragile policies. This paper proposes to use reinforcement learning to overcome this limitation. This paper falls squarely in the LfD field and does not tackle inverse reinforcement learning, i.e., the reward function is assumed to be known to the agent rather than inferred from demonstrations. One work with a very similar flavor is that of Smart, W. and Kaelbling, L.P. "Effective Reinforcement Learning for Mobile Robots" ICRA 2002.

algorithm, demonstration, policy iteration algorithm, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.79)

Quantum Computing Methods for Supply Chain Management

Jiang, Hansheng, Shen, Zuo-Jun Max, Liu, Junyu

arXiv.org Artificial Intelligence | Dec-1-2022

Quantum computing is expected to have transformative influences on many domains, but its practical deployment on industry problems is underexplored. We focus on applying quantum computing to operations management problems in industry, and in particular, supply chain management. Many problems in supply chain management involve large state and action spaces and pose computational challenges on classic computers. We develop a quantized policy iteration algorithm to solve an inventory control problem and demonstrate its effectiveness. We also discuss in depth the hardware requirements and potential challenges of implementing this quantum algorithm in the near term. Our simulations and experiments are powered by IBM Qiskit and the qBraid system.

artificial intelligence, machine learning, supply chain management, (13 more...)

doi: 10.1109/SEC54971.2022.00059

2209.08246

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Illinois > Cook County > Chicago (0.05)
Asia > China > Hong Kong (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Supply Chain Management (0.83)

Model-Free Robust Reinforcement Learning with Linear Function Approximation

Panaganti, Kishan, Kalathil, Dileep

arXiv.org Machine Learning | Oct-7-2020

This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Processes (RMDPs) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against the parameter uncertainties due to the mismatch between the simulator model and real-world settings. We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation. We prove the convergence of this algorithm using stochastic approximation techniques. We then propose the Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy. We also give a general weighted Euclidean norm bound on the error (closeness to optimality) of the resulting policy. Finally, we demonstrate the performance of our RLSPI algorithm on some benchmark problems from OpenAI Gym.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2006.11608

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)

On the convergence of optimistic policy iteration for stochastic shortest path problem

Chen, Yuanlong

arXiv.org Machine Learning | Aug-29-2018

In this paper, we prove some convergence results for a special case of the optimistic policy iteration algorithm for the stochastic shortest path problem mentioned in [5]. We consider both Monte Carlo and TD(λ) methods for the policy evaluation step, under the condition that the termination state will eventually be reached almost surely.

artificial intelligence, machine learning, stochastic shortest path problem, (12 more...)

1808.08763

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Dissecting Reinforcement Learning-Part.1

#artificialintelligence | Apr-8-2017, 20:09:06 GMT

Premise: This post is an introduction to reinforcement learning, meant as a starting point for readers who already have some machine learning background and are comfortable with a little math and Python. When I study a new algorithm I always want to understand the underlying mechanisms, and for that it is useful to implement the algorithm from scratch in a programming language. I followed this approach in this post, which is long to read but worthwhile. When I started to study reinforcement learning I did not find any good online resource that explained from the basics what reinforcement learning really is. Most of the (very good) blogs out there focus on the modern approaches (Deep Reinforcement Learning) and introduce the Bellman equation without a satisfying explanation. I turned my attention to books and found the one by Russell and Norvig, Artificial Intelligence: A Modern Approach. This post is based on chapter 17 of the second edition, and it can be considered an extended review of that chapter. I will use the same mathematical notation as the authors, so you can use the book to cover some missing parts or vice versa.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country: North America > United States > California > Los Angeles County > Santa Monica (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Solving POMDPs by Searching in Policy Space

Hansen, Eric A.

arXiv.org Artificial Intelligence | Jan-30-2013

Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinite-horizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state.

artificial intelligence, finite-state controller, machine learning, (16 more...)

1301.738

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.82)

Industry: Government > Regional Government > North America Government > United States Government (0.44)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

On the Complexity of Policy Iteration

Mansour, Yishay, Singh, Satinder

arXiv.org Artificial Intelligence | Jan-23-2013

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such nontrivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.
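To make the "exponential in the number of states" remark concrete, the trivial baseline that such bounds improve upon can be written out as follows. This is a sketch and not the paper's result. It assumes every state offers the same finite action set, and the symbols S, A, Π and π_k are notation introduced here.

```latex
\[
  |\Pi| \;=\; |A|^{|S|},
  \qquad
  V^{\pi_{k+1}} \ge V^{\pi_k}
  \ \text{(strictly in at least one state until convergence)}
  \;\Longrightarrow\;
  \#\text{iterations} \;\le\; |A|^{|S|}.
\]
```

Because each improvement step yields a strictly better deterministic policy until the optimum is reached, no policy is visited twice, so the number of policies is an immediate discount-independent upper bound on the number of iterations.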

artificial intelligence, iteration, machine learning, (19 more...)

1301.6718

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
