AITopics | optimal state-value function

Collaborating Authors

optimal state-value function

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On Convex Optimal Value Functions For POSGs

Cunha, Rafael F., Castellini, Jacopo, Peralez, Johan, Dibangoye, Jilles S.

arXiv.org Artificial IntelligenceDec-6-2023

Multi-agent planning and reinforcement learning can be challenging when agents cannot see the state of the world or communicate with each other due to communication costs, latency, or noise. Partially Observable Stochastic Games (POSGs) provide a mathematical framework for modelling such scenarios. This paper aims to improve the efficiency of planning and reinforcement learning algorithms for POSGs by identifying the underlying structure of optimal state-value functions. The approach involves reformulating the original game from the perspective of a trusted third party who plans on behalf of the agents simultaneously. From this viewpoint, the original POSGs can be viewed as Markov games where states are occupancy states, i.e., posterior probability distributions over the hidden states of the world and the stream of actions and observations that agents have experienced so far. This study mainly proves that the optimal state-value function is a convex function of occupancy states expressed on an appropriate basis in all zero-sum, common-payoff, and Stackelberg POSGs.

occupancy state, optimal state-value function, private occupancy state, (14 more...)

arXiv.org Artificial Intelligence

2311.09459

Country:

Europe > France (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.45)

Add feedback

Feasible Policy Iteration

Yang, Yujie, Zheng, Zhilong, Li, Shengbo Eben

arXiv.org Artificial IntelligenceApr-18-2023

Safe reinforcement learning (RL) aims to solve an optimal control problem under safety constraints. Existing $\textit{direct}$ safe RL methods use the original constraint throughout the learning process. They either lack theoretical guarantees of the policy during iteration or suffer from infeasibility problems. To address this issue, we propose an $\textit{indirect}$ safe RL method called feasible policy iteration (FPI) that iteratively uses the feasible region of the last policy to constrain the current policy. The feasible region is represented by a feasibility function called constraint decay function (CDF). The core of FPI is a region-wise policy update rule called feasible policy improvement, which maximizes the return under the constraint of the CDF inside the feasible region and minimizes the CDF outside the feasible region. This update rule is always feasible and ensures that the feasible region monotonically expands and the state-value function monotonically increases inside the feasible region. Using the feasible Bellman equation, we prove that FPI converges to the maximum feasible region and the optimal state-value function. Experiments on classic control tasks and Safety Gym show that our algorithms achieve lower constraint violations and comparable or higher performance than the baselines.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2304.08845

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

The Gambler's Problem and Beyond

Wang, Baoxiang, Li, Shuai, Li, Jiajin, Chan, Siu On

arXiv.org Machine LearningDec-31-2019

We analyze the Gambler's problem, a simple reinforcement learning problem where the gambler has the chance to double or lose their bets until the target is reached. This is an early example introduced in the reinforcement learning textbook by Sutton and Barto (2018), where they mention an interesting pattern of the optimal value function with high-frequency components and repeating non-smooth points. It is however without further investigation. We provide the exact formula for the optimal value function for both the discrete and the continuous cases. Though simple as it might seem, the value function is pathological: fractal, self-similar, derivative taking either zero or infinity, not smooth on any interval, and not written as elementary functions. It is in fact one of the generalized Cantor functions, where it holds a complexity that has been uncharted thus far. Our analyses could lead insights into improving value function approximation, gradient-based algorithms, and Q-learning, in real applications and implementations.

bellman equation, optimal value function, value function, (15 more...)

arXiv.org Machine Learning

2001.00102

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Ohio (0.04)
Europe > Hungary (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre:

Research Report (0.50)
Summary/Review (0.34)
Instructional Material (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback