policy invariance





2bba9f4124283edd644799e0cecd45ca-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for their constructive feedback. We address the key questions and concerns below. This is shown in Eq. 1 below. Therefore, this is not a valid counterexample to ρ-projection's handling of other forms of policy invariance. The ESOR values in Table 1 show the number of iterations taken to reach the expert's ESOR. However, they differ in the type of query used.


Useful Policy Invariant Shaping from Arbitrary Advice

Behboudian, Paniz, Satsangi, Yash, Taylor, Matthew E., Harutyunyan, Anna, Bowling, Michael

arXiv.org Artificial Intelligence

Reinforcement learning is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can require hours, days, or even years' worth of training data. A major challenge of contemporary RL research is to discover how to learn with less data. Previous work has shown that domain information can be successfully used to shape the reward; by adding additional reward information, the agent can learn with much less data. Furthermore, if the reward is constructed from a potential function, the optimal policy is guaranteed to be unaltered. While such potential-based reward shaping (PBRS) holds promise, it is limited by the need for a well-defined potential function. Ideally, we would like to be able to take arbitrary advice from a human or other agent and improve performance without affecting the optimal policy. The recently introduced dynamic potential-based advice (DPBA) method tackles this challenge by admitting arbitrary advice from a human or other agent and improving performance without affecting the optimal policy. The main contribution of this paper is to expose, theoretically and empirically, a flaw in DPBA. To achieve the ideal goals instead, we present a simple method called policy invariant explicit shaping (PIES) and show theoretically and empirically that PIES succeeds where DPBA fails.
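As a concrete illustration of the potential-based shaping the abstract refers to, here is a minimal sketch of PBRS in tabular Q-learning. The grid world, the potential Phi (negative Manhattan distance to the goal), and all hyperparameters are illustrative assumptions, not details from the paper; the key line is the shaping term F(s, s') = γΦ(s') − Φ(s) added to the environment reward.

```python
import numpy as np

# Minimal sketch of potential-based reward shaping (PBRS) in tabular
# Q-learning on a small grid world. The environment and the potential
# below are illustrative assumptions, not taken from the paper.

GRID = 5                                       # 5x5 grid
GOAL = (GRID - 1, GRID - 1)                    # goal in the corner
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA, ALPHA, EPS = 0.99, 0.1, 0.1

def phi(state):
    # Potential function: negative Manhattan distance to the goal.
    # Any state potential preserves the optimal policy (Ng et al., 1999).
    return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))

def step(state, a):
    dr, dc = ACTIONS[a]
    nxt = (min(max(state[0] + dr, 0), GRID - 1),
           min(max(state[1] + dc, 0), GRID - 1))
    reward = 1.0 if nxt == GOAL else 0.0       # sparse reward
    return nxt, reward, nxt == GOAL

Q = np.zeros((GRID, GRID, len(ACTIONS)))
rng = np.random.default_rng(0)

for episode in range(500):
    s, done = (0, 0), False
    while not done:
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # PBRS: add F(s, s') = gamma * Phi(s') - Phi(s) to the reward.
        f = GAMMA * phi(s2) - phi(s)
        target = r + f + (0.0 if done else GAMMA * np.max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
```

Because the shaping term telescopes along any trajectory, it changes the returns of all policies by the same state-dependent offset, which is why the greedy policy learned with it is unchanged.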


Expressing Arbitrary Reward Functions as Potential-Based Advice

Harutyunyan, Anna (Vrije Universiteit Brussel) | Devlin, Sam (University of York) | Vrancx, Peter (Vrije Universiteit Brussel) | Nowé, Ann (Vrije Universiteit Brussel)

AAAI Conferences

Effectively incorporating external advice is an important problem in reinforcement learning, especially as it moves into the real world. Potential-based reward shaping is a way to provide the agent with a specific form of additional reward, with the guarantee of policy invariance. In this work we give a novel way to incorporate an arbitrary reward function with the same guarantee, by implicitly translating it into the specific form of dynamic advice potentials, which are maintained as an auxiliary value function learnt at the same time. We show that advice provided in this way captures the input reward function in expectation, and demonstrate its efficacy empirically.
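A minimal sketch of the dynamic advice potentials the abstract describes, in a tabular setting: an auxiliary potential Phi(s, a) is learnt on the negated advice reward with a SARSA-style update, and the shaping term uses the potential after the update at (s', a') and before it at (s, a). The table sizes, hyperparameters, and function names here are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Sketch of dynamic potential-based advice in a tabular setting.
# An auxiliary potential Phi(s, a) is learnt on the *negated* advice
# reward; the resulting dynamic shaping captures the advice reward
# in expectation. All sizes and names are illustrative assumptions.

n_states, n_actions = 25, 4
GAMMA, ALPHA = 0.99, 0.1

Q = np.zeros((n_states, n_actions))     # main value function
Phi = np.zeros((n_states, n_actions))   # auxiliary advice potential

def dpba_update(s, a, r, r_advice, s2, a2, done):
    """One shaped Q-learning step using dynamic advice potentials."""
    phi_old = Phi[s, a]
    # Learn Phi on the negated advice reward (SARSA-style update).
    phi_target = -r_advice + (0.0 if done else GAMMA * Phi[s2, a2])
    Phi[s, a] += ALPHA * (phi_target - phi_old)
    # Dynamic shaping: post-update potential at (s', a') minus the
    # pre-update potential at (s, a).
    f = (0.0 if done else GAMMA * Phi[s2, a2]) - phi_old
    target = r + f + (0.0 if done else GAMMA * np.max(Q[s2]))
    Q[s, a] += ALPHA * (target - Q[s, a])

# Example call with hypothetical transition data:
dpba_update(s=0, a=1, r=0.0, r_advice=0.5, s2=1, a2=2, done=False)
```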


Policy Invariance under Reward Transformations for General-Sum Stochastic Games

Lu, X., Schwartz, H. M., Givigi, S. N.

Journal of Artificial Intelligence Research

We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria of a stochastic game remain unchanged after potential-based shaping is applied to the environment. The property of policy invariance provides a possible way of speeding up convergence when learning to play a stochastic game.
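To make the multi-player extension concrete, here is a minimal sketch in which each player i carries its own state potential Phi_i and receives the shaped reward r_i + γΦ_i(s') − Φ_i(s). The player indexing, potentials, and function names are assumptions for illustration, not the paper's formulation.

```python
# Sketch: per-player potential-based shaping in a general-sum
# stochastic game. Each player i has its own potential phi_i over
# states; shaping every player's reward this way leaves the game's
# Nash equilibria unchanged. Names below are illustrative assumptions.

GAMMA = 0.99

def shaped_rewards(rewards, potentials, s, s2):
    """rewards: dict player -> r_i(s, joint_action, s').
    potentials: dict player -> callable Phi_i(state).
    Returns each player's reward plus gamma * Phi_i(s') - Phi_i(s)."""
    return {
        i: r + GAMMA * potentials[i](s2) - potentials[i](s)
        for i, r in rewards.items()
    }

# Example with two players and simple hypothetical potentials:
phis = {0: lambda s: -abs(s - 10), 1: lambda s: float(s)}
print(shaped_rewards({0: 1.0, 1: -1.0}, phis, s=3, s2=4))
```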