AITopics | learning intrinsic reward

learning intrinsic reward

Exploration-Guided Reward Shaping for Reinforcement Learning under Sparse Rewards

Neural Information Processing Systems, Feb-7-2026, 23:33:56 GMT

We study the problem of reward shaping to accelerate the training process of a reinforcement learning agent.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On Learning Intrinsic Rewards for Policy Gradient Methods

Neural Information Processing Systems, Nov-20-2025, 22:12:39 GMT

In many sequential decision-making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al., which defines the optimal intrinsic reward function as one that, when used by an RL agent, achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead-search-based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient-based learning agents. We compare the performance of an augmented agent that uses our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for MuJoCo domains) with a baseline agent that uses the same policy learners but with only extrinsic rewards. Our results show improved performance on most but not all of the domains.
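The "additive intrinsic rewards" setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the mixing weight `lam`, and the toy reward values are all assumptions chosen for illustration. The sketch only shows the split between the signal the policy learner would train on (extrinsic plus intrinsic reward) and the signal used to judge the agent (extrinsic reward alone).

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return the discounted return G_t for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mixed_rewards(
    extrinsic: Sequence[float], intrinsic: Sequence[float], lam: float
) -> List[float]:
    """Add a weighted intrinsic reward to the extrinsic reward, step by step.

    `lam` is a hypothetical mixing weight, not a value taken from the paper.
    """
    if len(extrinsic) != len(intrinsic):
        raise ValueError("reward sequences must have the same length")
    return [r_ex + lam * r_in for r_ex, r_in in zip(extrinsic, intrinsic)]


# Toy episode: sparse extrinsic reward, dense (made-up) intrinsic reward.
r_ex = [0.0, 0.0, 1.0]
r_in = [0.5, 0.5, 0.0]

# Returns the policy learner would be trained on (extrinsic + intrinsic).
policy_returns = discounted_returns(mixed_rewards(r_ex, r_in, lam=1.0), gamma=0.5)

# Returns used to evaluate the agent (extrinsic only).
evaluation_returns = discounted_returns(r_ex, gamma=0.5)
```

In this toy episode the intrinsic reward gives the learner a nonzero signal at every step, while the evaluation returns still reflect only the sparse task reward.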

intrinsic reward function, learning intrinsic reward, reward function, (7 more...)

Genre: Research Report > New Finding (0.60)

Industry: Leisure & Entertainment > Games > Computer Games (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)

Reviews: On Learning Intrinsic Rewards for Policy Gradient Methods

Neural Information Processing Systems, Oct-7-2024, 10:26:11 GMT

This work uses learnable intrinsic rewards, in addition to the conventional extrinsic reward from the environment, to boost agent performance, where performance is measured as the conventional return, i.e., the cumulative extrinsic reward. It can be seen as a variant of previous work on reward shaping and auxiliary rewards, but not as a more general version of it: although its mathematical form is more general, it drops the consideration of domain knowledge, which is the key insight of that earlier work. Compared with its closest related works [Sorg et al. 2010, Guo et al. 2016], the proposed method can be used with bootstrapping learning agents rather than only with planning agents (i.e., Monte Carlo sampling of returns). The work implements an intuitively interesting idea: automatically shaping the rewards to boost learning performance. Compared with closely related previous work, it is also more general in that it can be used with modern, well-performing learning methods (e.g., A2C and PPO). The presentation is clear and easy to follow.
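The two-level structure the review describes (a policy trained on extrinsic plus intrinsic reward, with the intrinsic reward itself tuned for extrinsic return) can be written out as a rough sketch. The symbols below are assumptions introduced here for illustration: $\theta$ for policy parameters, $\eta$ for intrinsic reward parameters, $\alpha$ and $\beta$ for step sizes, and $J^{ex}$, $J^{ex+in}$ for the expected return under extrinsic-only and combined rewards. The paper's exact objective and gradient estimator may differ.

```latex
\begin{align}
  \theta' &= \theta + \alpha \, \nabla_{\theta} J^{ex+in}(\theta; \eta)
    && \text{(policy step on combined reward)} \\
  \eta'   &= \eta + \beta \, \nabla_{\eta} J^{ex}(\theta')
    && \text{(intrinsic reward step on extrinsic return)} \\
  \nabla_{\eta} J^{ex}(\theta') &= \nabla_{\theta'} J^{ex}(\theta') \, \nabla_{\eta} \theta'
    && \text{(chain rule through the policy step)}
\end{align}
```

Under this reading, $\eta$ influences the extrinsic return only through its effect on the updated policy parameters $\theta'$, which is why the last line differentiates through the policy update.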

intrinsic reward parameter, learning intrinsic reward, policy gradient method, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

On Learning Intrinsic Rewards for Policy Gradient Methods

Zheng, Zeyu, Oh, Junhyuk, Singh, Satinder

Neural Information Processing Systems, Feb-14-2020, 15:25:57 GMT

intrinsic reward function, learning intrinsic reward, reward function, (4 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

On Learning Intrinsic Rewards for Policy Gradient Methods

Zheng, Zeyu, Oh, Junhyuk, Singh, Satinder

arXiv.org Artificial Intelligence, Apr-17-2018

In many sequential decision-making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al., which defines the optimal intrinsic reward function as one that, when used by an RL agent, achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead-search-based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient-based learning agents. We compare the performance of an augmented agent that uses our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for MuJoCo domains) with a baseline agent that uses the same policy learners but with only extrinsic rewards. Our results show improved performance on most but not all of the domains.

intrinsic reward, machine learning, reinforcement learning, (18 more...)

1804.06459

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)