Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis

Neural Information Processing Systems

Each year, expert-level performance is attained in increasingly-complex multiagent domains, where notable examples include Go, Poker, and StarCraft II. This rapid progression is accompanied by a commensurate need to better understand how such agents attain this performance, to enable their safe deployment, identify limitations, and reveal potential means of improving them. In this paper we take a step back from performance-focused multiagent learning, and instead turn our attention towards agent behavior analysis. We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels. Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or policies, and is trained using only offline observational data. We illustrate the effectiveness of our method for enabling coupled understanding of behaviors at the joint and local agent levels, detecting behavior changepoints throughout training, and discovering core behavioral concepts; we also demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo control domain and illustrate that it can disentangle previously-trained policies in OpenAI's hide-and-seek domain.


Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals

Neural Information Processing Systems

High sample complexity has long been a challenge for RL. On the other hand, humans learn to perform tasks not only from interaction or demonstrations, but also by reading unstructured text documents, e.g., instruction manuals. Instruction manuals and wiki pages are among the most abundant data that could inform agents of valuable features and policies or task-specific environmental dynamics and reward structures. Therefore, we hypothesize that the ability to utilize human-written instruction manuals to assist learning policies for specific tasks should lead to a more efficient and better-performing agent. We propose the Read and Reward framework. Read and Reward speeds up RL algorithms on Atari games by reading manuals released by the Atari game developers. Our framework consists of a QA Extraction module that extracts and summarizes relevant information from the manual and a Reasoning module that evaluates object-agent interactions based on information from the manual. An auxiliary reward is then provided to a standard A2C RL agent, when interaction is detected. Experimentally, various RL algorithms obtain significant improvement in performance and training speed when assisted by our design.
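The core mechanism described above — adding an auxiliary reward when a manual-recommended interaction is detected — can be sketched minimally as follows. All names here (`shaped_reward`, the advice dictionary, the interaction labels) are illustrative assumptions, not the paper's actual API:

```python
# Hedged sketch of the Read-and-Reward idea: an auxiliary bonus is added to
# the environment reward whenever a detected object-agent interaction matches
# advice extracted from the instruction manual. Interaction names and advice
# labels are invented for illustration.

def shaped_reward(env_reward, interaction, manual_advice, bonus=1.0):
    """Add a bonus for manual-endorsed interactions, a penalty for
    manual-discouraged ones, and pass the reward through otherwise."""
    if interaction is None:
        return env_reward
    advice = manual_advice.get(interaction)
    if advice == "good":
        return env_reward + bonus
    if advice == "bad":
        return env_reward - bonus
    return env_reward

# Hypothetical advice distilled by a QA-extraction step from a manual:
advice = {"ship_hits_enemy": "good", "ship_hits_asteroid": "bad"}
print(shaped_reward(0.0, "ship_hits_enemy", advice))    # 1.0
print(shaped_reward(0.5, "ship_hits_asteroid", advice)) # -0.5
```

In the paper this shaped signal is fed to a standard A2C agent; here the shaping function alone is shown, since it is the piece the abstract describes.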


Learning One Representation to Optimize All Rewards

Neural Information Processing Systems

We introduce the forward-backward (FB) representation of the dynamics of a reward-free Markov decision process. It provides explicit near-optimal policies for any reward specified a posteriori. During an unsupervised phase, we use reward-free interactions with the environment to learn two representations via off-the-shelf deep learning methods and temporal difference (TD) learning. In the test phase, a reward representation is estimated either from reward observations or an explicit reward description (e.g., a target state). The optimal policy for that reward is directly obtained from these representations, with no planning.
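The test-phase step — reducing an a-posteriori reward to a vector and reading off a greedy policy — can be sketched in a tabular toy setting. This is a simplified illustration under stated assumptions (random pre-learned representations, no z-conditioning of F as in the full method), not the paper's implementation:

```python
import numpy as np

# Minimal tabular sketch of the forward-backward (FB) idea: given
# representations F(s, a) and B(s) learned reward-free, a reward specified
# a posteriori is compressed into a vector z_r, and a greedy policy is
# obtained from Q = F . z_r with no planning. F and B are random stand-ins
# for learned representations; shapes are simplified relative to the paper.

rng = np.random.default_rng(0)
n_states, n_actions, d = 5, 3, 4
F = rng.normal(size=(n_states, n_actions, d))  # forward representation
B = rng.normal(size=(n_states, d))             # backward representation

def policy_for_reward(reward):
    """Estimate z_r = E[r(s) B(s)] over states, then act greedily on F . z_r."""
    z_r = (reward[:, None] * B).mean(axis=0)   # reward representation, shape (d,)
    Q = F @ z_r                                # shape (n_states, n_actions)
    return Q.argmax(axis=1)                    # greedy action per state

# A reward described only at test time, e.g. "reach state 2":
r = np.zeros(n_states)
r[2] = 1.0
print(policy_for_reward(r))  # one greedy action index per state
```

The point the sketch makes is structural: once F and B exist, swapping in a new reward only changes the cheap vector `z_r`, not the learned representations.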


Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Neural Information Processing Systems

This paper introduces a new complexity measure for MDPs called maximum expected hitting cost. Unlike the diameter measure, which is only a function of the transition dynamics, this new measure takes the reward dynamics into account as well. The authors show theoretically that, under the same assumptions as the previous authors who introduced the diameter, this new measure yields a tighter upper bound. Furthermore, they show the usefulness of this measure by using it to better understand the informativeness of rewards under potential-based reward shaping, and they prove theoretically that in a large class of MDPs potential-based reward shaping changes the maximum expected hitting cost by at most a multiplicative factor of 2. I enjoyed reading this paper. I appreciated the structure the authors used, which first introduced all the necessary prior work (related to the diameter) concisely but thoroughly before presenting their contributions.
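For readers unfamiliar with the shaping scheme the review refers to: potential-based reward shaping replaces r(s, a, s') with r(s, a, s') + γΦ(s') − Φ(s), which is known to preserve optimal policies. A minimal sketch (the potential Φ and state names are illustrative, not from the paper):

```python
# Potential-based reward shaping: the shaped reward is
#   r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s),
# which leaves the optimal policy unchanged while redistributing reward.
# The potential function below is an invented example.

def shape(r, s, s_next, phi, gamma=0.99):
    return r + gamma * phi[s_next] - phi[s]

phi = {"start": 0.0, "mid": 1.0, "goal": 2.0}
print(shape(0.0, "start", "mid", phi))  # 0.99
```

The paper's factor-of-2 result concerns how much such a transformation can move the maximum expected hitting cost; the sketch only shows the transformation itself.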


Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Neural Information Processing Systems

The paper introduces a new complexity measure for MDPs, the maximum expected hitting cost. In contrast to former complexity measures, the hitting cost also depends on the reward of the MDP and can provide a tighter bound for UCRL2. The theory also provides an interesting connection between reward shaping and the complexity of an MDP. All reviewers appreciated the strong theoretical contribution of the paper, which improves our theoretical understanding of the complexity of MDPs. The reviewers also liked that the paper is well written and establishes connections to reward shaping, a method that also has high practical value. All reviewers recommend acceptance and I agree with their assessment.


Rule Based Rewards for Language Model Safety

Neural Information Processing Systems

Reinforcement learning-based fine-tuning of large language models (LLMs) on human preferences has been shown to enhance both their capabilities and safety behavior. However, in cases related to safety, without precise instructions to human annotators, the data collected may cause the model to become overly cautious, or to respond in an undesirable style, such as being judgmental. Additionally, as model capabilities and usage patterns evolve, there may be a costly need to add or relabel data to modify safety behavior. We propose a novel preference modeling approach that utilizes AI feedback and only requires a small amount of human data. Our method, Rule Based Rewards (RBR), uses a collection of rules for desired or undesired behaviors (e.g.
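The truncated abstract describes a reward built from a collection of behavioral rules. A minimal sketch of that idea, where simple keyword checks stand in for an AI grader and the rules and weights are invented here (they are not OpenAI's actual rules):

```python
# Hedged sketch of a rule-based reward: each rule scores a response for a
# desired or undesired behavior, and the reward is a weighted sum of rule
# scores. Rules, weights, and the keyword checks are all illustrative.

RULES = {
    # rule name: (weight, checker standing in for an AI grader)
    "refuses_clearly": (1.0, lambda text: "can't help" in text.lower()),
    "judgmental_tone": (-0.5, lambda text: "ashamed" in text.lower()),
}

def rule_based_reward(response):
    """Weighted combination of triggered rules."""
    return sum(w for w, check in RULES.values() if check(response))

print(rule_based_reward("Sorry, I can't help with that."))  # 1.0
```

In the paper the rule scores come from AI feedback and the combination is fit on a small amount of human data; the sketch fixes the weights by hand to keep the structure visible.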


AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Jiang, Bo, Chen, Shaoyu, Zhang, Qian, Liu, Wenyu, Wang, Xinggang

arXiv.org Artificial Intelligence

OpenAI o1 and DeepSeek R1 achieve or even surpass human expert-level performance in complex domains like mathematics and science, with reinforcement learning (RL) and reasoning playing a crucial role. In autonomous driving, recent end-to-end models have greatly improved planning performance but still struggle with long-tailed problems due to limited common sense and reasoning abilities. Some studies integrate vision-language models (VLMs) into autonomous driving, but they typically rely on pre-trained models with simple supervised fine-tuning (SFT) on driving data, without further exploration of training strategies or optimizations specifically tailored for planning. In this paper, we propose AlphaDrive, an RL and reasoning framework for VLMs in autonomous driving. AlphaDrive introduces four GRPO-based RL rewards tailored for planning and employs a two-stage planning reasoning training strategy that combines SFT with RL. As a result, AlphaDrive significantly improves both planning performance and training efficiency compared to using SFT alone or training without reasoning. Moreover, we are also excited to discover that, following RL training, AlphaDrive exhibits some emergent multimodal planning capabilities, which is critical for improving driving safety and efficiency. To the best of our knowledge, AlphaDrive is the first to integrate GRPO-based RL with planning reasoning into autonomous driving. Code will be released to facilitate future research.


Reward Is Not Enough for Risk-Averse Reinforcement Learning

#artificialintelligence

TL;DR: Risk-aversion is essential in many RL applications (e.g., driving, robotic surgery and finance). Some modified RL frameworks consider risk (e.g., by optimizing a risk-measure of the return instead of its expectation), but pose new algorithmic challenges. Instead, it is often suggested to stick with the good old RL framework and simply set the rewards such that negative outcomes are amplified. Unfortunately, as discussed below, modeling risk via expectation over redefined rewards is often unnatural, impractical or even mathematically impossible, and hence cannot replace explicit optimization of risk-measures. This is consistent with similar results from decision theory, where risk optimization is not equivalent to expected utility maximization.
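The gap the post points at — a risk-measure of the return versus its expectation — is easy to see numerically. Below, CVaR (conditional value-at-risk, the mean of the worst α-fraction of returns) is used as the risk-measure; the return samples are invented for illustration:

```python
import numpy as np

# Expectation vs. a risk-measure (CVaR) of the return: a policy can look
# fine in expectation while its worst-case tail is catastrophic, which is
# why redefining rewards inside an expectation cannot in general substitute
# for explicit risk optimization. The return samples are illustrative.

def cvar(returns, alpha=0.1):
    """Conditional value-at-risk: mean of the worst alpha-fraction of returns."""
    returns = np.sort(np.asarray(returns))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# Nine good episodes and one disaster:
returns = [10.0] * 9 + [-50.0]
print(np.mean(returns))          # 4.0  -- expectation looks acceptable
print(cvar(returns, alpha=0.1))  # -50.0 -- the tail tells another story
```

A risk-averse agent optimizing CVaR would reject this policy outright, while an expectation-maximizing agent, whatever reward redefinition it uses, must trade the tail off against the average.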


[2211.10851] Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning

#artificialintelligence

We introduce a physiological model-based agent as proof-of-principle that it is possible to define a flexible self-preserving system that does not use a reward signal or reward-maximization as an objective. We achieve this by introducing the Self-Preserving Agent (SPA) with a physiological structure where the system can get trapped in an absorbing state if the agent does not solve and execute goal-directed policies. Our agent is defined using a new class of Bellman equations called Operator Bellman Equations (OBEs), for encoding jointly non-stationary non-Markovian tasks formalized as a Temporal Goal Markov Decision Process (TGMDP). OBEs produce optimal goal-conditioned spatiotemporal transition operators that map an initial state-time to the final state-times of a policy used to complete a goal, and can also be used to forecast future states in multiple dynamic physiological state-spaces. SPA is equipped with an intrinsic motivation function called the valence function, which quantifies the changes in empowerment (the channel capacity of a transition operator) after following a policy. Because empowerment is a function of a transition operator, there is a natural synergism between empowerment and OBEs: the OBEs create hierarchical transition operators, and the valence function can evaluate hierarchical empowerment change defined on these operators. The valence function can then be used for goal selection, wherein the agent chooses a policy sequence that realizes goal states which produce maximum empowerment gain. In doing so, the agent will seek freedom and avoid internal death-states that undermine its ability to control both external and internal states in the future, thereby exhibiting the capacity for predictive and anticipatory self-preservation. We also compare SPA to multi-objective RL, and discuss its capacity for symbolic reasoning and life-long learning.


Learning Relational Rules from Rewards

#artificialintelligence

Humans perceive the world in terms of objects and relations between them. In fact, for any given pair of objects, there is a myriad of relations that apply to them. How does the cognitive system learn which relations are useful to characterize the task at hand? And how can it use these representations to build a relational policy to interact effectively with the environment? In this paper we propose that this problem can be understood through the lens of a sub-field of symbolic machine learning called relational reinforcement learning (RRL). To demonstrate the potential of our approach, we build a simple model of relational policy learning based on a function approximator developed in RRL. We trained and tested our model in three Atari games that require considering an increasing number of potential relations: Breakout, Pong and Demon Attack. In each game, our model was able to select adequate relational representations and build a relational policy incrementally. We discuss the relationship between our model and models of relational and analogical reasoning, as well as its limitations and future directions of research.