AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction, Section 1.1. MIT Press, Cambridge, MA, 1998.
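The trial-and-error loop the quotation describes can be sketched with an ε-greedy multi-armed bandit, one of the simplest settings Sutton and Barto use. This is a minimal illustration, not code from the book. The arm means, the Gaussian reward noise, and the function name `epsilon_greedy_bandit` are assumptions chosen for the example.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn which arm pays most purely from observed rewards (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how often each arm has been tried
    estimates = [0.0] * n   # running sample-mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                           # explore: try a random arm
        else:
            a = max(range(n), key=lambda i: estimates[i])  # exploit: best estimate so far
        # Assumed environment: reward is Gaussian around the arm's (hidden) true mean.
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental mean update
        total += reward
    return estimates, counts, total


estimates, counts, total = epsilon_greedy_bandit([0.0, 1.0, 3.0])
print(counts, [round(e, 2) for e in estimates])
```

The agent is never told which arm is best; it must estimate each arm's value from the rewards it receives. The ε fraction of random pulls keeps it from locking onto an early, wrong estimate.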


Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Neural Information Processing Systems, Aug-16-2025, 00:45:56 GMT

Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL). Existing approaches such as potential-based reward shaping normally make full use of a given shaping reward function. However, because the transformation of human knowledge into numeric reward values is often imperfect (for example, due to human cognitive bias), fully utilizing the shaping reward function may fail to improve the performance of RL algorithms. In this paper, we consider the problem of adaptively utilizing a given shaping reward function. We formulate the utilization of shaping rewards as a bi-level optimization problem, in which the lower level optimizes the policy using the shaping rewards and the upper level optimizes a parameterized shaping weight function to maximize the true reward. We formally derive the gradient of the expected true reward with respect to the shaping weight function parameters and accordingly propose three learning algorithms based on different assumptions. Experiments in sparse-reward CartPole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards while ignoring unbeneficial shaping rewards or even transforming them into beneficial ones.
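The bi-level structure in this abstract can be illustrated on a toy problem. The following sketch is not one of the paper's three algorithms. A two-armed bandit has a true reward that favors arm 0 and a hypothetical shaping reward that misleadingly favors arm 1. The lower level trains a softmax policy by exact policy-gradient ascent on the shaped reward r + w·f. The upper level keeps the scalar weight w whose trained policy earns the most true reward. A single scalar w stands in for the paper's parameterized shaping weight function, and a grid search stands in for its derived gradient. All names and numbers are assumptions.

```python
import numpy as np


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = np.exp(logits - logits.max())
    return z / z.sum()


def train_policy(shaped_reward, steps=200, lr=0.5):
    """Lower level: maximize expected shaped reward with exact policy gradients."""
    theta = np.zeros_like(shaped_reward)
    for _ in range(steps):
        pi = softmax(theta)
        # d/dtheta_a of sum_b pi(b) R(b) equals pi(a) * (R(a) - sum_b pi(b) R(b)).
        theta += lr * pi * (shaped_reward - pi @ shaped_reward)
    return softmax(theta)


def select_shaping_weight(true_reward, shaping_reward, candidates):
    """Upper level: keep the weight whose trained policy earns the most TRUE reward."""
    results = {}
    for w in candidates:
        pi = train_policy(true_reward + w * shaping_reward)
        results[w] = float(pi @ true_reward)  # evaluated on the true reward only
    best_w = max(results, key=results.get)
    return best_w, results


true_reward = np.array([1.0, 0.0])     # arm 0 is genuinely better
shaping_reward = np.array([0.0, 2.0])  # hypothetical, misleading shaping signal
best_w, results = select_shaping_weight(
    true_reward, shaping_reward, [-1.0, -0.5, 0.0, 0.5, 1.0]
)
print(best_w, results)
```

Positive weights let the misleading shaping reward pull the policy toward arm 1. At w = 0.5 the shaped rewards of the two arms are equal, so the policy stays uniform. Negative weights turn the shaping signal into extra evidence for arm 0. This is a toy analogue of the abstract's claim that unbeneficial shaping rewards can be ignored or even transformed into beneficial ones.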

algorithm, proceedings, reward function

Neural Information Processing Systems

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
North America > Canada (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)


A Ablations

Neural Information Processing Systems, Aug-15-2025, 23:29:18 GMT

We find that past play greatly stabilizes the emergence of reciprocity in the iterated prisoner's dilemma (IPD). In cells containing another agent, we include the RUSP observations in these channels. In Figure 11 we show results when training with RUSP in these environments. Consistent with past work, the greedy baseline fails to reach a solution with high collective return. We use the distributed computing infrastructure of Berner et al.
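To make "reciprocity" and "collective return" concrete, the following sketch plays hand-coded strategies in the iterated prisoner's dilemma. It is an illustration under stated assumptions, not the paper's setup. The payoff values are the conventional textbook ones (temptation 5, reward 3, punishment 1, sucker 0). The strategies are fixed rules rather than learned RL policies, and all names are hypothetical.

```python
# Assumed conventional payoffs to (player A, player B); "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_defect(opponent_history):
    # A purely selfish rule: defect regardless of what the opponent did.
    return "D"


def tit_for_tat(opponent_history):
    # A reciprocating rule: cooperate first, then copy the opponent's last move.
    return opponent_history[-1] if opponent_history else "C"


def play_ipd(strategy_a, strategy_b, rounds=100):
    """Return each player's total payoff over `rounds` repeated games."""
    history_a, history_b = [], []
    return_a = return_b = 0
    for _ in range(rounds):
        a = strategy_a(history_b)  # each player sees only the other's past moves
        b = strategy_b(history_a)
        pa, pb = PAYOFFS[(a, b)]
        return_a += pa
        return_b += pb
        history_a.append(a)
        history_b.append(b)
    return return_a, return_b


for pair in [(tit_for_tat, tit_for_tat), (always_defect, always_defect), (tit_for_tat, always_defect)]:
    ra, rb = play_ipd(*pair)
    print(pair[0].__name__, "vs", pair[1].__name__, "collective return:", ra + rb)
```

Over 100 rounds, two reciprocating players earn a collective return of 600, while two defectors earn only 200. This mirrors the low collective return the text attributes to the greedy baseline.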

action head, agent, prisoner

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)


Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences

Neural Information Processing Systems, Aug-15-2025, 23:29:11 GMT

Multi-agent reinforcement learning (MARL) has shown recent success in increasingly complex fixed-team zero-sum environments.

agent, prisoner, social dilemma

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)


b628386c9b92481fab68fbf284bd6a64-Paper.pdf

Neural Information Processing Systems, Aug-15-2025, 23:28:43 GMT

agent, coordination, reinforcement learning

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Risk-Aware Transfer in Reinforcement Learning using Successor Features

Neural Information Processing Systems, Aug-15-2025, 23:27:51 GMT

However, the problem of transferring skills in a risk-aware manner is not well-understood.

policy evaluation, reinforcement learning, successor feature

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.14)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)


DeepFoids: Adaptive Bio-Inspired Fish Simulation with Deep Reinforcement Learning

Neural Information Processing Systems, Aug-15-2025, 23:27:44 GMT

Our goal is to synthesize realistic underwater scenes with various fish species in different fish cages, which can be used to train computer vision models that automate fish counting.

machine learning, reinforcement learning, simulation

Neural Information Processing Systems

Country: