AITopics | hybrid reward architecture

Hybrid Reward Architecture for Reinforcement Learning

Neural Information Processing Systems, Mar-17-2026, 12:30:49 GMT

One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy-problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.
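The decomposition described in this abstract can be illustrated with a minimal tabular sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the five-cell corridor environment, the `step` function, the hyperparameters, and the choice to act greedily on the sum of the per-component value tables. Each component gets its own Q-table that is updated only from its own reward signal.

```python
import random

N_CELLS, START = 5, 2          # hypothetical corridor: cells 0..4, start in the middle
LEFT, RIGHT = 0, 1
N_HEADS = 2                    # one value function ("head") per reward component


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward_vector, done)."""
    nxt = state - 1 if action == LEFT else state + 1
    rewards = [1.0 if nxt == 0 else 0.0,            # component 0: fruit at cell 0
               2.0 if nxt == N_CELLS - 1 else 0.0]  # component 1: fruit at cell 4
    return nxt, rewards, nxt in (0, N_CELLS - 1)


def aggregate(q, state):
    """Sum the per-component action values (an assumed aggregation rule)."""
    return [sum(q[k][state][a] for k in range(N_HEADS)) for a in (LEFT, RIGHT)]


def greedy_action(q, state):
    values = aggregate(q, state)
    return values.index(max(values))


def train_hra(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.5, seed=0):
    rng = random.Random(seed)
    # q[k][s][a]: value estimate of component k for action a in state s
    q = [[[0.0, 0.0] for _ in range(N_CELLS)] for _ in range(N_HEADS)]
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice((LEFT, RIGHT))
            else:
                action = greedy_action(q, state)
            nxt, rewards, done = step(state, action)
            for k in range(N_HEADS):
                # Each head bootstraps from its own estimate and its own reward only.
                bootstrap = 0.0 if done else gamma * max(q[k][nxt])
                q[k][state][action] += alpha * (rewards[k] + bootstrap - q[k][state][action])
            state = nxt
    return q


q = train_hra()
print("greedy action at start:", greedy_action(q, START))
```

Because each table only ever sees one reward component, its values depend on a smaller part of the problem, which is the property the abstract argues makes approximation easier.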

artificial intelligence, machine learning, reinforcement learning, (7 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.98)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

Neural Information Processing Systems, Dec-23-2025, 18:13:27 GMT

A growing trend for value-based reinforcement learning (RL) algorithms is to capture more information than scalar value functions in the value network. One of the most well-known methods in this branch is distributional RL, which models the return distribution instead of a scalar value. In another line of work, hybrid reward architectures (HRA) in RL have been studied to model source-specific value functions for each source of reward, which has also been shown to benefit performance. To fully inherit the benefits of distributional RL and hybrid reward architectures, we introduce Multi-Dimensional Distributional DQN (MD3QN), which extends distributional RL to model the joint return distribution from multiple reward sources. As a by-product of joint distribution modeling, MD3QN can capture not only the randomness in returns for each source of reward, but also the rich reward correlation between the randomness of different sources. We prove the convergence for the joint distributional Bellman operator and build our empirical algorithm by minimizing the Maximum Mean Discrepancy between the joint return distribution and its Bellman target. In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions, and outperforms previous RL methods utilizing multi-dimensional reward functions in the control setting.
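The Maximum Mean Discrepancy objective mentioned in this abstract can be sketched on plain samples. This is an illustration under stated assumptions, not the MD3QN implementation: the Gaussian kernel, its bandwidth, the biased estimator, and the helper names `bellman_target` and `mmd2` are all hypothetical choices. Each sample is a tuple with one return value per reward source, so the comparison is between joint distributions.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on return vectors (the kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    return sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two sets of joint-return samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def bellman_target(reward_vec, next_samples, gamma):
    """Shift and scale next-state return samples per source: r + gamma * z'."""
    return [tuple(r + gamma * z for r, z in zip(reward_vec, sample))
            for sample in next_samples]


# Hypothetical samples with two reward sources per sample.
predicted = [(0.0, 0.0), (1.0, 1.0)]
next_state_samples = [(0.0, 0.0), (2.0, 2.0)]
target = bellman_target((0.0, 0.0), next_state_samples, gamma=0.5)
print(mmd2(predicted, target))
```

In a learning setting the predicted samples would come from a network and this quantity would be minimized with respect to its parameters; the sketch only shows how the discrepancy itself is computed.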

distributional reinforcement learning, joint return distribution, multi-dimensional reward function, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)

Hybrid Reward Architecture for Reinforcement Learning

Neural Information Processing Systems, Nov-21-2025, 14:22:07 GMT

One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy-problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.

hybrid reward architecture, low-dimensional representation, name change, (4 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.98)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Hybrid Reward Architecture for Reinforcement Learning

Harm van Seijen

Neural Information Processing Systems, Nov-21-2025, 04:47:54 GMT

One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

Neural Information Processing Systems, Oct-9-2024, 11:55:08 GMT

A growing trend for value-based reinforcement learning (RL) algorithms is to capture more information than scalar value functions in the value network. One of the most well-known methods in this branch is distributional RL, which models the return distribution instead of a scalar value. In another line of work, hybrid reward architectures (HRA) in RL have been studied to model source-specific value functions for each source of reward, which has also been shown to benefit performance. To fully inherit the benefits of distributional RL and hybrid reward architectures, we introduce Multi-Dimensional Distributional DQN (MD3QN), which extends distributional RL to model the joint return distribution from multiple reward sources. As a by-product of joint distribution modeling, MD3QN can capture not only the randomness in returns for each source of reward, but also the rich reward correlation between the randomness of different sources.

artificial intelligence, distributional reinforcement learning, machine learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)

Reviews: Hybrid Reward Architecture for Reinforcement Learning

Neural Information Processing Systems, Oct-7-2024, 13:53:47 GMT

R5: Summary: This paper builds on the basic idea of the Horde architecture: learning many value functions in parallel with off-policy reinforcement learning. This paper shows that learning many value functions in parallel improves the performance on a single main task. The novelty here lies in a particular strategy for generating many different reward functions and how to combine them to generate behavior. The results show large improvements in performance in an illustrative grid world and Ms. Pac-Man. Decision: This paper is difficult to access.

hybrid reward architecture, representation, value function, (8 more...)

Neural Information Processing Systems

Genre: Research Report (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)

Hybrid Reward Architecture for Reinforcement Learning

Seijen, Harm Van, Fatemi, Mehdi, Romoff, Joshua, Laroche, Romain, Barnes, Tavian, Tsang, Jeffrey

Neural Information Processing Systems, Feb-14-2020, 17:26:16 GMT

One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function.

hybrid reward architecture, low-dimensional representation, reinforcement learning, (2 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Hybrid Reward Architecture for Reinforcement Learning

Seijen, Harm Van, Fatemi, Mehdi, Romoff, Joshua, Laroche, Romain, Barnes, Tavian, Tsang, Jeffrey

Neural Information Processing Systems, Dec-31-2017

One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy-problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

AI computer gets first ever perfect score on Ms. Pac-Man

Daily Mail - Science & tech, Jun-15-2017, 16:30:10 GMT

While it might sound like an elusive dream for most, the perfect score for arcade classic Ms. Pac-Man has been achieved – albeit by a computer. Researchers have created an artificial intelligence-based system that learned how to get the maximum score of 999,990 on the addictive 1980s video game. And the innovative method used could help to make advances in other areas of AI research, such as natural language processing. The technique, which the team has named 'Hybrid Reward Architecture', used 150 agents, which worked in parallel with one another. For example, some agents were rewarded for successfully finding one specific pellet, while others were tasked with staying out of the way of ghosts.

agent, artificial intelligence, natural language, (12 more...)

Daily Mail - Science & tech

Country: North America > Canada > Quebec > Montreal (0.16)

Genre: Research Report (0.36)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: