AITopics | return decomposition

return decomposition

RUDDER: Return Decomposition for Delayed Rewards

anonymous

Neural Information Processing Systems | Feb-11-2026, 13:56:14 GMT

reinforcement learning; delayed reward; reward redistribution; return decomposition; bias-variance; credit assignment; LSTM

infinitesimal change, reward redistribution, rudder, (13 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach

Neural Information Processing Systems | Feb-10-2026, 22:07:03 GMT

A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed.

dimension, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

RUDDER: Return Decomposition for Delayed Rewards

Neural Information Processing Systems | Dec-25-2025, 01:42:32 GMT

We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected immediate reward plus the expected future rewards. The latter are related to bias problems in temporal difference (TD) learning and to high variance problems in Monte Carlo (MC) learning. Both problems are even more severe when rewards are delayed. RUDDER aims at making the expected future rewards zero, which simplifies Q-value estimation to computing the mean of the immediate reward. We propose the following two new concepts to push the expected future rewards toward zero.
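The decomposition described in this abstract can be written out as a worked equation. This is a sketch in assumed notation that does not appear in the listing: an undiscounted, finite-horizon episode of length $T$, with $R_{t+1}$ the reward received after taking action $a_t$ in state $s_t$ under policy $\pi$.

```latex
% Assumed notation: finite horizon T, no discounting, policy \pi.
q^{\pi}(s_t, a_t)
  = \underbrace{\mathbb{E}\left[ R_{t+1} \mid s_t, a_t \right]}_{\text{expected immediate reward}}
  + \underbrace{\mathbb{E}_{\pi}\left[ \sum_{k=t+1}^{T} R_{k+1} \,\middle|\, s_t, a_t \right]}_{\text{expected future rewards}}
```

If the second term is zero, which is what the abstract says RUDDER aims for, the Q-value reduces to $q^{\pi}(s_t, a_t) = \mathbb{E}\left[ R_{t+1} \mid s_t, a_t \right]$, the mean of the immediate reward.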

name change, return decomposition, rudder, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

402e12102d6ec3ea3df40ce1b23d423a-Paper-Conference.pdf

Neural Information Processing Systems | Oct-8-2025, 13:17:17 GMT

causal structure, dimension, markovian reward, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

RUDDER: Return Decomposition for Delayed Rewards

anonymous

Neural Information Processing Systems | Oct-2-2025, 05:30:42 GMT

reinforcement learning; delayed reward; reward redistribution; return decomposition; bias-variance; credit assignment; LSTM

artificial intelligence, machine learning, reward redistribution, (16 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Reviews: RUDDER: Return Decomposition for Delayed Rewards

Neural Information Processing Systems | Jan-21-2025, 22:34:08 GMT

The reward redistribution method is proven to preserve optimal policies and reduce the expected future reward to zero. This is achieved by redistributing the delayed rewards to the salient state-action events, where saliency is determined by contribution analysis methods. Extensive experiments on both toy domains and the suite of Atari games demonstrate the method's improvements on delayed-reward tasks, as well as the shortcomings of MC and TD methods for these types of tasks. Comments: I felt the work presented in the paper is outstanding. There are numerous contributions that could conceivably stand on their own (resulting in an extremely large appendix!).
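The redistribution idea in this review can be illustrated with a minimal sketch. It assumes one simple form of contribution analysis: each step receives the change it causes in a model's prediction of the final return. The names `redistribute`, `toy_predictor`, and the `(state, action)` string encoding are hypothetical, and the hand-written predictor stands in for a learned model; this is not the paper's implementation.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, str]  # (state, action); a hypothetical encoding for this sketch


def redistribute(
    episode: Sequence[Step],
    predict_return: Callable[[Sequence[Step]], float],
) -> List[float]:
    """Assign each step the change in predicted final return that it causes."""
    rewards = []
    previous = predict_return(episode[:0])  # prediction before any step is seen
    for t in range(len(episode)):
        current = predict_return(episode[: t + 1])
        rewards.append(current - previous)
        previous = current
    return rewards


def toy_predictor(prefix: Sequence[Step]) -> float:
    """Stand-in for a learned return predictor: taking the key decides the outcome."""
    return 1.0 if ("hall", "take_key") in prefix else 0.0


episode = [("start", "walk"), ("hall", "take_key"), ("hall", "walk"), ("door", "open")]
redistributed = redistribute(episode, toy_predictor)
print(redistributed)  # [0.0, 1.0, 0.0, 0.0]
```

The differences telescope, so the redistributed rewards sum to the predicted return of the full episode minus the prediction for the empty prefix. In this toy episode the delayed reward of 1.0 moves to the step where the key is taken.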

experiment, return decomposition, rudder, (8 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)

RUDDER: Return Decomposition for Delayed Rewards

Neural Information Processing Systems | Oct-9-2024, 14:25:53 GMT

delayed reward, return decomposition, rudder, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

Chen, Sirui, Zhang, Zhaowei, Yang, Yaodong, Du, Yali

arXiv.org Artificial Intelligence | Jan-4-2024

Centralized Training with Decentralized Execution (CTDE) has been proven to be an effective paradigm in cooperative multi-agent reinforcement learning (MARL). One of the major challenges is credit assignment, which aims to credit agents by their contributions. While prior studies have shown great success, their methods typically fail to work in episodic reinforcement learning scenarios where global rewards are revealed only at the end of the episode. They lack the functionality to model complicated relations of the delayed global reward in the temporal dimension and suffer from inefficiencies. To tackle this, we introduce Spatial-Temporal Attention with Shapley (STAS), a novel method that learns credit assignment in both temporal and spatial dimensions. It first decomposes the global return back to each time step, then utilizes the Shapley Value to redistribute the individual payoff from the decomposed global reward. To mitigate the computational complexity of the Shapley Value, we introduce an approximation of marginal contribution and utilize Monte Carlo sampling to estimate it. We evaluate our method on an Alice & Bob example and MPE environments across different scenarios. Our results demonstrate that our method effectively assigns spatial-temporal credit, outperforming all state-of-the-art baselines.
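The Monte Carlo estimation of Shapley values mentioned in this abstract can be sketched generically. The version below uses plain permutation sampling, which is a standard estimator and not necessarily the approximation of marginal contribution that STAS introduces. The function names, the agent names, and the toy coalition value are all hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, List


def shapley_monte_carlo(
    agents: List[str],
    value: Callable[[FrozenSet[str]], float],
    num_samples: int = 2000,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = {agent: 0.0 for agent in agents}
    for _ in range(num_samples):
        order = agents[:]
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        before = value(coalition)
        for agent in order:
            coalition = coalition | {agent}
            after = value(coalition)
            totals[agent] += after - before  # marginal contribution of this agent
            before = after
    return {agent: totals[agent] / num_samples for agent in agents}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical payoff: 1.0 only when alice and bob cooperate; carol adds nothing."""
    return 1.0 if {"alice", "bob"} <= coalition else 0.0


estimates = shapley_monte_carlo(["alice", "bob", "carol"], toy_value)
print(estimates)
```

For this toy payoff the exact Shapley values are 0.5 for alice, 0.5 for bob, and 0 for carol. Sampling orderings avoids enumerating every coalition, whose number grows exponentially with the number of agents.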

agent, contribution, shapley value, (12 more...)

arXiv.org Artificial Intelligence

2304.0752

Country: Asia > China (0.04)

Genre:

Research Report > New Finding (0.54)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)

Episodic Return Decomposition by Difference of Implicitly Assigned Sub-Trajectory Reward

Lin, Haoxin, Wu, Hongqiu, Zhang, Jiaji, Sun, Yihao, Ye, Junyin, Yu, Yang

arXiv.org Artificial Intelligence | Dec-17-2023

Real-world decision-making problems are usually accompanied by delayed rewards, which affects the sample efficiency of Reinforcement Learning, especially in the extremely delayed case where the only feedback is the episodic reward obtained at the end of an episode. Episodic return decomposition is a promising way to deal with the episodic-reward setting. Several corresponding algorithms have shown remarkable effectiveness of the learned step-wise proxy rewards from return decomposition. However, these existing methods lack either attribution or representation capacity, leading to inefficient decomposition in the case of long-term episodes. In this paper, we propose a novel episodic return decomposition method called Diaster (Difference of implicitly assigned sub-trajectory reward). Diaster decomposes any episodic reward into credits of two divided sub-trajectories at any cut point, and the step-wise proxy rewards come from differences in expectation. We theoretically and empirically verify that the decomposed proxy reward function can guide the policy to be nearly optimal. Experimental results show that our method outperforms previous state-of-the-art methods in terms of both sample efficiency and performance.
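One plausible reading of the cut-point decomposition in this abstract is written out below. The notation is assumed for illustration and is not taken from the listing: $\tau_{0:c}$ is the part of the episode before cut point $c$, $\tau_{c:T}$ is the remainder, and $\hat{R}_{\mathrm{sub}}$ is a learned sub-trajectory reward function.

```latex
% Assumed notation; a sketch of the idea, not the paper's exact formulation.
R_{\mathrm{ep}}(\tau) \approx \hat{R}_{\mathrm{sub}}(\tau_{0:c}) + \hat{R}_{\mathrm{sub}}(\tau_{c:T})
  \quad \text{for every cut point } c \in \{0, \dots, T\}

\hat{r}(s_t, a_t) = \mathbb{E}\left[ \hat{R}_{\mathrm{sub}}(\tau_{0:t+1}) - \hat{R}_{\mathrm{sub}}(\tau_{0:t}) \right]
```

Under this reading, the step-wise proxy reward is the expected change in the credit assigned to the prefix when it is extended by one step.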

diaster, reward function, sub, (12 more...)

arXiv.org Artificial Intelligence

2312.10642

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(17 more...)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach

Zhang, Yudi, Du, Yali, Huang, Biwei, Wang, Ziyan, Wang, Jun, Fang, Meng, Pechenizkiy, Mykola

arXiv.org Artificial Intelligence | Nov-10-2023

A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed. Reward redistribution serves as a solution to re-assign credits for each time step from observed sequences. While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable reward redistribution and preserving policy invariance. In this paper, we start by studying the role of causal generative models in reward redistribution by characterizing the generation of Markovian rewards and trajectory-wise long-term return and further propose a framework, called Generative Return Decomposition (GRD), for policy optimization in delayed reward scenarios. Specifically, GRD first identifies the unobservable Markovian rewards and causal relations in the generative process. Then, GRD makes use of the identified causal generative model to form a compact representation to train policy over the most favorable subspace of the state space of the agent. Theoretically, we show that the unobservable Markovian reward function is identifiable, as well as the underlying causal structure and causal models. Experimental results show that our method outperforms state-of-the-art methods and the provided visualization further demonstrates the interpretability of our method. The project page is located at https://reedzyd.github.io/GenerativeReturnDecomposition/.

causal structure, dimension, markovian reward, (12 more...)

arXiv.org Artificial Intelligence

2305.18427

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
