AITopics | reward decomposition

Collaborating Authors

reward decomposition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Distributional Reward Decomposition for Reinforcement Learning

Zichuan Lin, Li Zhao, Derek Yang, Tao Qin, Tie-Yan Liu, Guangwen Yang

Neural Information Processing SystemsFeb-13-2026, 01:16:51 GMT

Van Seijen et al. [2017] propose to split a state into different sub-states, each with a sub-reward obtained bytraining ageneral valuefunction, andlearnmultiple valuefunctions withsub-rewards. The architecture is rather limited due to requiring prior knowledge of how to split into sub-states.

machine learning, reinforcement, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Shandong Province > Qingdao (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

82039d16dce0aab3913b6a7ac73deff7-Supplemental.pdf

Neural Information Processing SystemsFeb-9-2026, 04:15:40 GMT

div 2, rd 2, sample training experience, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)

Add feedback

RD 2 : Reward Decomposition with Representation Decomposition

Neural Information Processing SystemsDec-24-2025, 05:53:25 GMT

Reward decomposition, which aims to decompose the full reward into multiple sub-rewards, has been proven beneficial for improving sample efficiency in reinforcement learning. Existing works on discovering reward decomposition are mostly policy dependent, which constrains diverse or disentangled behavior between different policies induced by different sub-rewards. In this work, we propose a set of novel reward decomposition principles by constraining uniqueness and compactness of different state features/representations relevant to different sub-rewards. Our principles encourage sub-rewards with minimal relevant features, while maintaining the uniqueness of each sub-reward. We derive a deep learning algorithm based on our principle, and term our method as RD$^2$, since we learn reward decomposition and representation decomposition jointly. RD$^2$ is evaluated on a toy case, where we have the true reward structure, and some Atari environments where reward structure exists but is unknown to the agent to demonstrate the effectiveness of RD$^2$ against existing reward decomposition methods.

decomposition, representation decomposition, reward decomposition, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Add feedback

TritonRL: Training LLMs to Think and Code Triton Without Cheating

Woo, Jiin, Zhu, Shaowei, Nie, Allen, Jia, Zhen, Wang, Yida, Park, Youngsuk

arXiv.org Artificial IntelligenceOct-22-2025

With the rapid evolution of large language models (LLMs), the demand for automated, high-performance system kernels has emerged as a key enabler for accelerating development and deployment. We introduce TritonRL, a domain-specialized LLM for Triton kernel generation, trained with a novel training framework that enables robust and automated kernel synthesis. Unlike general-purpose programming languages, Triton kernel generation faces unique challenges due to data scarcity and incomplete evaluation criteria, vulnerable to reward hacking. Our approach addresses these challenges end-to-end by distilling Triton-specific knowledge through supervised fine-tuning on curated datasets, and further improving code quality via reinforcement learning (RL) with robust, verifiable rewards and hierarchical reward assignment. Our RL framework robustly detects reward hacking and guides both reasoning traces and code tokens through fine-grained verification and hierarchical reward decomposition, enabling the model to generate high-quality Triton kernels that can truly replace existing modules. With robust and fine-grained evaluation, our experiments on KernelBench demonstrate that TritonRL achieves state-of-the-art correctness and speedup, surpassing all other Triton-specific models and underscoring the effectiveness of our RL-based training paradigm.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.17891

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

82039d16dce0aab3913b6a7ac73deff7-Supplemental.pdf

Neural Information Processing SystemsOct-3-2025, 10:01:07 GMT

q-learning, rd 2, sample training experience, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)

Add feedback

82039d16dce0aab3913b6a7ac73deff7-Paper.pdf

Neural Information Processing SystemsOct-3-2025, 10:01:02 GMT

decomposition, representation, reward decomposition, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada (0.04)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

Add feedback

Distributional Reward Decomposition for Reinforcement Learning

Zichuan Lin, Li Zhao, Derek Yang, Tao Qin, Tie-Yan Liu, Guangwen Yang

Neural Information Processing SystemsOct-3-2025, 06:38:18 GMT

Neural Information Processing Systems http://nips.cc/

architecture, decomposition, reward decomposition, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Shandong Province > Qingdao (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Aligning Dialogue Agents with Global Feedback via Large Language Model Reward Decomposition

Lee, Dong Won, Park, Hae Won, Breazeal, Cynthia, Morency, Louis-Philippe

arXiv.org Artificial IntelligenceMay-23-2025

We propose a large language model based reward decomposition framework for aligning dialogue agents using only a single session-level feedback signal. We leverage the reasoning capabilities of a frozen, pretrained large language model (LLM) to infer fine-grained local implicit rewards by decomposing global, session-level feedback. Our first text-only variant prompts the LLM to perform reward decomposition using only the dialogue transcript. The second multimodal variant incorporates additional behavioral cues, such as pitch, gaze, and facial affect, expressed as natural language descriptions. These inferred turn-level rewards are distilled into a lightweight reward model, which we utilize for RL-based fine-tuning for dialogue generation. We evaluate both text-only and multimodal variants against state-of-the-art reward decomposition methods and demonstrate notable improvements in human evaluations of conversation quality, suggesting that LLMs are strong reward decomposers that obviate the need for manual reward shaping and granular human feedback.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.15922

Country:

Asia (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (0.93)
Personal > Interview (0.68)

Industry:

Leisure & Entertainment (0.67)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Strategy Masking: A Method for Guardrails in Value-based Reinforcement Learning Agents

Keane, Jonathan, Keyser, Sam, Kedziora, Jeremy

arXiv.org Artificial IntelligenceJan-9-2025

The use of reward functions to structure AI learning and decision making is core to the current reinforcement learning paradigm; however, without careful design of reward functions, agents can learn to solve problems in ways that may be considered ``undesirable" or ``unethical. Without thorough understanding of the incentives a reward function creates, it can be difficult to impose principled yet general control mechanisms over its behavior. In this paper, we study methods for constructing guardrails for AI agents that use reward functions to learn decision making. We introduce a novel approach, which we call strategy masking, to explicitly learn and then suppress undesirable AI agent behavior. We apply our method to study lying in AI agents and show that strategy masking can effectively modify agent behavior by suppressing, or actively penalizing, the reward dimension for lying such that agents act more honestly while not compromising their ability to perform effectively.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2501.05501

Country: North America > United States (0.29)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback