AITopics | rcsl

0a2f65c9d2313b71005e600bd23393fe-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 11:50:41 GMT

machine learning, rcsl, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

Neural Information Processing Systems, Mar-21-2026, 19:41:07 GMT

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. By analyzing $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the highest trajectory returns across diverse offline RL benchmarks. QCS represents a breakthrough in offline RL, pushing the limits of what can be achieved and fostering further innovations.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Neural Information Processing Systems

Genre: Research Report (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

0a2f65c9d2313b71005e600bd23393fe-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 21:54:43 GMT

Note f = , so havethat / f =1.

artificial intelligence, machine learning, rcsl, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.05)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

9e72fc628edeb29f7aa64ac81b7ec6ce-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 00:38:02 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision (0.92)
(3 more...)

When does return-conditioned supervised learning work for offline reinforcement learning?

Neural Information Processing Systems, Dec-23-2025, 18:07:28 GMT

Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL). RCSL algorithms learn the distribution of actions conditioned on both the state and the return of the trajectory. Then they define a policy by conditioning on achieving high return. In this paper, we provide a rigorous study of the capabilities and limitations of RCSL, something which is crucially missing in previous work. We find that RCSL returns the optimal policy under a set of assumptions that are stronger than those needed for the more traditional dynamic programming-based algorithms. We provide specific examples of MDPs and datasets that illustrate the necessity of these assumptions and the limits of RCSL. Finally, we present empirical evidence that these limitations will also cause issues in practice by providing illustrative experiments in simple point-mass environments and on datasets from the D4RL benchmark.
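
The two-step recipe in the abstract above (fit the action distribution given state and trajectory return, then act by conditioning on a high return) can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the function names `fit_rcsl` and `act`, the count-based estimator, and the toy dataset are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def fit_rcsl(trajectories):
    """Count actions for each (state, trajectory return) pair in the data.

    Each trajectory is a list of (state, action, reward) tuples; the
    conditioning variable is the undiscounted return of the whole trajectory.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        total_return = sum(reward for _, _, reward in traj)
        for state, action, _ in traj:
            counts[(state, total_return)][action] += 1
    return counts

def act(counts, state, target_return):
    """Most frequent action under the empirical P(action | state, return)."""
    seen = counts.get((state, target_return))
    if not seen:
        return None  # this (state, return) pair never occurs in the dataset
    return seen.most_common(1)[0][0]

# Hypothetical toy dataset with two trajectories that both start in "s0".
data = [
    [("s0", "left", 0.0), ("s1", "left", 1.0)],    # trajectory return 1.0
    [("s0", "right", 0.0), ("s2", "right", 5.0)],  # trajectory return 5.0
]
model = fit_rcsl(data)
print(act(model, "s0", 5.0))  # conditions on the higher observed return
```

Conditioning on a return that never appears in the data yields no action in this sketch, which hints at why the choice of conditioning value matters for this class of methods.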

name change, offline reinforcement learning, return-conditioned supervised learning work, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning Jeonghye Kim

Neural Information Processing Systems, Oct-10-2025, 11:29:45 GMT

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation.

dataset, neural information processing system, rcsl, (12 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

How to Provably Improve Return Conditioned Supervised Learning?

Liu, Zhishuai, Yang, Yu, Wang, Ruhan, Xu, Pan, Zhou, Dongruo

arXiv.org Artificial Intelligence, Jun-11-2025

In sequential decision-making problems, Return-Conditioned Supervised Learning (RCSL) has gained increasing recognition for its simplicity and stability in modern decision-making tasks. Unlike traditional offline reinforcement learning (RL) algorithms, RCSL frames policy learning as a supervised learning problem by taking both the state and return as input. This approach eliminates the instability often associated with temporal difference (TD) learning in offline RL. However, RCSL has been criticized for lacking the stitching property, meaning its performance is inherently limited by the quality of the policy used to generate the offline dataset. To address this limitation, we propose a principled and simple framework called Reinforced RCSL. The key innovation of our framework is the introduction of a concept we call the in-distribution optimal return-to-go. This mechanism leverages our policy to identify the best achievable in-dataset future return based on the current state, avoiding the need for complex return augmentation techniques. Our theoretical analysis demonstrates that Reinforced RCSL can consistently outperform the standard RCSL approach. Empirical results further validate our claims, showing significant performance improvements across a range of benchmarks.
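
As a rough illustration of the idea of a best achievable in-dataset future return per state, the sketch below takes, for each discrete state, the maximum return-to-go observed from that state anywhere in the dataset. The abstract does not specify the estimator, so the tabular maximum, the helper names `returns_to_go` and `best_in_data_rtg`, and the toy data are assumptions, not the paper's method.

```python
from collections import defaultdict

def returns_to_go(rewards):
    """Undiscounted return-to-go: sum of rewards from each step to the end."""
    out, running = [], 0.0
    for reward in reversed(rewards):
        running += reward
        out.append(running)
    return out[::-1]

def best_in_data_rtg(trajectories):
    """For each state, the largest return-to-go observed from it in the data."""
    best = defaultdict(lambda: float("-inf"))
    for traj in trajectories:
        states = [state for state, _, _ in traj]
        rewards = [reward for _, _, reward in traj]
        for state, rtg in zip(states, returns_to_go(rewards)):
            best[state] = max(best[state], rtg)
    return dict(best)

# Hypothetical toy dataset: both trajectories pass through the shared state "s1".
data = [
    [("s0", "a", 1.0), ("s1", "a", 0.0)],  # from s1 this trajectory earns 0.0
    [("s2", "b", 0.0), ("s1", "b", 3.0)],  # from s1 this trajectory earns 3.0
]
targets = best_in_data_rtg(data)
print(targets)
```

In this toy data the target at `s1` comes from the second trajectory even when `s1` is reached along the first, which is the kind of cross-trajectory combination that stitching refers to.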

machine learning, reinforcement learning, trajectory, (11 more...)

arXiv.org Artificial Intelligence

2506.08463

Genre: Research Report (1.00)

Industry: Health & Medicine (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

Neural Information Processing Systems, May-27-2025, 10:42:41 GMT

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce Q-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of Q-functions. By analyzing Q-function over-generalization, which impairs stable stitching, QCS adaptively integrates Q-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the highest trajectory returns across diverse offline RL benchmarks. QCS represents a breakthrough in offline RL, pushing the limits of what can be achieved and fostering further innovations.

conditional supervised learning, learning, offline reinforcement learning, (2 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

When does return-conditioned supervised learning work for offline reinforcement learning?

Neural Information Processing Systems, Oct-9-2024, 14:02:33 GMT

Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL). RCSL algorithms learn the distribution of actions conditioned on both the state and the return of the trajectory. Then they define a policy by conditioning on achieving high return. In this paper, we provide a rigorous study of the capabilities and limitations of RCSL, something which is crucially missing in previous work. We find that RCSL returns the optimal policy under a set of assumptions that are stronger than those needed for the more traditional dynamic programming-based algorithms.

algorithm, offline reinforcement learning, return-conditioned supervised learning work, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.66)

Value-Aided Conditional Supervised Learning for Offline RL

Kim, Jeonghye, Lee, Suyoung, Kim, Woojun, Sung, Youngchul

arXiv.org Artificial Intelligence, Feb-2-2024

Offline reinforcement learning (RL) has seen notable advancements through return-conditioned supervised learning (RCSL) and value-based methods, yet each approach comes with its own set of practical challenges. Addressing these, we propose Value-Aided Conditional Supervised Learning (VCS), a method that effectively synergizes the stability of RCSL with the stitching ability of value-based methods. Based on the Neural Tangent Kernel analysis to discern instances where value function may not lead to stable stitching, VCS injects the value aid into the RCSL's loss function dynamically according to the trajectory return. Our empirical studies reveal that VCS not only significantly outperforms both RCSL and value-based methods but also consistently achieves, or often surpasses, the highest trajectory returns across diverse offline RL benchmarks. This breakthrough in VCS paves new paths in offline RL, pushing the limits of what can be achieved and fostering further innovations.

dataset, inverted double pendulum, value-aided conditional supervised learning, (10 more...)

arXiv.org Artificial Intelligence

2402.02017

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
