AITopics | discor

Deep reinforcement learning can learn effective policies for a wide range of tasks, but is notoriously difficult to use due to instability and sensitivity to hyperparameters. The reasons for this remain unclear. In this paper, we study how RL methods based on bootstrapped Q-learning can suffer from a pathological interaction between function approximation and the data distribution used to train the Q-function: with standard supervised learning, online data collection should induce corrective feedback, where new data corrects mistakes in old predictions. With dynamic programming methods like Q-learning, such feedback may be absent. This can lead to instability, sub-optimal convergence, and poor results when learning from noisy, sparse or delayed rewards. Based on these observations, we propose a new algorithm, DisCor, which explicitly optimizes for data distributions that can correct for accumulated errors in the value function. DisCor computes a tractable approximation to the distribution that optimally induces corrective feedback, which we show results in reweighting samples based on the estimated accuracy of their target values.
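
The reweighting idea in the last sentence of the abstract can be illustrated with a small sketch. This listing does not give the weighting formula, so everything here is an assumption made for illustration: the exponential form of the weights, the hypothetical function name `target_accuracy_weights`, and the `gamma` and `temperature` parameters. The input `next_errors` stands in for whatever estimate of target-value error is available for each sampled transition.

```python
import math


def target_accuracy_weights(next_errors, gamma=0.99, temperature=1.0):
    """Map estimated target-value errors to normalized sample weights.

    Illustrative assumption: a transition whose bootstrapped target is
    estimated to be more accurate (smaller error) gets a larger weight,
    via a softmax over the negated, discounted errors.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Smaller estimated error -> larger logit -> larger weight.
    logits = [-gamma * err / temperature for err in next_errors]
    # Subtract the largest logit before exponentiating for numerical stability.
    max_logit = max(logits)
    unnormalized = [math.exp(logit - max_logit) for logit in logits]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three sampled transitions with increasing estimated target error.
weights = target_accuracy_weights([0.1, 1.0, 5.0])
```

Under these assumptions the weights sum to one across the batch, and the transition with the smallest estimated target error receives the largest weight. A larger `temperature` flattens the weights toward uniform sampling.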

artificial intelligence, machine learning, reinforcement learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

d7f426ccbc6db7e235c57958c21c5dfa-Supplemental.pdf

Neural Information Processing Systems, Aug-16-2025, 16:45:34 GMT

algorithm, discor, q-function, (16 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

d7f426ccbc6db7e235c57958c21c5dfa-Paper.pdf

Neural Information Processing Systems, Aug-16-2025, 16:45:26 GMT

algorithm, discor, function approximation, (14 more...)

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

931af583573227f0220bc568c65ce104-Paper.pdf

Neural Information Processing Systems, Aug-16-2025, 01:32:22 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

Europe > Sweden (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.68)

Review for NeurIPS paper: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Neural Information Processing Systems, Feb-6-2025, 20:44:52 GMT

The paper is theoretically well grounded, with plenty of explanation of the intuition and proofs of the approximations used. The significance of the contribution is large. Most RL algorithms belong to exactly the approximate dynamic programming (ADP) family that this paper proposes to modify, and the corrective feedback model can be slotted into most training loops without compatibility issues. As the authors note, it could also be used to guide exploration rather than just for post hoc transition correction. This is clearly relevant to the NeurIPS community, much of which makes use of this form of RL algorithm.
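
As a rough illustration of the reviewer's remark that the method slots into most training loops, the sketch below shows one place where per-sample weights could enter a standard Q-learning update: the loss over a batch of temporal-difference errors. The function names `td_targets` and `weighted_td_loss` are hypothetical, and the code is a minimal sketch under those assumptions, not the authors' implementation.

```python
def td_targets(rewards, next_q_max, dones, gamma=0.99):
    """One-step bootstrapped targets: r + gamma * max_a' Q(s', a') unless terminal."""
    return [
        r + (0.0 if done else gamma * q_next)
        for r, q_next, done in zip(rewards, next_q_max, dones)
    ]


def weighted_td_loss(q_values, targets, weights):
    """Weighted mean of squared TD errors over a batch of transitions."""
    if not (len(q_values) == len(targets) == len(weights)):
        raise ValueError("q_values, targets and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    squared_errors = [(q - y) ** 2 for q, y in zip(q_values, targets)]
    return sum(w * e for w, e in zip(weights, squared_errors)) / total_weight


# Uniform weights reproduce the ordinary mean squared TD error.
targets = td_targets(rewards=[1.0, 0.0], next_q_max=[2.0, 3.0], dones=[True, False], gamma=0.5)
uniform_loss = weighted_td_loss([0.0, 0.5], targets, [1.0, 1.0])
```

With uniform weights the function reduces to the usual loss, so an existing training loop would only need to supply different weights to change which transitions dominate the update.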

corrective feedback, distribution correction, reinforcement learning, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

Review for NeurIPS paper: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Neural Information Processing Systems, Feb-6-2025, 20:44:45 GMT

The reviewers appreciated the rebuttal that provided some additional insights.

corrective feedback, distribution correction, reinforcement learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Neural Information Processing Systems, Oct-11-2024, 12:17:06 GMT

Deep reinforcement learning can learn effective policies for a wide range of tasks, but is notoriously difficult to use due to instability and sensitivity to hyperparameters. The reasons for this remain unclear. In this paper, we study how RL methods based on bootstrapped Q-learning can suffer from a pathological interaction between function approximation and the data distribution used to train the Q-function: with standard supervised learning, online data collection should induce corrective feedback, where new data corrects mistakes in old predictions. With dynamic programming methods like Q-learning, such feedback may be absent. This can lead to instability, sub-optimal convergence, and poor results when learning from noisy, sparse or delayed rewards.

corrective feedback, distribution correction, reinforcement learning, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
