AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

c71769e2715835d37c3e25cc1173bd62-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 16:22:18 GMT

abstraction model, agent, international conference, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
South America > Chile (0.04)
Europe > Italy (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Transportation > Ground > Road (0.47)
Information Technology (0.46)
Government (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(4 more...)

Add feedback

Diffusion Spectral Representation for Reinforcement Learning

Neural Information Processing SystemsOct-10-2025, 16:14:45 GMT

Diffusion-based models have achieved notable empirical successes in reinforcement learning (RL) due to their expressiveness in modeling complex distributions. Despite existing methods being promising, the key challenge of extending existing methods for broader real-world applications lies in the computational cost at inference time, i.e., sampling from a diffusion model is considerably slow as it often requires tens to hundreds of iterations to generate even one sample. To circumvent this issue, we propose to leverage the flexibility of diffusion models for RL from a representation learning perspective.

algorithm, diffusion model, representation, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Austria (0.04)
(8 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

Add feedback

Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization

Neural Information Processing SystemsOct-10-2025, 16:12:51 GMT

With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and training stability.

consistency policy, dormant ratio, international conference, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
Africa > Rwanda > Kigali > Kigali (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(8 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning Weikang Wan

Neural Information Processing SystemsOct-10-2025, 16:06:17 GMT

This paper introduces DiffTORI, which utilizes Diff erentiable T rajectory O ptimization as the policy representation to generate actions for deep R einforcement and I mitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function.

difftori, optimization, trajectory optimization, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Rule Based Rewards for Language Model Safety

Neural Information Processing SystemsOct-10-2025, 16:03:57 GMT

We propose a novel preference modeling approach that utilizes AI feedback and only requires a small amount of human data.

completion, proposition, rbr, (15 more...)

Neural Information Processing Systems

Country: Europe > France (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Add feedback

c4bb407aa5042727317f9856dace001c-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 15:57:14 GMT

activation, algorithm, learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)
Africa > Mali (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(4 more...)

Add feedback

c3fa3a7d50b34732c6d08f6f66380d75-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 15:55:22 GMT

algorithm, probability, representation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Add feedback

Vision-Language Navigation with Energy-Based Policy

Neural Information Processing SystemsOct-10-2025, 15:55:07 GMT

Existing VLN models are optimized through expert demonstrations by supervised behavioural cloning or incorporating manual reward engineering.

navigation, vision-and-language navigation, wang, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
(4 more...)

Add feedback

Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL Qin-Wen Luo

Neural Information Processing SystemsOct-10-2025, 15:54:56 GMT

Offline reinforcement learning (RL) aims to learn a policy from a fixed dataset without additional interactions with the environment.

fine-tuning, offline policy, online fine-tuning, (13 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Montana (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Education > Educational Setting > Online (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Pessimistic Backward Policy for GFlowNets

Neural Information Processing SystemsOct-10-2025, 15:39:13 GMT

To this end, GFlowNets construct an object through a trajectory of state transitions, e.g., iteratively adding molecular fragments to construct a molecule.

backward policy, boltzmann distribution, trajectory, (15 more...)

Neural Information Processing Systems

Country: