AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
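The trial-and-error idea in the quotation can be illustrated with a small sketch. This is not taken from Sutton and Barto's text: it assumes a hypothetical three-armed Bernoulli bandit, and the function name `run_epsilon_greedy`, the arm probabilities, and the exploration rate `epsilon` are all illustrative choices. The learner is never told which arm is best and has to estimate each arm's reward by pulling it.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a Bernoulli bandit.

    true_means is hidden from the learner; it is only used to sample rewards.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was tried
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a reward of 1 or 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    # Hypothetical success probabilities, unknown to the learner.
    est, cnt, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimated means:", [round(e, 3) for e in est])
    print("pull counts:", cnt)
    print("total reward:", total)
```

With a small `epsilon` the learner spends most steps on the arm it currently believes is best while still sampling the others occasionally, which is one simple way to discover which actions yield the most reward by trying them.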


Natural Language Instruction-following with Task-related Language Development and Translation

Neural Information Processing Systems, Feb-8-2026, 15:45:23 GMT

Natural language-conditioned reinforcement learning (RL) enables agents to follow human instructions.

large language model, machine learning, reinforcement learning, (20 more...)


Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Colorado > Denver County > Denver (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
(2 more...)


5011bf6d8a37692913fce3a15a51f070-Paper.pdf

Neural Information Processing Systems, Feb-8-2026, 15:37:22 GMT

arxiv preprint arxiv, exploration, international conference, (12 more...)


Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)


Learning Universal Policies via Text-Guided Video Generation

Neural Information Processing Systems, Feb-8-2026, 15:24:43 GMT

Such diversity hampers knowledge sharing, learning, and generalization across tasks and environments.

arxiv preprint arxiv, large language model, machine learning, (17 more...)


Country:

North America > Canada > Alberta (0.14)
Asia > Vietnam > Hanoi > Hanoi (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)


6101903146e4bbf4999c449d78441606-Paper.pdf

Neural Information Processing Systems, Feb-8-2026, 15:24:29 GMT

algorithm, ensemble, trajectory, (14 more...)


Country:

North America > United States > Montana (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > West Midlands > Coventry (0.04)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


444d69470b24ded080183c907b711bbf-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 15:17:24 GMT

constraint, inequality, probability, (14 more...)


Country:

Asia > Singapore (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


444d69470b24ded080183c907b711bbf-Paper-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 15:17:21 GMT

algorithm, constraint violation, sample complexity, (10 more...)


Country:

Asia > Singapore (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)


60cb558c40e4f18479664069d9642d5a-Paper.pdf

Neural Information Processing Systems, Feb-8-2026, 15:16:10 GMT

In real-world decision-making tasks, learning an optimal policy without a trial-and-error process is an appealing challenge. When expert demonstrations are available, imitation learning that mimics expert actions can learn a good policy efficiently.

expert demonstration, machine learning, reinforcement learning, (17 more...)


Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)


An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

Neural Information Processing Systems, Feb-8-2026, 15:06:49 GMT

A fundamental question in the theory of reinforcement learning is: suppose the optimal Q-function lies in the linear span of a given d-dimensional feature mapping, is sample-efficient reinforcement learning (RL) possible? The recent and remarkable result of Weisz et al. (2020) resolves this question in the negative, providing an exponential (in d) sample size lower bound, which holds even if the agent has access to a generative model of the environment. One may hope that such a lower bound can be circumvented with an even stronger assumption that there is a constant gap between the optimal Q-value of the best action and that of the second-best action (for all states); indeed, the construction in Weisz et al.
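The two assumptions named in this abstract can be written out as follows. This is a sketch in common notation rather than the paper's own statement: the symbols $\phi$, $\theta^\star$, $\mathcal{S}$, $\mathcal{A}$, and $\Delta$ are assumed here for illustration, and the paper may define the gap slightly differently.

```latex
% Linear realizability: the optimal Q-function is linear in a known
% d-dimensional feature map \phi, with an unknown parameter \theta^\star.
\[
  Q^\star(s, a) = \langle \phi(s, a), \theta^\star \rangle
  \quad \text{for all } (s, a) \in \mathcal{S} \times \mathcal{A},
  \qquad \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d,\;
  \theta^\star \in \mathbb{R}^d .
\]

% Constant suboptimality gap: in every state, the best action beats
% every other action by at least a fixed margin \Delta > 0.
\[
  Q^\star\bigl(s, a^\star(s)\bigr) - \max_{a \neq a^\star(s)} Q^\star(s, a) \;\ge\; \Delta
  \quad \text{for all } s \in \mathcal{S},
  \qquad a^\star(s) \in \arg\max_{a \in \mathcal{A}} Q^\star(s, a) .
\]
```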

artificial intelligence, machine learning, reinforcement learning, (16 more...)


Country: