AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Reflexion: Language Agents with Verbal Reinforcement Learning

Neural Information Processing SystemsFeb-8-2026, 12:56:25 GMT

Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents.

large language model, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre:

Personal (0.48)
Research Report (0.46)

Industry:

Media (0.68)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Appendix

Neural Information Processing SystemsFeb-8-2026, 12:35:10 GMT

Inthis section, we provide additional discussions of applying decision-focused learning toMDPs problems. Specifically, the assumption on smooth policy is similar to the idea of soft Q-learning [12] and soft actor-critic [13]proposed by Haarnoja et al. The randomly initiated neural network uses ReLU layers asnonlinearity followed byalinear layer intheend. Training parameters Across all three examples, we consider the discounted setting where the discount factor isγ = 0.95. Torelax the optimal policygivenbythe RL solver,we relax the Bellman equation used to run value-iteration by relaxing all the argmax and max operators in theBellman equation tosoftmax with temperature0.1,i.e., weuseSOFTMAX(0.1 Q-values)to replace all the argmax over Q values.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Add feedback

LearningMDPsfromFeatures: Predict-Then-OptimizeforSequentialDecision ProblemsbyReinforcementLearning

Neural Information Processing SystemsFeb-8-2026, 12:35:06 GMT

To resolve the first challenge, we propose to sample anestimate ofthefirst-order andsecond-order derivativestoapproximate theoptimality andKKT conditions.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > Ohio > Franklin County > Columbus (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

Add feedback

170dc3e41f2d03e327e04dbab0fccbfb-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 12:34:48 GMT

interactive data collection, rmdp, robust rl, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Iran (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Data Science (0.93)

Add feedback

57e5cb96e22546001f1d6520ff11d9ba-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 12:25:03 GMT

arxiv preprint arxiv, learning, trajectory, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)

Industry:

Energy (0.93)
Transportation (0.68)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Hao Bai 1,2 Yifei Zhou

Neural Information Processing SystemsFeb-8-2026, 12:17:59 GMT

While training with static demonstrations has shown some promise, we show that such methods fall short for controlling real GUIs due to their failure to deal with real world stochasticity and non-stationarity not captured in static observational data.

large language model, machine learning, reinforcement learning, (22 more...)

Neural Information Processing Systems

Country:

South America > Chile (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology > Services (0.68)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

1700ad4e6252e8f2955909f96367b34d-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 12:16:07 GMT

exp null, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.92)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Data Science (0.67)

Add feedback

16f8a0852b31bc9dc791ecf313247a57-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 12:07:11 GMT

algorithm, assumption 4, probability, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

TheNetHackLearningEnvironment

Neural Information Processing SystemsFeb-8-2026, 12:04:51 GMT

As advocated by [39, 38, 18], procedurally generated environments are a promising direction for testing systematic generalization of RL agents.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.06)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Sweden > Skåne County > Malmö (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Add feedback

Checklist

Neural Information Processing SystemsFeb-8-2026, 11:56:10 GMT

The checklist follows the references. Please do not modify the questions and only use the provided macros for your answers. Checklist section does not count towards the page limit. Do the main claims made in the abstract and introduction accurately reflect the paper's Did you describe the limitations of your work? Did you discuss any potential negative societal impacts of your work?

machine learning, reinforcement learning, transition, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Add feedback