AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
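The trial-and-error idea in the quotation can be sketched with a k-armed bandit learner that uses epsilon-greedy exploration and sample-average value estimates. This is a minimal illustration under assumed conditions: the true reward means, the Gaussian reward noise, and the name `run_bandit` with its parameters are hypothetical and do not come from this page. The learner is never told which action is best. It only sees the numerical rewards of the actions it actually tries.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn action values for a k-armed bandit purely by trial and error.

    true_means is hidden from the learner (an assumption of this sketch);
    the learner only observes noisy rewards for the actions it takes.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # Q(a): running average reward observed for action a
    counts = [0] * k       # N(a): number of times action a has been tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: try a random action
        else:
            # exploit: pick the action with the highest current estimate
            action = max(range(k), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[action], 1.0)  # numerical reward signal
        counts[action] += 1
        # incremental sample-average update: Q <- Q + (R - Q) / N
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward
```

Calling `run_bandit([0.2, 1.0, 0.5])` should, after enough steps, leave the highest estimate and most pulls on the action with true mean 1.0. The small `epsilon` keeps a little exploration going so that an early unlucky estimate does not lock the learner onto a worse action.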


743459dae9b2c5d2904e5432d5298128-Paper-Conference.pdf

Neural Information Processing Systems, Feb-9-2026, 20:12:52 GMT

algorithm, information, pomg


Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)


8ce8b102d40392a688f8c04b3cd6cae0-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 20:00:12 GMT

algorithm, blueprint, rl search


Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)

Genre:

Instructional Material (0.46)
Research Report (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)


Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters

Neural Information Processing Systems, Feb-9-2026, 20:00:00 GMT

Through theoretical analyses and construction of examples in toy MDPs, we demonstrate that shared pessimistic targets can paradoxically lead to value estimates that are effectively optimistic.
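The contrast between shared and independent targets can be sketched for a single transition. The assumptions here are ours and are not confirmed by this page: each of N ensemble members supplies a next-state estimate Q_i(s', a'), the shared target is the ensemble mean minus `beta` standard deviations, and the independent variant lets member i bootstrap only from its own estimate. All function names are hypothetical.

```python
import statistics


def shared_pessimistic_target(reward, gamma, next_q_values, beta):
    """One pessimistic target handed to every ensemble member.

    next_q_values[i] is member i's estimate Q_i(s', a'); the target
    subtracts beta ensemble standard deviations from the ensemble mean.
    """
    mean = statistics.fmean(next_q_values)
    std = statistics.pstdev(next_q_values)
    y = reward + gamma * (mean - beta * std)
    return [y] * len(next_q_values)  # identical target for all members


def independent_targets(reward, gamma, next_q_values):
    """Each member i bootstraps only from its own next-state estimate."""
    return [reward + gamma * q for q in next_q_values]


def pessimistic_value(q_values, beta):
    """Lower-confidence-style estimate: ensemble mean minus beta * std."""
    return statistics.fmean(q_values) - beta * statistics.pstdev(q_values)
```

With next-state estimates `[1.0, 2.0, 3.0]`, `reward=0.0`, and `gamma=0.5`, the shared scheme gives every member the same regression target, while the independent scheme gives `[0.5, 1.0, 1.5]`. If members regress toward one common number, the spread that `pessimistic_value` later subtracts is driven toward zero. This is offered only as an intuition for why independence might matter, not as the paper's analysis.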

artificial intelligence, machine learning, reinforcement learning


Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Learning to Incentivize Other Learning Agents

Neural Information Processing Systems, Feb-9-2026, 19:58:46 GMT

Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined extrinsic reward function.
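Beyond the single-agent setting, one simple way to picture agents incentivizing each other is to add the rewards other agents choose to give to each agent's extrinsic reward. The sketch below is a hypothetical accounting scheme, not the method of the paper listed here. The names `augmented_rewards`, `incentives`, and the per-unit giving cost `cost_weight` are our assumptions.

```python
def augmented_rewards(extrinsic, incentives, cost_weight=0.0):
    """Combine extrinsic rewards with incentives that agents give each other.

    extrinsic[i]     : reward agent i receives from the environment.
    incentives[j][i] : reward agent j chooses to give agent i (j != i).
    cost_weight      : hypothetical per-unit cost an agent pays for what it gives.
    """
    n = len(extrinsic)
    totals = []
    for i in range(n):
        # incentives flowing into agent i from every other agent
        received = sum(incentives[j][i] for j in range(n) if j != i)
        # incentives agent i hands out to every other agent
        given = sum(incentives[i][k] for k in range(n) if k != i)
        totals.append(extrinsic[i] + received - cost_weight * given)
    return totals
```

For two agents with `extrinsic=[1.0, 0.0]`, where agent 0 gives agent 1 an incentive of 0.5 at `cost_weight=0.2`, agent 0's total falls to 0.9 and agent 1's rises to 0.5. A learning agent that maximizes such a total now responds to rewards shaped by its peers rather than only to a predefined extrinsic reward function.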

artificial intelligence, machine learning, reinforcement learning


Country:

North America > Canada (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)


Reviewer 1

Neural Information Processing Systems, Feb-9-2026, 19:58:34 GMT

We appreciate R1's recognition of the novelty of our contribution to MARL and the potential impact on a … We address R1's two concerns below. … "give-reward" actions are direct applications of conventional RL (which have been applied to multi-agent incentivization … We appreciate R2's positive feedback on our quantitative results and we are glad that our behavioral … Figure 6b, where the agent gives nonzero reward for "fire cleaning beam but miss" after 40k steps, one reason is that the … Figure 6a), so it may have "forgotten" the difference between successful and unsuccessful usage of the cleaning beam. As demonstrated more clearly in the Escape Room results (e.g. … We thank R3 for recognizing our contribution to the general class of opponent-shaping algorithms. … Prisoner's Dilemma is fully observable).

artificial intelligence, machine learning, reinforcement learning


Technology: