AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Neural Information Processing Systems, Feb-7-2026, 23:54:18 GMT

Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in global reward games; however, its underlying mechanism is not yet fully understood.
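
As a rough illustration of what value factorisation means, the sketch below uses the simplest additive form, in which the team value of a joint action is the sum of per-agent values. This is an assumption made for illustration only and is not the SHAQ method, which the abstract does not describe; the function names and table entries are hypothetical.

```python
# Hypothetical illustration: additive value factorisation for two agents.
# Each agent keeps its own table q[action]; the team value of a joint
# action is ASSUMED to be the sum of the individual values.

def joint_value(q_tables, joint_action):
    """Team Q-value of a joint action under an additive factorisation."""
    return sum(q[a] for q, a in zip(q_tables, joint_action))

def greedy_joint_action(q_tables):
    """Each agent maximises its own table; under an additive factorisation
    this also maximises the summed team value."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)

q_tables = [
    [0.1, 0.7, 0.2],  # agent 0, three actions (made-up values)
    [0.5, 0.3],       # agent 1, two actions (made-up values)
]
best = greedy_joint_action(q_tables)
best_value = joint_value(q_tables, best)
```

The point of the additive form is that decentralised greedy choices agree with the centralised maximum, so each agent can act on its own table at execution time.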

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > California (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Europe > France (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Neural Information Processing Systems, Feb-7-2026, 23:54:14 GMT

machine learning, reinforcement learning, shaq, (12 more...)

Neural Information Processing Systems

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.83)

Active Exploration for Inverse Reinforcement Learning

Neural Information Processing Systems, Feb-7-2026, 23:35:19 GMT

Instead of using an explicit reward function, Inverse Reinforcement Learning (IRL; Ng et al., 2000) seeks to recover the reward by observing an expert, e.g., a human who already knows how to perform a task. However, most existing IRL algorithms assume that the transition model, and in some cases the expert's policy, are known.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

10cb15f4559b3d578b7f24966d48a137-Paper-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 23:34:38 GMT

algorithm, diversity, international conference, (11 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
North America > United States > Virginia (0.04)
North America > United States > Maryland > Baltimore (0.04)
(8 more...)

Industry:

Leisure & Entertainment > Games (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

266c0f191b04cbbbe529016d0edc847e-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 23:34:00 GMT

agent, reinforcement learning, reward function, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Exploration-Guided Reward Shaping for Reinforcement Learning under Sparse Rewards

Neural Information Processing Systems, Feb-7-2026, 23:33:56 GMT

We study the problem of reward shaping to accelerate the training process of a reinforcement learning agent.
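
To make the idea of reward shaping concrete, here is a minimal sketch of the classical potential-based form, where a term gamma * phi(s') - phi(s) is added to the environment reward. This is a swapped-in textbook technique, not the exploration-guided method of the listed paper, whose details the abstract does not give; the chain environment, the potential function, and all constants are assumptions.

```python
GAMMA = 0.9   # discount factor (assumed)
GOAL = 5      # goal state on a 1-D chain 0..5 (assumed toy environment)

def sparse_reward(next_state):
    """Environment reward: 1 only on reaching the goal, 0 otherwise."""
    return 1.0 if next_state == GOAL else 0.0

def potential(state):
    """Hypothetical potential: negative distance to the goal."""
    return -float(abs(GOAL - state))

def shaped_reward(state, next_state):
    """Sparse reward plus the shaping term gamma * phi(s') - phi(s)."""
    return (sparse_reward(next_state)
            + GAMMA * potential(next_state)
            - potential(state))

toward = shaped_reward(2, 3)   # one step toward the goal
away = shaped_reward(2, 1)     # one step away from the goal
```

With this potential, a step toward the goal receives a larger shaped reward than a step away from it, even though the sparse environment reward is zero in both cases.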

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

2f10c1578a0706e06b6d7db6f0b4a6af-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 23:33:10 GMT

information, robot, trajectory, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Neural Information Processing Systems, Feb-7-2026, 23:15:29 GMT

The goal of reinforcement learning (RL) [39] is to maximize the expected total reward by taking actions according to a policy in a stochastic environment, which is modelled as a Markov decision process (MDP) [4]. To obtain an optimal policy, one popular method is the direct maximization of the expected total reward via gradient ascent, which is referred to as the policy gradient (PG) method [40,47].
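
The gradient-ascent idea can be sketched on a one-step problem with a softmax policy, where the expected reward and its gradient are available in closed form. This is only a toy illustration of policy gradient ascent, not the (natural) actor-critic algorithms analysed in the listed paper; the reward values, step size, and function names are assumptions.

```python
import math

def softmax(theta):
    """Turn action preferences into a probability distribution."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(theta, rewards):
    """J(theta) = sum_a pi_theta(a) * r(a) for a one-step problem."""
    return sum(p * r for p, r in zip(softmax(theta), rewards))

def policy_gradient(theta, rewards):
    """Exact gradient for a softmax policy: dJ/dtheta_k = pi_k * (r_k - J)."""
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - j) for p, r in zip(pi, rewards)]

rewards = [1.0, 0.0, 0.5]   # assumed mean reward of each action
theta = [0.0, 0.0, 0.0]     # start from the uniform policy
step_size = 0.5             # assumed learning rate
history = [expected_reward(theta, rewards)]
for _ in range(200):
    grad = policy_gradient(theta, rewards)
    # Gradient ascent: move the parameters along the gradient of J.
    theta = [t + step_size * g for t, g in zip(theta, grad)]
    history.append(expected_reward(theta, rewards))
```

In a full MDP the gradient is not available exactly and has to be estimated from sampled trajectories, which is where sample complexity becomes the central question.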

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: