AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Entropic Desired Dynamics for Intrinsic Control: Supplemental Material Steven Hansen

Neural Information Processing SystemsFeb-8-2026, 22:46:44 GMT

While this is not close to the state-of-the-art in general (c.f. Figure 2 shows the effect of action entropy on exploratory behavior in Montezuma's Revenge. Number of unique avatar positions visited. Full training curves across all 6 Atari games are shown in Figure 1, including the random policy baseline. To ensure this didn't hamper performance, we At each state visited by the agent evaluator during training, the agent's state (consisting of the avatar's The full curves are included for completeness. The compute cluster we performed experiments on is heterogenous, and has features such as host-sharing, adaptive load-balancing, etc.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)

Add feedback

5f7f02b7e4ade23430f345f954c938c1-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 22:46:40 GMT

eddict, learning, reinforcement learning, (11 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry:

Leisure & Entertainment > Games (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

73a427badebe0e32caa2e1fc7530b7f3-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 22:46:00 GMT

joint action, qmix, weighting, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

WeightedQMIX: ExpandingMonotonicValue FunctionFactorisationforDeepMulti-Agent ReinforcementLearning

Neural Information Processing SystemsFeb-8-2026, 22:39:27 GMT

In this paradigm of centralised training for decentralised execution, QMIX [25] is a popular Qlearning algorithm with state-of-the-art performance ontheStarCraft Multi-Agent Challenge [26]. QMIX represents the optimal joint action value function using a monotonicmixing function of per-agent utilities.

joint action, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

73634c1dcbe056c1f7dcf5969da406c8-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 22:36:39 GMT

algorithm, arxiv preprint arxiv, task parameter, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Texas (0.04)
(4 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

AnEfficientAsynchronousMethodforIntegrating EvolutionaryandGradient-basedPolicySearch

Neural Information Processing SystemsFeb-8-2026, 22:27:37 GMT

These have the opposite properties, with DRL having good sample efficiencyandpoor stability, while ESbeing vice versa. Recently,there havebeen attempts tocombine these algorithms, butthesemethods fullyrelyonsynchronous updatescheme, making it not ideal to maximize the benefits of the parallelism in ES.

evolutionary algorithm, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Add feedback

4fe1859112230a032c7143a9adc3be78-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 22:14:11 GMT

crossover, molecule, reaction, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials (0.92)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback

4fe1859112230a032c7143a9adc3be78-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 22:14:07 GMT

algorithm, crossover, molecule, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Appendix: ContinuousDoublyConstrainedBatch ReinforcementLearning

Neural Information Processing SystemsFeb-8-2026, 21:59:52 GMT

However, numbers for BCQ and SAC are from our runs for all tasks. These plots show that, in the vast majority of environments, CDC exhibits consistently better performance across different seeds/iterations.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology: