AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

AutomaticDataAugmentationforGeneralizationin ReinforcementLearning

Neural Information Processing SystemsFeb-8-2026, 00:34:01 GMT

Generalization to new environments remains a major challenge in deep reinforcement learning (RL). Current methods fail to generalize to unseen environments even when trained on similar settings [19, 51, 71, 11, 21, 12, 60].

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

2b0aa0d9e30ea3a55fc271ced8364536-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 00:24:37 GMT

arxiv preprint arxiv, demonstration, estimation, (12 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

284afdc2309f9667d2d4fb9290235b0c-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 00:14:53 GMT

Theseoutcome-conditioned imitationlearningmethodsare appealing because of their simplicity, strong performance, and close ties with supervisedlearning.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Add feedback

UnifyingGradientEstimatorsforMeta-Reinforcement LearningviaOff-PolicyEvaluation

Neural Information Processing SystemsFeb-8-2026, 00:14:24 GMT

This is challenging from an implementation perspective, as repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

Add feedback

2a38a4a9316c49e5a833517c45d31070-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 00:12:40 GMT

learning, representation, trajectory, (13 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Supplementary Material for " Variational Policy Gradient Method for Reinforcement Learning with General Utilities " A Related Work

Neural Information Processing SystemsFeb-8-2026, 00:04:53 GMT

We provide a more extension discussion for the context of this work. Firstly, when closed-form expressions for the optimizer of a function are unavailable, solving optimization problems requires iterative schemes such as gradient ascent [31]. Their convergence to global extrema is predicated on concavity and the tractability of computing ascent directions. When the objective takes the form of an expected value of a function parameterized by a random variable, stochastic approximations are required [36, 24]. The PG Theorem mentioned above gives a specific form for obtaining ascent directions with respect to a parameterized family of stationary policies via trajectories in a Markov decision process, when the objective is the expected cumulative return [44], which gives rise to the REINFORCE algorithm.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Add feedback

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Neural Information Processing SystemsFeb-8-2026, 00:04:46 GMT

In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general concave utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Maryland > Prince George's County > Adelphi (0.04)
(2 more...)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

30de9ece7cf3790c8c39ccff1a044209-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 00:03:35 GMT

One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s). Existing methods tend to rely on inferring the human's goal, which is challenging when there are many potential goals or when the set of candidate goals is difficult to identify. We propose a new paradigm for assistance by instead increasing thehuman's ability tocontroltheir environment, and formalize this approach byaugmenting reinforcement learning withhuman empowerment.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: