AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

43207fd5e34f87c48d584fc5c11befb8-Paper.pdf

Neural Information Processing SystemsOct-2-2025, 18:48:37 GMT

It is well believed that model-based RL, where the agent learns the model of the environment and then performs planning in the model, is significantly more sample efficient than model-free RL. Recent empirical advances also justify such a belief (e.g.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Weighted importance sampling for off-policy learning with linear function approximation

Neural Information Processing SystemsOct-2-2025, 18:36:53 GMT

Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, \emph{weighted} importance sampling, does not carry over easily to function approximation and, because of this, it is not utilized in existing off-policy learning algorithms. In this paper, we take two steps toward bridging this gap. First, we show that weighted importance sampling can be viewed as a special case of weighting the error of individual training samples, and that this weighting has theoretical and empirical benefits similar to those of weighted importance sampling. Second, we show that these benefits extend to a new weighted-importance-sampling version of off-policy LSTD(lambda). We show empirically that our new WIS-LSTD(lambda) algorithm can result in much more rapid and reliable convergence than conventional off-policy LSTD(lambda) (Yu 2010, Bertsekas & Yu 2009).

function approximation, linear function approximation, off-policy learning, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Add feedback

Non-Cooperative Inverse Reinforcement Learning

Neural Information Processing SystemsOct-2-2025, 18:32:24 GMT

Making decisions in the presence of a strategic opponent requires one to take into account the opponent's ability to actively mask its intended objective. To describe such strategic situations, we introduce the non-cooperative inverse reinforcement learning (N-CIRL) formalism. The N-CIRL formalism consists of two agents with completely misaligned objectives, where only one of the agents knows the true objective function.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Industry:

Government > Military (0.93)
Information Technology > Security & Privacy (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

Real-Time Reinforcement Learning

Simon Ramstedt, Chris Pal

Neural Information Processing SystemsOct-2-2025, 18:16:25 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)

Add feedback

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

Kaiqing Zhang, Zhuoran Yang, Tamer Basar

Neural Information Processing SystemsOct-2-2025, 18:12:06 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada (0.04)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Game Theory (0.94)
(2 more...)

Add feedback

RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning

Marek Petrik, Dharmashankar Subramanian

Neural Information Processing SystemsOct-2-2025, 17:57:08 GMT

Neural Information Processing Systems http://nips.cc/

approximating aggregated mdp, reinforcement learning, robustness, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs

Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin

Neural Information Processing SystemsOct-2-2025, 17:53:56 GMT

Neural Information Processing Systems http://nips.cc/

machine learning, reinforcement learning, trajectory, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Vision (0.93)

Add feedback

Learning Reward Machines for Partially Observable Reinforcement Learning

Neural Information Processing SystemsOct-2-2025, 17:53:20 GMT

The use of neural networks for function approximation has led to many recent advances in Reinforcement Learning (RL) .

agent, cookie, reward machine, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.15)
South America > Chile (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-2-2025, 17:52:57 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper presents a new technique for solving MDPs. The new technique, presented as an alternative to approximate policy/value iteration, consists in directly minimizing the Optimal Bellman Residual (OBR). The authors first motivate their method by showing that the loss bound of OBR is often tighter than the loss bound of policy/value iteration, which is a known result [9,15]. The authors then show that an empirical estimate of OBR is consistent in the Vapnick sense, i.e. minimizing the empirical OBR is equivalent to minimizing an upper bound on the true OBR, which is unknown when the MDP model is unknown. Finally, the authors show that OBR can be decomposed into a difference of two convex functions, and a standard Difference of Convex Functions (DC) optimization method can be used for finding a local optimum.

contribution, convex function, decomposition, (12 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.05)

Genre: Research Report > New Finding (0.69)

Technology: