AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
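The trial-and-error idea in the quotation can be illustrated with a small sketch. It assumes a hypothetical three-armed bandit with Gaussian rewards and an epsilon-greedy learner; the function name `run_bandit`, the reward means, and the parameter values are all illustrative choices and do not come from Sutton and Barto's text. The learner is never told which action is best and has to estimate each action's reward by trying it.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a hypothetical Gaussian multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward observed per action
    counts = [0] * n_arms       # number of times each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random action.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the action with the highest estimate so far.
            action = max(range(n_arms), key=lambda a: estimates[a])
        # The environment returns a noisy numerical reward (assumed unit variance).
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward

# Illustrative reward means; the last action is the best one.
estimates, counts, total = run_bandit([0.1, 0.5, 2.0])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("times each action was tried:", counts)
```

With these settings the learner ends up choosing the highest-reward action most often, even though it only ever observes the rewards of the actions it tries.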


Near-optimal Reinforcement Learning in Factored MDPs

Ian Osband, Benjamin Van Roy

Neural Information Processing Systems | Oct-2-2025, 17:47:54 GMT

Neural Information Processing Systems http://nips.cc/

factored mdp, near-optimal reinforcement learning


Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)


Text-Based Interactive Recommendation via Constraint-Augmented Reinforcement Learning

Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin, Changyou Chen

Neural Information Processing Systems | Oct-2-2025, 17:45:50 GMT

However, recommendations can easily violate preferences of users from their past natural-language feedback, since the recommender needs to explore new items for further improvement.

machine learning, natural language, reinforcement learning, (17 more...)

Neural Information Processing Systems

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Policy Improvement via Imitation of Multiple Oracles

Neural Information Processing Systems | Oct-2-2025, 17:37:15 GMT

Despite its promise, reinforcement learning's real-world adoption has been hampered by the need for costly exploration to learn a good policy.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.95)


The Value-Equivalence Principle for Model-Based Reinforcement Learning Supplementary Material

Neural Information Processing Systems | Oct-2-2025, 17:27:25 GMT

In this supplement we give details of our theoretical results and experiments that had to be left out of the main paper due to space constraints. Section A.1.1 contains derivations of the properties and propositions presented in the main [...] Section A.2 provides a detailed outline of the pipeline used across our experiments in the [...] The numbering of equations, figures and citations resumes from what is used in the main paper. This result directly follows from Definitions 1 and 2. Property 2. M(null, V) either contains m [...] We will show the result by contradiction. In order to prove Proposition 2 we will need four lemmas, which we state and prove below. It follows that H - dim[B] = nm rank(A) rank(C).

experiment, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)


The Value Equivalence Principle for Model-Based Reinforcement Learning

Neural Information Processing Systems | Oct-2-2025, 17:27:17 GMT

Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko

Neural Information Processing Systems | Oct-2-2025, 17:25:51 GMT

Planning with a generative model is thinking before acting.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.42)


Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems | Oct-2-2025, 17:23:15 GMT

These algorithms are typically employed in settings where the number of players and the type of interaction (competitive, cooperative, etc.) are both

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre:

Overview (0.67)
Research Report (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


3b2acfe2e38102074656ed938abf4ac3-Paper.pdf

Neural Information Processing Systems | Oct-2-2025, 17:23:08 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre:

Research Report (0.47)
Overview (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling

Tengyang Xie, Yifei Ma, Yu-Xiang Wang

Neural Information Processing Systems | Oct-2-2025, 17:21:14 GMT

Solving off-policy evaluation (OPE) is often the starting point in many RL applications. To tackle OPE, importance sampling (IS) corrects the mismatch between the distributions induced by the behavior policy and the target policy.
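As a rough illustration of how importance sampling corrects that mismatch, the sketch below applies ordinary (not marginalized) importance sampling to one-step episodes. It is not the estimator proposed in the paper. The policies, the logged data, and the function name `importance_sampling_estimate` are hypothetical and chosen so the arithmetic can be checked by hand: each logged reward is reweighted by the ratio of target to behavior action probabilities.

```python
def importance_sampling_estimate(logged, behavior, target):
    """Ordinary importance sampling estimate of the target policy's value.

    logged:   list of (action, reward) pairs collected under the behavior policy
    behavior: dict mapping action -> probability under the behavior policy
    target:   dict mapping action -> probability under the target policy
    """
    total = 0.0
    for action, reward in logged:
        # Reweight each sample by how much more (or less) likely the
        # target policy is to take this action than the behavior policy.
        weight = target[action] / behavior[action]
        total += weight * reward
    return total / len(logged)

# Hypothetical setup: action 0 always pays 1.0, action 1 always pays 2.0.
behavior = {0: 0.5, 1: 0.5}  # policy that generated the data
target = {0: 0.2, 1: 0.8}    # policy we want to evaluate
logged = [(0, 1.0), (1, 2.0), (0, 1.0), (1, 2.0)]

is_estimate = importance_sampling_estimate(logged, behavior, target)
naive = sum(reward for _, reward in logged) / len(logged)
print("naive average of logged rewards:", naive)
print("importance sampling estimate:", is_estimate)
```

In this toy setup the true value of the target policy is 0.2 * 1.0 + 0.8 * 2.0 = 1.8. The importance sampling estimate recovers it, while the unweighted average of the logged rewards reflects the behavior policy instead.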

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Budgeted Reinforcement Learning in Continuous State Space

Neural Information Processing Systems | Oct-2-2025, 17:17:53 GMT

So far, Budgeted Markov Decision Processes (BMDPs) could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: Europe > France (0.28)

Industry: Automobiles & Trucks (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
