AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers

Neural Information Processing SystemsOct-9-2025, 17:03:51 GMT

We show that these methods fail catastrophically when limited to small replay buffers or during incremental learning, where updates only use the most recent sample without batch updates or a replay buffer.

algorithm, gradient, learning, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Europe > Portugal > Braga > Braga (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Leisure & Entertainment (0.67)
Information Technology (0.45)
Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality

Neural Information Processing SystemsOct-9-2025, 16:57:44 GMT

Occupancy functions play an instrumental role in reinforcement learning (RL) for guiding exploration, handling distribution shift, and optimizing general objectives beyond the expected return. Y et, computationally efficient policy optimization methods that use (only) occupancy functions are virtually non-existent. In this paper, we establish the theoretical foundations of model-free policy gradient (PG) methods that compute the gradient through the occupancy for both online and offline RL, without modeling value functions. Our algorithms reduce gradient estimation to squared-loss regression and are computationally oracle-efficient. We characterize the sample complexities of both local and global convergence, accounting for both finite-sample estimation error and the roles of exploration (online) and data coverage (offline). Occupancy-based PG naturally handles arbitrary offline data distributions, and, with one-line algorithmic changes, can be adapted to optimize any differentiable objective functional.

asm, gradient, lem, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Data Science (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Universally Expressive Communication in Multi-Agent Reinforcement Learning

Neural Information Processing SystemsOct-9-2025, 16:41:05 GMT

By observing that many existing protocols can be viewed as instances of graph neural networks (GNNs), we demonstrate the equivalence of joint action selection to node labelling.

communication, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Approximate Value Equivalence

Neural Information Processing SystemsOct-9-2025, 16:40:49 GMT

This gives rise to a rich collection of topological relationships and conditions under which VE models are optimal for planning. Despite this effort, relatively little is known about the planning performance of models that fail to satisfy these conditions.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Michigan (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Add feedback

Model-based Lifelong Reinforcement Learning with Bayesian Exploration

Neural Information Processing SystemsOct-9-2025, 16:40:32 GMT

We propose to address these problems by extracting the common structure existing in previously encountered tasks so that the agent can quickly learn the dynamics specific to the new tasks.

machine learning, posterior, reinforcement learning, (16 more...)

Neural Information Processing Systems

Technology: