AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

760b5def8dcb1156aac454e9c0f5f406-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 06:24:20 GMT

chance constraint, constraint, flipping-based policy, (16 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

74c4f2b87b7499d365422152c76fd916-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsOct-10-2025, 06:16:18 GMT

algorithm, demonstration, subtask, (12 more...)

Neural Information Processing Systems

Country:

Europe > Italy (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Surgery (0.93)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

benchmarks (Freeman et al., 2021) show that T A

Neural Information Processing SystemsOct-10-2025, 06:15:39 GMT

However, various systems are inherently continuous in time, making discrete-time MDPs an inexact modeling choice. In many applications, such as greenhouse control or medical treatments, each interaction (measurement or switching of action) involves manual intervention and thus is inherently costly.

algorithm, experiment, interaction, (13 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Food & Agriculture > Agriculture (0.67)
Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Add feedback

Distributional Reinforcement Learning with Regularized Wasserstein Loss Ke Sun

Neural Information Processing SystemsOct-10-2025, 06:13:43 GMT

Empirically, we show that SinkhornDRL consistently outperforms or matches existing algorithms on the Atari games suite and particularly stands out in the multi-dimensional reward setting.

algorithm, sinkhorn divergence, sinkhorndrl, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

7314e20a73542bbfff25030d1185ce88-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 06:07:00 GMT

dynamic model, edge-of-reach problem, edge-of-reach state, (14 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy Cameron Allen UC Berkeley Aaron Kirtland Brown University Ruo Y u Tao

Neural Information Processing SystemsOct-10-2025, 06:06:52 GMT

These authors contributed equally and are ordered alphabetically.

agent, experiment, pomdp, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Higher Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Add feedback

Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch

Neural Information Processing SystemsOct-10-2025, 06:04:55 GMT

Detecting and handling misspecified objectives, such as reward functions, has been widely recognized as one of the central challenges within the domain of Artificial Intelligence (AI) safety research.

occupancy frequency, optimal policy, reward function, (15 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Colorado (0.04)
North America > United States > Arizona (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Add feedback

Periodic agent-state based Q-learning for POMDPs

Neural Information Processing SystemsOct-10-2025, 05:54:34 GMT

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP . However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy.

agent state, asql, markov chain, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(5 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Opponent Modeling with In-context Search

Neural Information Processing SystemsOct-10-2025, 05:46:15 GMT

Opponent modeling is a longstanding research topic aimed at enhancing decision-making by modeling information about opponents in multi-agent environments. However, existing approaches often face challenges such as having difficulty generalizing to unknown opponent policies and conducting unstable performance.

international conference, opponent, opponent policy, (13 more...)

Neural Information Processing Systems

Country: