AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Trust Region-Guided Proximal Policy Optimization

Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

Neural Information Processing Systems | Aug-19-2025, 22:42:31 GMT

However, the first-order optimizer is not very accurate for curved areas.

constraint, ppo, trgppo, (16 more...)

Country:

North America > United States (0.15)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

ff6b031d5bdc552b795175a0f3b35a50-Paper-Conference.pdf

Neural Information Processing Systems | Aug-19-2025, 22:28:40 GMT

arxiv preprint arxiv, machine learning, reinforcement learning, (16 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Dual Generator Offline Reinforcement Learning

Neural Information Processing Systems | Aug-19-2025, 22:21:52 GMT

Such promise is especially appealing in domains where data collection is expensive or dangerous, but large amounts of data may already exist (e.g., robotics, autonomous driving, task-oriented dialog systems).

generator, machine learning, reinforcement learning, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.69)
Instructional Material (0.46)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

Mahmoud ("Mido") Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Mike Rabbat

Neural Information Processing Systems | Aug-19-2025, 22:14:21 GMT

Neural Information Processing Systems http://nips.cc/

agent, gala, simulator, (11 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Explicable Policy Search

Neural Information Processing Systems | Aug-19-2025, 22:13:09 GMT

Human teammates often form conscious and subconscious expectations of each other during interaction. Teaming success is contingent on whether such expectations can be met. Similarly, for an intelligent agent to operate beside a human, it must consider the human's expectation of its behavior. Disregarding such expectations can lead to the loss of trust and degraded team performance. A key challenge here is that the human's expectation may not align with the agent's

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Zihan Zhang, Xiangyang Ji

Neural Information Processing Systems | Aug-19-2025, 21:52:09 GMT

Therefore, there is a trade-off between exploration and exploitation, i.e., taking actions we have not learned accurately enough and taking actions which

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: North America (0.15)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology: