AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
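
The trial-and-error idea in the quotation can be illustrated with a small sketch. This is not taken from the book's text: it assumes a hypothetical three-armed bandit with made-up Bernoulli reward probabilities, and an epsilon-greedy learner that is never told which arm is best and only sees the rewards of the actions it tries. The function name `run_epsilon_greedy` and all parameter values are illustrative assumptions.

```python
import random


def run_epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate action values on a Bernoulli bandit purely by trying actions.

    reward_probs is a hypothetical environment: entry a is the probability
    that action a pays a reward of 1. The learner never reads it directly.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms      # how often each action was tried
    values = [0.0] * n_arms    # running average reward per action
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random action.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the action with the highest estimate so far.
            action = max(range(n_arms), key=lambda a: values[a])

        # The environment returns a numerical reward signal.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental update of the sample-average estimate.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward

    return values, counts, total_reward


values, counts, total_reward = run_epsilon_greedy([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("pulls per action:", counts)
print("action judged best:", best_action)
```

With these assumed probabilities, the learner should come to prefer the last action, because its estimated value ends up highest after enough trials.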

Policy Optimization for Robust Average Cost MDPs

Neural Information Processing Systems, Oct-9-2025, 20:30:25 GMT

Specifically, we focus on ergodic Markov chains.

algorithm, cost mdp, robust average cost mdp, (12 more...)

Country:

North America > United States > Texas (0.04)
North America > United States > Connecticut (0.04)
North America > United States > Arizona (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

ORA: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

Neural Information Processing Systems, Oct-9-2025, 20:30:19 GMT

This dataset encompasses human preferences in text-to-video generation tasks along two primary dimensions: helpfulness and harmlessness.

classification, dataset, video, (16 more...)

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Oregon (0.04)
Europe > Monaco (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Law (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

Neural Information Processing Systems, Oct-9-2025, 20:30:18 GMT

However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy.

algorithm, experiment, optimization, (17 more...)

Country:

North America > United States > Virginia (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

1e6dcc16ffa7ced2228d1f2fdc8b5adf-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 20:29:17 GMT

abstract state, arp, evaluation, (15 more...)

Country:

North America > United States > New Jersey (0.04)
North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.68)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

1e616bde0438cb10cb6adf076ae7d336-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 20:28:56 GMT

agent, experiment, rtus, (16 more...)

Country:

North America > United States (0.28)
North America > Canada > Alberta (0.14)
Europe > Portugal > Braga > Braga (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
(2 more...)

1e38b2a0b77541b14a3315c99697b835-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 20:28:32 GMT

arxiv preprint arxiv, dataset, learning, (12 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Melting Pot Contest: Charting the Future of Generalized Cooperative Intelligence

Neural Information Processing Systems, Oct-9-2025, 20:19:59 GMT

As AI systems become increasingly sophisticated and interconnected, it will be critical that they be competent at cooperating, both with other AI systems and with humans.

agent, scenario, substrate, (17 more...)

Country:

North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

Alex Rutherford, Michael Beukman, Timon Willi, Bruno Lacerda, Nick Hawes, Jakob Foerster (University of Oxford)

Neural Information Processing Systems, Oct-9-2025, 20:14:14 GMT

Put differently, current methods fail to predict intuitive measures of "learnability."

agent, jaxnav, learnability, (14 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.40)
Asia > Middle East > Jordan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning

Hao Ma

Neural Information Processing Systems, Oct-9-2025, 20:11:47 GMT

Reinforcement learning (RL) has emerged as a pivotal technique for fine-tuning large language models (LLMs) on specific tasks. However, prevailing RL fine-tuning methods predominantly rely on PPO and its variants. Though these algorithms are effective in general RL settings, they often exhibit suboptimal performance and vulnerability to distribution collapse when applied to the fine-tuning of LLMs.

fine-tuning, kl divergence, task reward, (15 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Macao (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry:

Education (0.93)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn

Neural Information Processing Systems, Oct-9-2025, 20:05:38 GMT

After any random batch update, network outputs can change indirectly, and to unexpected values, for input data that was not included in the batch; this paper calls the phenomenon churn.
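
The churn effect described in this excerpt can be observed on a toy problem. The sketch below is not the paper's method or experimental setup: it assumes a hypothetical two-layer tanh network in NumPy, random regression targets, and a single gradient step on a small batch, then measures how much the outputs move on inputs that were left out of that batch. The function name `measure_churn` and every hyperparameter are illustrative assumptions.

```python
import numpy as np


def measure_churn(seed=0, n_inputs=64, batch_size=8, dim=4, hidden=16, lr=0.1):
    """Mean absolute output change on held-out inputs after one batch update."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_inputs, dim))   # all inputs
    y = rng.normal(size=n_inputs)          # arbitrary regression targets
    W1 = rng.normal(scale=0.5, size=(dim, hidden))
    w2 = rng.normal(scale=0.5, size=hidden)

    def forward(W1, w2, inputs):
        h = np.tanh(inputs @ W1)
        return h, h @ w2

    # Split the inputs into the update batch and everything else.
    batch = rng.choice(n_inputs, size=batch_size, replace=False)
    held_out = np.setdiff1d(np.arange(n_inputs), batch)

    _, before = forward(W1, w2, X)

    # One gradient step on 0.5 * mean squared error, using the batch only.
    h, pred = forward(W1, w2, X[batch])
    err = pred - y[batch]
    grad_w2 = h.T @ err / batch_size
    grad_pre = np.outer(err, w2) * (1.0 - h ** 2)   # backprop through tanh
    grad_W1 = X[batch].T @ grad_pre / batch_size
    W1_new = W1 - lr * grad_W1
    w2_new = w2 - lr * grad_w2

    _, after = forward(W1_new, w2_new, X)

    # Churn here: output change on inputs that were NOT in the batch.
    return float(np.abs(after - before)[held_out].mean())


print("churn on held-out inputs:", measure_churn())
print("churn with zero learning rate:", measure_churn(lr=0.0))
```

Because the weights are shared across all inputs, the held-out outputs shift even though none of those inputs contributed to the gradient; with a zero learning rate the shift vanishes.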

deviation, reinforcement learning, value and policy, (14 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Portugal > Braga > Braga (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
