AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Self-Guiding Exploration for Combinatorial Problems

Neural Information Processing SystemsOct-10-2025, 20:29:01 GMT

They utilize prompting techniques such as Exploration-of-Thought, Decomposition, and Refinement to effectively navigate and solve intricate tasks.

combinatorial problem, experiment, node avg, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Africa > Rwanda > Kigali > Kigali (0.04)
North America > Canada (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
(5 more...)

Add feedback

eb86863306d17c58d18315b5261a4520-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsOct-10-2025, 20:28:49 GMT

disease state, epicare, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
(2 more...)

Add feedback

Regularized Q-Learning

Neural Information Processing SystemsOct-10-2025, 20:19:56 GMT

Q-learning is widely used algorithm in reinforcement learning (RL) community. Under the lookup table setting, its convergence is well established.

algorithm, matrix, q-learning, (15 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Shared Autonomy with IDA: Interventional Diffusion Assistance Brandon J. McMahan

Neural Information Processing SystemsOct-10-2025, 20:01:01 GMT

The rapid development of artificial intelligence (AI) has unearthed the potential to assist humans in controlling advanced technologies.

copilot, experiment, intervention, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Add feedback

RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions

Neural Information Processing SystemsOct-10-2025, 20:00:51 GMT

Mobile health leverages personalized and contextually tailored interventions optimized through bandit and reinforcement learning algorithms.

algorithm, participant, rome, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

Parseval Regularization for Continual Reinforcement Learning

Neural Information Processing SystemsOct-10-2025, 19:59:05 GMT

We show that it provides significant benefits to RL agents on a suite of gridworld, CARL and MetaWorld tasks.

agent, arxiv preprint arxiv, parseval regularization, (12 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Portugal > Braga > Braga (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models

Neural Information Processing SystemsOct-10-2025, 19:51:18 GMT

To address this, we introduce a conservative fine-tuning approach, BRAID, by optimizing a conservative reward model, which includes additional penalization outside of offline data distributions.

arxiv preprint arxiv, diffusion model, generative model, (12 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: