AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs

Neural Information Processing Systems, Oct-10-2025, 22:13:39 GMT

Across a wide range of hardware scenarios, the computational efficiency and physical size of the arithmetic units significantly influence the speed and footprint of the overall hardware system.

adder, multiplier, prefix tree, (14 more...)

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology (1.00)
Semiconductors & Electronics (0.68)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.68)
(2 more...)

GUIDE: Real-Time Human-Shaped Agents

Neural Information Processing Systems, Oct-10-2025, 22:12:37 GMT

Due to their inherent complexity, these tasks pose significant challenges for current machine learning systems.

agent, experiment, human feedback, (14 more...)

Country:

North America > United States > Ohio (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Greece > Attica > Athens (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Off-Policy Selection for Initiating Human-Centric Experimental Design (Ge Gao, Xi Yang)

Neural Information Processing Systems, Oct-10-2025, 22:12:25 GMT

Human-centric systems (HCSs), e.g., used in healthcare facilities [...] Given the long testing horizon (e.g., several years or semesters in healthcare and IE, respectively) and the high cost of recruiting participants, online testing is considered exceedingly [...] The work was done at North Carolina State University. In this section, we introduce the FPS method, which determines the policy to be deployed to new participants who join an existing cohort, conditioned only on their initial states.

international conference, participant, student, (13 more...)

Country:

North America > United States > North Carolina (0.24)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Instructional Material (1.00)
Research Report > New Finding (0.67)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.67)
Health & Medicine > Health Care Providers & Services (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(4 more...)

Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning

Neural Information Processing Systems, Oct-10-2025, 22:05:28 GMT

A challenging problem in seeking to bring multi-agent reinforcement learning (MARL) techniques into real-world applications, such as autonomous driving and drone swarms, is how to control multiple agents safely and cooperatively to accomplish tasks.

agent, algorithm, assumption 2, (14 more...)

Country:

Asia > China > Shanxi Province (0.14)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.92)

Industry:

Information Technology (0.66)
Energy > Power Industry (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Going Beyond Heuristics by Imposing Policy Improvement as a Constraint (Chi-Chang Lee)

Neural Information Processing Systems, Oct-10-2025, 22:03:17 GMT

As such, we prevent policies from merely exploiting heuristic rewards without improving the task reward.

buf, reset, torch, (16 more...)

Country:

Asia > Taiwan (0.04)
North America > United States > Massachusetts (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Diffusion Imitation from Observation

Neural Information Processing Systems, Oct-10-2025, 21:49:30 GMT

Learning from observation (LfO) aims to imitate experts by learning from state-only demonstrations without requiring action labels.

diffusion model, international conference, transition, (13 more...)

Country:

Asia > Taiwan (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Towards Effective Planning Strategies for Dynamic Opinion Networks

Neural Information Processing Systems, Oct-10-2025, 21:48:50 GMT

Our experimental results demonstrate that the ranking algorithm-based classifiers provide plans that enhance infection rate control, especially with increased action budgets for small networks.

infection rate, initial misinformation source, node, (10 more...)

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > South Carolina (0.04)
Asia > Middle East > Oman > Muscat Governorate > Muscat (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Government > Voting & Elections (0.67)
Media > News (0.53)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(4 more...)

f7ae4fe91d96f50abc2211f09b6a7e49-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 21:48:43 GMT

agent, arxiv preprint arxiv, language model, (14 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media (1.00)
Information Technology (1.00)
Banking & Finance > Trading (1.00)
Leisure & Entertainment (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
(5 more...)

f751c6f8bfb52c60f43942896fe65904-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 21:47:57 GMT

dataset, demonstration, experiment, (16 more...)

Country:

Asia > Singapore (0.04)
North America > United States > Montana (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation

Neural Information Processing Systems, Oct-10-2025, 21:44:02 GMT

Training a policy in a source domain for deployment in the target domain under a dynamics shift can be challenging, often resulting in performance degradation.

darc, source domain, target domain, (16 more...)

Country: