AITopics | Agents

In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(7 more...)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Inverse Reward Design

Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart J. Russell, Anca Dragan

Neural Information Processing SystemsNov-21-2025, 06:51:15 GMT

Autonomous agents optimize the reward function we give them.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
(2 more...)

Add feedback

Statistical Cost Sharing

Eric Balkanski, Umar Syed, Sergei Vassilvitskii

Neural Information Processing SystemsNov-21-2025, 06:46:32 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, shapley value, (18 more...)

Neural Information Processing Systems

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.77)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Add feedback

Countering Feedback Delays in Multi-Agent Learning

Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Claire Tomlin

Neural Information Processing SystemsNov-21-2025, 06:27:37 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, assumption, machine learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.50)

Add feedback

A multi-agent reinforcement learning model of common-pool resource appropriation

Neural Information Processing SystemsNov-21-2025, 06:12:22 GMT

Humanity faces numerous problems of common-pool resource appropriation.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Greater London > London (0.05)
South America > Brazil > São Paulo (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(7 more...)

Industry:

Leisure & Entertainment > Games (0.48)
Food & Agriculture > Fishing (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Multi-View Decision Processes: The Helper-AI Problem

Christos Dimitrakakis, David C. Parkes, Goran Radanovic, Paul Tylkin

Neural Information Processing SystemsNov-21-2025, 05:42:13 GMT

We model this setting as a multi-view decision process, which we use to formally analyze the positive effect of steering policies.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Add feedback

Hybrid Reward Architecture for Reinforcement Learning Harm van Seijen

Neural Information Processing SystemsNov-21-2025, 04:47:54 GMT

One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Neural Information Processing SystemsNov-21-2025, 03:56:43 GMT

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas (0.04)
(5 more...)

Genre: Overview (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent

Cao, Shiyi, Li, Dacheng, Zhao, Fangzhou, Yuan, Shuo, Hegde, Sumanth R., Chen, Connor, Ruan, Charlie, Griggs, Tyler, Liu, Shu, Tang, Eric, Liaw, Richard, Moritz, Philipp, Zaharia, Matei, Gonzalez, Joseph E., Stoica, Ion

arXiv.org Artificial IntelligenceNov-21-2025

We introduce SkyRL-Agent, a framework for efficient, multi-turn, long-horizon agent training and evaluation. It provides efficient asynchronous dispatching, lightweight tool integration, and flexible backend interoperability, enabling seamless use with existing RL frameworks such as SkyRL-train, VeRL, and Tinker. Using SkyRL-Agent, we train SA-SWE-32B, a software engineering agent trained from Qwen3-32B (24.4% Pass@1) purely with reinforcement learning. We introduce two key components: an optimized asynchronous pipeline dispatcher that achieves a 1.55x speedup over naive asynchronous batching, and a tool-enhanced training recipe leveraging an AST-based search tool to facilitate code navigation, boost rollout Pass@K, and improve training efficiency. Together, these optimizations enable SA-SWE-32B to reach 39.4% Pass@1 on SWE-Bench Verified with more than 2x cost reduction compared to prior models reaching similar performance. Despite being trained solely on SWE tasks, SA-SWE-32B generalizes effectively to other agentic tasks, including Terminal-Bench, BrowseComp-Plus, and WebArena. We further demonstrate SkyRL-Agent's extensibility through case studies on deep research, computer use, and memory agents, each trained using a different training backend.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.16108

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback