AITopics | continuous control task

Multi-Agent Imitation by Learning and Sampling from Factorized Soft Q-Function

Neural Information Processing Systems | Jun-14-2026, 23:22:42 GMT

Learning from multi-agent expert demonstrations, known as Multi-Agent Imitation Learning (MAIL), provides a promising approach to sequential decision-making. However, existing MAIL methods including Behavior Cloning (BC) and Adversarial Imitation Learning (AIL) face significant challenges: BC suffers from the compounding error issue, while the very nature of adversarial optimization makes AIL prone to instability. In this work, we propose Multi-Agent imitation by learning and sampling from FactorIzed Soft Q-function (MAFIS), a novel method that addresses these limitations for both online and offline MAIL settings. Built upon the single-agent IQ-Learn framework, MAFIS introduces the value decomposition network to factorize the imitation objective at agent level, thus enabling scalable training for multi-agent systems. Moreover, we observe that the soft Q-function implicitly defines the optimal policy as an energy-based model, from which we can sample actions via stochastic gradient Langevin dynamics. This allows us to estimate the gradient of the factorized optimization objective for continuous control tasks, avoiding the adversarial optimization between the soft Q-function and the policy required by prior work. By doing so, we obtain a tractable and non-adversarial objective for both discrete and continuous multi-agent control. Experiments on common benchmarks including the discrete control tasks StarCraft Multi-Agent Challenge v2 (SMACv2), Gold Miner, and Multi Particle Environments (MPE), as well as the continuous control task Multi-Agent MuJoCo (MaMuJoCo), demonstrate that MAFIS achieves superior performance compared with baselines. Our code is available at https://github.com/LAMDA-RL/MAFIS.
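The idea of treating a soft Q-function as an energy-based model and sampling actions from it with Langevin dynamics can be illustrated with a small sketch. This is not the MAFIS implementation: the quadratic `grad_q`, the temperature `alpha`, the step size, and the single scalar action are all assumptions chosen for illustration, and the update uses the exact gradient (plain unadjusted Langevin dynamics) rather than a stochastic minibatch gradient.

```python
import math
import random


def grad_q(action, target=0.5):
    """Gradient of a toy soft Q-function Q(a) = -(a - target)**2 (hypothetical)."""
    return -2.0 * (action - target)


def langevin_sample(grad_fn, alpha=0.1, step=0.01, n_steps=200, rng=None):
    """Draw one approximate sample from pi(a) proportional to exp(Q(a) / alpha)."""
    rng = rng or random.Random()
    action = rng.uniform(-1.0, 1.0)  # arbitrary initialization
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Langevin step on the log-density Q(a) / alpha:
        # drift along the gradient plus scaled Gaussian noise.
        action += 0.5 * step * grad_fn(action) / alpha + math.sqrt(step) * noise
    return action


rng = random.Random(0)
samples = [langevin_sample(grad_q, rng=rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((a - sample_mean) ** 2 for a in samples) / len(samples)
print(f"mean={sample_mean:.3f} var={sample_var:.3f}")
```

For this toy Q-function the target density is Gaussian with mean `target` and variance `alpha / 2`, so the sample statistics give a quick sanity check. The finite step size introduces a small bias in the variance.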

artificial intelligence, international conference, qtot, (15 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.46)
Leisure & Entertainment > Games (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Do's and Don'ts: Learning Desirable Skills with Instruction Videos

Neural Information Processing Systems | Mar-20-2026, 16:04:35 GMT

Unsupervised skill discovery is a learning paradigm that aims to acquire diverse behaviors without explicit rewards. However, it faces challenges in learning complex behaviors and often leads to learning unsafe or undesirable behaviors. For instance, in various continuous control tasks, current unsupervised skill discovery methods succeed in learning basic locomotions like standing but struggle with learning more complex movements such as walking and running. Moreover, they may acquire unsafe behaviors like tripping and rolling or navigate to undesirable locations such as pitfalls or hazardous areas.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Policy Optimization via Importance Sampling

Neural Information Processing Systems | Mar-16-2026, 21:55:49 GMT

Policy optimization is an effective reinforcement learning approach for solving continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel model-free policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
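The role of importance sampling in reusing trajectories can be sketched as follows. This is a generic illustration, not the POIS algorithm or its high-confidence bound: the one-dimensional Gaussian "policies", the toy return, and the helper names are all hypothetical. The effective sample size is included as a common diagnostic for how far the evaluated policy has drifted from the one that collected the data.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std**2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(samples, returns, behavior, target):
    """Estimate E_target[return] from samples drawn under the behavior distribution."""
    weights = [gaussian_pdf(x, *target) / gaussian_pdf(x, *behavior) for x in samples]
    estimate = sum(w * r for w, r in zip(weights, returns)) / len(samples)
    # Effective sample size: shrinks as the two distributions drift apart.
    ess = sum(weights) ** 2 / sum(w * w for w in weights)
    return estimate, ess


rng = random.Random(0)
behavior = (0.0, 1.0)  # (mean, std) of the policy that collected the data
target = (0.3, 1.0)    # (mean, std) of the policy being evaluated
samples = [rng.gauss(*behavior) for _ in range(20000)]
returns = [-(x - 1.0) ** 2 for x in samples]  # toy return, peaked at x = 1
estimate, ess = importance_sampling_estimate(samples, returns, behavior, target)
print(f"estimate={estimate:.3f} ess={ess:.0f}")
```

In this toy setting the exact value is -(1 + 0.7**2) = -1.49, so the estimate can be checked directly. A shrinking effective sample size signals growing variance in the estimate, which is the kind of effect that makes the stopping decision described above non-trivial.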

artificial intelligence, machine learning, reinforcement learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

a439259e78294c38d157a51a2c40486b-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 04:22:13 GMT

machine learning, natural language, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > South Carolina (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
(3 more...)

96d00450ed65531ffe2996daed487536-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 00:21:45 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
North America > United States > Oregon (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
(2 more...)

Genre: Research Report (0.93)

Industry: Government > Military (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

Wenjie Shi, Shiji Song, Hui Wu, Ya-Chu Hsu, Cheng Wu, Gao Huang

Neural Information Processing Systems | Feb-13-2026, 20:02:50 GMT

Model-free deep reinforcement learning (RL) algorithms have been widely used for a range of complex control tasks.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

f600d1a3f6a63f782680031f3ce241a7-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 21:27:44 GMT

experiment, exploration, reward shift, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

cceff8faa855336ad53b3325914caea2-Supplemental.pdf

Neural Information Processing Systems | Feb-10-2026, 10:14:26 GMT

continuous control task, learning curve mean-std, transition, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.81)

In this section, we present detailed proofs for the theoretical derivation of Thm. 1, which aims to solve the following optimization problem: min

Neural Information Processing Systems | Feb-9-2026, 23:15:29 GMT

These assumptions are not strong and are satisfied in most environments, including MuJoCo, Atari games, and so on. Let $f$ be a Lebesgue integrable function and let $P$ and $Q$ be two probability distributions. If $|f| \le C$, then $\mathbb{E}_{P(x)} f(x) - \mathbb{E}_{Q(x)} f(x) \le C\, D_{TV}(P, Q)$ (5). Proof. Suppose there are two actions $a_1$, $a_2$ under state $s$, and let $Q_1(s, a_1) = u$, $Q_1(s, a_2) = v$. In this way, we can derive the upper bound of $\mathbb{E}_{a \sim \pi_2} Q_1(s, a) - \mathbb{E}_{a \sim \pi_1} Q_1(s, a)$ as above. Since both sides of the above equation have the same minimum (here the minima are given by $Q_k = Q$), we can replace the objective in Problem 2 with the upper bound in Eq. (10) and solve the relaxed optimization problem.
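The bound in Eq. (5) can be checked numerically on small discrete distributions. The sketch below rests on one assumption that the excerpt does not confirm: D_TV is taken to be the L1 distance, the sum of |P(x) - Q(x)| over x. Under the half-normalized convention for total variation, the constant would be 2C instead of C. All names in the code are hypothetical.

```python
import random


def l1_distance(p, q):
    """Sum of |p_i - q_i|; assumed here to be the convention behind D_TV."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def expectation(dist, f):
    """Expectation of f under a discrete distribution given as a list of probabilities."""
    return sum(di * fi for di, fi in zip(dist, f))


def random_distribution(n, rng):
    """Random probability vector of length n."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(0)
C = 2.0
max_gap_ratio = 0.0
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    f = [rng.uniform(-C, C) for _ in range(5)]  # any f with |f| <= C
    gap = expectation(p, f) - expectation(q, f)
    bound = C * l1_distance(p, q)
    max_gap_ratio = max(max_gap_ratio, gap / bound)

print(f"largest observed gap / bound = {max_gap_ratio:.3f}")
```

A ratio that never exceeds 1 is consistent with the stated inequality under the assumed convention. It is a sanity check on random instances, not a proof.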

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: