AITopics | imitation learning


imitation learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Inverse Q-Learning Done Right: Offline Imitation Learning in Q^π-Realizable MDPs

Neural Information Processing Systems | Jun-23-2026, 00:54:35 GMT

We study the problem of offline imitation learning in Markov decision processes (MDPs), where the goal is to learn a well-performing policy given a dataset of state-action pairs generated by an expert policy. Complementing a recent line of work on this topic that assumes the expert belongs to a tractable class of known policies, we approach this problem from a new angle and leverage a different type of structural assumption about the environment. Specifically, for the class of linear Q^π-realizable MDPs, we introduce a new algorithm called saddle-point offline imitation learning (SPOIL), which is guaranteed to match the performance of any expert up to an additive error ε with access to O(ε⁻²) samples. Moreover, we extend this result to possibly nonlinear Q^π-realizable MDPs at the cost of a worse sample complexity of order O(ε⁻⁴). Finally, our analysis suggests a new loss function for training critic networks from expert data in deep imitation learning. Empirical evaluations on standard benchmarks demonstrate that the neural net implementation of SPOIL is superior to behavior cloning and competitive with state-of-the-art algorithms.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)


Faithful Dynamic Imitation Learning from Human Intervention with Dynamic Regret Minimization

Neural Information Processing Systems | Jun-22-2026, 14:00:49 GMT

Human-in-the-loop (HIL) imitation learning enables agents to learn complex behaviors safely through real-time human intervention. However, existing methods struggle to efficiently leverage agent-generated data due to dynamically evolving trajectory distributions and imperfections caused by human intervention delays, often failing to faithfully imitate the human expert policy. In this work, we propose Faithful Dynamic Imitation Learning (FaithDaIL) to address these challenges. We formulate learning from human intervention as an online non-convex problem and employ dynamic regret minimization to adapt to the shifting data distribution and track high-quality policy trajectories. To ensure faithful imitation of the human expert despite training on mixed agent and human data, we introduce an unbiased imitation objective and achieve it by weighting the behavior distribution relative to the human expert's as a proxy reward. Extensive experiments on the MetaDrive and CARLA driving benchmarks demonstrate that FaithDaIL achieves state-of-the-art performance in safety and task success with significantly less human intervention data than prior HIL baselines.

intervention, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.93)
Transportation > Ground > Road (0.68)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)


Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames

Neural Information Processing Systems | Jun-22-2026, 09:37:09 GMT

Behavioral cloning is a simple yet effective technique for learning sequential decision-making from demonstrations. Recently, it has gained prominence as the core of foundation models for the physical world, where achieving generalization requires countless demonstrations of a multitude of tasks. Typically, a human expert with full information on the task demonstrates a (nearly) optimal behavior. In this paper, we propose to hide some of the task's information from the demonstrator. This "blindfolded" expert is compelled to employ nontrivial exploration to solve the task. We show that cloning the blindfolded expert generalizes better to unseen tasks than its fully-informed counterpart. We conduct experiments on real-world robot peg insertion tasks with (limited) human demonstrations, alongside videogames from the Procgen benchmark. Additionally, we support our findings with theoretical analysis, which confirms that the generalization error scales with √(I/m), where I measures the amount of task information available to the demonstrator, and m is the number of demonstrated tasks. Both theory and practice indicate that cloning blindfolded experts generalizes better with fewer demonstrated tasks.
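
As a rough illustration of what "cloning" an expert means, the sketch below treats behavioral cloning as plain supervised learning in a tiny tabular setting: for each observed state it picks the action the demonstrator chose most often. This is not the paper's method or experimental setup; the state names, action names, and the function `fit_behavior_cloning` are hypothetical and exist only for this example.

```python
from collections import Counter, defaultdict


def fit_behavior_cloning(demos):
    """Tabular behavioral cloning: map each state to the expert's most frequent action.

    `demos` is an iterable of (state, action) pairs taken from demonstrations.
    """
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    # most_common(1) returns a one-element list of (action, count) pairs.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical toy demonstrations (assumed data, not from the paper).
demos = [
    ("left_of_goal", "right"),
    ("left_of_goal", "right"),
    ("left_of_goal", "left"),   # an occasional exploratory or noisy action
    ("right_of_goal", "left"),
]
policy = fit_behavior_cloning(demos)
```

The learned `policy` only covers states that appear in the demonstrations, which is one simple way to see why the choice of demonstrator behavior affects how well a cloned policy transfers to unseen tasks.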

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)


Flow World Benchmark for Flying on a Word Learning

Neural Information Processing Systems | Jun-19-2026, 20:48:14 GMT

Unmanned Aerial Vehicles (UAVs) are evolving into language-interactive platforms, enabling more intuitive forms of human-drone interaction. While prior works have primarily focused on high-level planning and long-horizon navigation, we shift attention to language-guided fine-grained trajectory control, where UAVs execute short-range, reactive flight behaviors in response to language instructions. We formalize this problem as the Flying-on-a-Word (Flow) task and introduce UAV imitation learning as an effective approach. In this framework, UAVs learn fine-grained control policies by mimicking expert pilot trajectories paired with atomic language instructions. To support this paradigm, we present UAV-Flow, the first real-world benchmark for language-conditioned, fine-grained UAV control.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia > China > Zhejiang Province (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Robotics & Automation (0.48)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)


Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

Neural Information Processing Systems | Jun-19-2026, 14:52:12 GMT

This paper provides the first expert sample complexity characterization for learning a Nash equilibrium from expert data in Markov Games. We show that a new quantity named the all policy deviation concentrability coefficient is unavoidable in the non-interactive imitation learning setting, and we provide an upper bound for behavioral cloning (BC) featuring this coefficient. BC exhibits substantial regret in games with a high concentrability coefficient, leading us to utilize expert queries to develop and introduce two novel solution algorithms: MAIL-BRO and MURMAIL. The former employs a best response oracle and learns an ε-Nash equilibrium with O(ε⁻⁴) expert and oracle queries.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country:

Europe (0.45)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning

Neural Information Processing Systems | Jun-16-2026, 21:14:09 GMT

Imitation learning (IL) is a paradigm for learning sequential decision-making policies from experts, leveraging offline demonstrations, interactive annotations, or both. Recent advances show that when annotation cost is tallied per trajectory, Behavior Cloning (BC)--which relies solely on offline demonstrations--cannot be improved in general, leaving limited conditions for interactive methods such as DAgger to help. We revisit this conclusion and prove that when the annotation cost is measured per state, algorithms using interactive annotations can provably outperform BC. Specifically: (1) we show that STAGGER, a one-sample-per-round variant of DAgger, provably beats BC under low-recovery-cost settings; (2) we initiate the study of hybrid IL where the agent learns from offline demonstrations and interactive annotations. We propose WARM-STAGGER whose learning guarantee is not much worse than using either data source alone.

annotation, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.93)
Research Report > New Finding (0.67)

Industry:

Education (0.46)
Information Technology (0.46)
Transportation > Ground > Road (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)


Multi-Agent Imitation by Learning and Sampling from Factorized Soft Q-Function

Neural Information Processing Systems | Jun-14-2026, 23:22:42 GMT

Learning from multi-agent expert demonstrations, known as Multi-Agent Imitation Learning (MAIL), provides a promising approach to sequential decision-making. However, existing MAIL methods including Behavior Cloning (BC) and Adversarial Imitation Learning (AIL) face significant challenges: BC suffers from the compounding error issue, while the very nature of adversarial optimization makes AIL prone to instability. In this work, we propose Multi-Agent imitation by learning and sampling from FactorIzed Soft Q-function (MAFIS), a novel method that addresses these limitations for both online and offline MAIL settings. Built upon the single-agent IQ-Learn framework, MAFIS introduces the value decomposition network to factorize the imitation objective at agent level, thus enabling scalable training for multi-agent systems. Moreover, we observe that the soft Q-function implicitly defines the optimal policy as an energy-based model, from which we can sample actions via stochastic gradient Langevin dynamics. This allows us to estimate the gradient of the factorized optimization objective for continuous control tasks, avoiding the adversarial optimization between the soft Q-function and the policy required by prior work. By doing so, we obtain a tractable and non-adversarial objective for both discrete and continuous multi-agent control. Experiments on common benchmarks including the discrete control tasks StarCraft Multi-Agent Challenge v2 (SMACv2), Gold Miner, and Multi Particle Environments (MPE), as well as the continuous control task Multi-Agent MuJoCo (MaMuJoCo), demonstrate that MAFIS achieves superior performance compared with baselines. Our code is available at https://github.com/LAMDA-RL/MAFIS.
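
The idea of reading a soft Q-function as an energy-based policy and sampling actions from it with Langevin dynamics can be sketched in a few lines. The example below is only an illustration under strong assumptions, not the MAFIS implementation: it uses a single agent, one fixed state, a hypothetical quadratic Q-function with an analytic gradient, and the plain (full-gradient) unadjusted Langevin update rather than a stochastic-gradient variant. All names and hyperparameters are made up for this sketch.

```python
import numpy as np


def q_grad(action, target=1.0):
    # Gradient with respect to the action of a hypothetical soft Q-function
    # Q(a) = -0.5 * (a - target)^2, assumed here for one fixed state.
    return -(action - target)


def langevin_sample(n_chains=2000, n_steps=500, step=0.01, alpha=0.5, seed=0):
    """Approximately sample actions from pi(a) proportional to exp(Q(a) / alpha).

    Runs `n_chains` independent chains of the unadjusted Langevin update
    a <- a + step * grad log pi(a) + sqrt(2 * step) * noise.
    """
    rng = np.random.default_rng(seed)
    actions = rng.normal(size=n_chains)  # arbitrary starting actions
    for _ in range(n_steps):
        noise = rng.normal(size=n_chains)
        actions = actions + step * q_grad(actions) / alpha + np.sqrt(2.0 * step) * noise
    return actions


samples = langevin_sample()
```

For this quadratic Q, the target policy is a Gaussian centered at `target` with variance `alpha`, so the sample mean and variance give a quick sanity check that the chains have mixed.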

artificial intelligence, international conference, qtot, (15 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.46)
Leisure & Entertainment > Games (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)


Teaching Transformers to Solve Combinatorial Problems through Efficient Trial & Error

Neural Information Processing Systems | Jun-14-2026, 03:21:10 GMT

We address this gap through a novel trial & error approach for solving problems in the class NP, where candidate solutions are iteratively generated and efficiently validated using verifiers. We focus on the paradigmatic task of Sudoku and achieve state-of-the-art accuracy (99%) compared to prior neuro-symbolic approaches. Unlike prior work that used custom architectures, our method employs a vanilla decoder-only Transformer (GPT-2) without external tools or function calling. Our method integrates imitation learning of simple Sudoku rules with an explicit Depth-First Search (DFS) exploration strategy involving informed guessing and backtracking. Moving beyond imitation learning, we seek to minimize the number of guesses until reaching a solution. This is achieved using depth-1 guessing, showing empirically that almost all Sudoku puzzles can be solved using the puzzle's rules with at most one guess. We provide a rigorous analysis of this setup, formalizing its connection to a contextual variant of Min-Sum Set Cover, a well-studied problem in algorithms and stochastic optimization.
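
The combination of simple rule application with DFS guessing and backtracking can be illustrated with a classical, non-neural solver. The sketch below is an assumption-laden stand-in, not the paper's Transformer pipeline: "rules" here means only the naked-single rule (a cell with one remaining candidate is filled), guesses pick the empty cell with the fewest candidates, and a `stats` dictionary (a hypothetical helper) counts how many guesses were tried.

```python
def peers_of(cell):
    # All cells sharing a row, column, or 3x3 box with `cell` (indices 0..80).
    r, c = divmod(cell, 9)
    row = {r * 9 + j for j in range(9)}
    col = {i * 9 + c for i in range(9)}
    box = {(r // 3 * 3 + i) * 9 + (c // 3 * 3 + j) for i in range(3) for j in range(3)}
    return (row | col | box) - {cell}


PEERS = [peers_of(cell) for cell in range(81)]


def candidates(grid, cell):
    # Digits not already used by any peer of `cell`.
    used = {grid[p] for p in PEERS[cell]}
    return [d for d in "123456789" if d not in used]


def propagate(grid):
    """Apply the naked-single rule until stuck; return False on a contradiction."""
    progress = True
    while progress:
        progress = False
        for cell in range(81):
            if grid[cell] != ".":
                continue
            options = candidates(grid, cell)
            if not options:
                return False
            if len(options) == 1:
                grid[cell] = options[0]
                progress = True
    return True


def solve(grid, stats):
    """Rule propagation plus depth-first guessing with backtracking."""
    grid = list(grid)
    if not propagate(grid):
        return None
    empties = [c for c in range(81) if grid[c] == "."]
    if not empties:
        return "".join(grid)
    # Informed guess: branch on the empty cell with the fewest candidates.
    cell = min(empties, key=lambda c: len(candidates(grid, c)))
    for digit in candidates(grid, cell):
        stats["guesses"] += 1
        trial = grid.copy()
        trial[cell] = digit
        result = solve(trial, stats)
        if result is not None:
            return result
    return None  # every guess failed, so backtrack


puzzle = (
    "53..7...." "6..195..." ".98....6."
    "8...6...3" "4..8.3..1" "7...2...6"
    ".6....28." "...419..5" "....8..79"
)
stats = {"guesses": 0}
solution = solve(puzzle, stats)
```

Inspecting `stats["guesses"]` after a run shows how much of the work the rules alone accomplish for a given puzzle, which is the quantity the abstract proposes to minimize.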

large language model, machine learning, natural language, (6 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Sudoku (0.74)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.57)


Multi-Agent Imitation by Learning and Sampling from Factorized Soft Q-Function

Neural Information Processing Systems | Jun-10-2026, 12:55:15 GMT

Learning from multi-agent expert demonstrations, known as Multi-Agent Imitation Learning (MAIL), provides a promising approach to sequential decision-making. However, existing MAIL methods including Behavior Cloning (BC) and Adversarial Imitation Learning (AIL) face significant challenges: BC suffers from the compounding error issue, while the very nature of adversarial optimization makes AIL prone to instability. In this work, we propose Multi-Agent imitation by learning and sampling from FactorIzed Soft Q-function (MAFIS), a novel method that addresses these limitations for both online and offline MAIL settings. Built upon the single-agent IQ-Learn framework, MAFIS introduces the value decomposition network to factorize the imitation objective at agent level, thus enabling scalable training for multi-agent systems. Moreover, we observe that the soft Q-function implicitly defines the optimal policy as an energy-based model, from which we can sample actions via stochastic gradient Langevin dynamics. This allows us to estimate the gradient of the factorized optimization objective for continuous control tasks, avoiding the adversarial optimization between the soft Q-function and the policy required by prior work. By doing so, we obtain a tractable and non-adversarial objective for both discrete and continuous multi-agent control. Experiments on common benchmarks including the discrete control tasks StarCraft Multi-Agent Challenge v2 (SMACv2), Gold Miner, and Multi Particle Environments (MPE), as well as the continuous control task Multi-Agent MuJoCo (MaMuJoCo), demonstrate that MAFIS achieves superior performance compared with baselines. Our code is available at https://github.com/LAMDA-RL/MAFIS.

artificial intelligence, name change, proceedings, (6 more...)

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.59)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

