AITopics | imitation gap

imitation gap

sup

Neural Information Processing Systems | Apr-25-2026, 07:43:39 GMT

A.1 Notation

In this appendix, we use the notation $d^{\pi}_t(\cdot,\cdot)$ to indicate the state-action visitation measure induced by the policy $\pi$ at time $t$. We overload the notation $d^{\pi}_t(\cdot)$ to denote the state-visitation measure induced by the policy $\pi$ at time $t$. Likewise, the notations $d^{D}_t(\cdot,\cdot)$ and $d^{D}_t(\cdot)$ indicate the empirical visitation measures in the dataset $D$. For a function $g: \mathcal{X} \to \mathbb{R}$, the norm $\|g\|_\infty \triangleq \sup_{x \in \mathcal{X}} |g(x)|$. Before discussing the proofs of the results, we also explain the instantiation of the function class in the tabular setting below.

A.2 Imitation gap upper bound on empirical moment matching (Theorem 3.1)

Below we restate Theorem 3.1 and provide a proof of this result. The key observation is that since the learner $\pi^{\mathrm{MM}}$ best matches the empirical distribution in the dataset, which is in turn close to the population visitation measure induced by $\pi^{E}$, we can expect the visitation measures induced by $\pi^{E}$ and $\pi^{\mathrm{MM}}$ to be close. This in turn implies that both policies will collect a similar value under any reward function. Precisely characterizing the rates at which these distributions converge to one another results in the final bound.

Consider the empirical moment matching learner $\pi^{\mathrm{MM}}$ (eq. [...]) [...] $\mathrm{TV}(d^{\pi}_t, d^{D}_t)$ (20), where the equation follows by the variational definition of the total variation distance, and where $d^{\pi}_t$ is the state-action visitation measure induced by $\pi^{E}$ and $d^{D}_t$ is the empirical state-action visitation measure in the dataset $D$. The imitation gap of this policy can be upper bounded by $J(\pi^{E}) - J(\pi^{\mathrm{MM}}) = \mathbb{E}_{\pi^{E}}[$ [...]

This goes to show that in the tabular setting, MM is equivalent to finding the policy which best matches (in TV distance) the empirical state-action distribution observed in the dataset.
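The tabular quantities named in this excerpt can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `empirical_visitation` and `tv_distance`, the toy dataset `D`, and the candidate measure `d_pi_0` are all hypothetical and chosen only to show how an empirical state-action visitation measure at time t and its TV distance to another measure might be computed.

```python
from collections import Counter


def empirical_visitation(trajectories, t):
    """Empirical state-action visitation measure at time step t.

    `trajectories` is a list of trajectories, each a list of
    (state, action) pairs indexed by time. Trajectories with fewer
    than t + 1 steps are skipped.
    """
    counts = Counter(traj[t] for traj in trajectories if len(traj) > t)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sa: c / total for sa, c in counts.items()}


def tv_distance(p, q):
    """Total variation distance between two finite distributions (dicts)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Hypothetical toy dataset D: four expert trajectories of horizon 2.
D = [
    [("s0", "a0"), ("s1", "a1")],
    [("s0", "a0"), ("s1", "a0")],
    [("s0", "a1"), ("s2", "a1")],
    [("s0", "a0"), ("s1", "a1")],
]

d_D_0 = empirical_visitation(D, 0)
# Hypothetical visitation measure of some candidate policy at t = 0.
d_pi_0 = {("s0", "a0"): 0.5, ("s0", "a1"): 0.5}

print(d_D_0)                       # empirical measure at t = 0
print(tv_distance(d_pi_0, d_D_0))  # 0.25
```

In this reading, a tabular moment-matching learner would search for the policy whose induced measures make such TV distances small across time steps; the search itself is omitted here.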

artificial intelligence, machine learning, nexp, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Minimax Optimal Online Imitation Learning via Replay Estimation

Neural Information Processing Systems | Apr-25-2026, 07:43:36 GMT

Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap that scales with $H^2/N_{\exp}$ for behavioral cloning and $H/\sqrt{N_{\exp}}$ for online moment matching, where $H$ is the horizon and $N_{\exp}$ is the size of the expert dataset. We introduce the technique of replay estimation to reduce this empirical variance: by repeatedly executing cached expert actions in a stochastic simulator, we compute a smoother expert visitation distribution estimate to match. In the presence of parametric function approximation, we prove a meta theorem reducing the performance gap of our approach to the parameter estimation error for offline classification (i.e. ...
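To make the two scalings quoted in this abstract concrete, the short Python sketch below evaluates them side by side. It compares only the stated rates: constants and any logarithmic factors are not given in the abstract and are ignored here, so the crossover it shows is illustrative rather than a statement about the actual bounds. The function names `bc_rate` and `mm_rate` are hypothetical.

```python
import math


def bc_rate(H, n_exp):
    """Behavioral cloning scaling H^2 / N_exp (constants omitted)."""
    return H ** 2 / n_exp


def mm_rate(H, n_exp):
    """Online moment matching scaling H / sqrt(N_exp) (constants omitted)."""
    return H / math.sqrt(n_exp)


H = 10
for n_exp in (25, 100, 400):
    # Columns: dataset size, behavioral cloning rate, moment matching rate.
    print(n_exp, bc_rate(H, n_exp), mm_rate(H, n_exp))
```

Ignoring constants, the two expressions coincide at N_exp = H^2: below that dataset size the moment matching rate is the smaller of the two, and above it the behavioral cloning rate is.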

artificial intelligence, machine learning, nexp, (17 more...)

Country: North America > United States (0.68)

Genre: Instructional Material > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.35)

2e809adc337594e0fee330a64acbb982-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-19-2026, 00:15:01 GMT

exp, learner, probability 1, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

On Imitation in Mean-field Games

Neural Information Processing Systems | Feb-15-2026, 12:19:33 GMT

In this paper, departing from the existing literature on IL for MFGs, we introduce a new solution concept called the Nash imitation gap.

machine learning, occupancy measure, reinforcement learning, (18 more...)

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Game Theory (0.69)
(2 more...)

On Imitation in Mean-field Games

Neural Information Processing Systems | Feb-15-2026, 12:19:29 GMT

In this paper, departing from the existing literature on IL for MFGs, we introduce a new solution concept called the Nash imitation gap.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Robots (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

3af25aa3de8b7b02ddbd1b6be5031be8-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 13:16:16 GMT

dataset, isw-bc, nbcu, (15 more...)

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

3af25aa3de8b7b02ddbd1b6be5031be8-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 13:16:12 GMT

dataset, isw-bc, nbcu, (15 more...)

Country:

North America > United States (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Bridging the Imitation Gap by Adaptive Insubordination

Neural Information Processing Systems | Dec-24-2025, 14:50:54 GMT

In practice, imitation learning is preferred over pure reinforcement learning whenever it is possible to design a teaching agent to provide expert supervision. However, we show that when the teaching agent makes decisions with access to privileged information that is unavailable to the student, this information is marginalized during imitation learning, resulting in an imitation gap and, potentially, poor results.
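The marginalization effect described in this abstract can be illustrated with a small toy example in Python. This is a sketch under invented assumptions, not the paper's setup: a hypothetical teacher sees a hidden goal ("left" or "right") and always moves toward it, while the student receives the same observation ("start") regardless of the goal and is fit by simple action counting.

```python
from collections import Counter, defaultdict


def teacher_action(hidden_goal):
    """Hypothetical teacher: sees the hidden goal and moves straight to it."""
    return hidden_goal  # "left" or "right"


def fit_student(demos):
    """Fit a tabular student by counting teacher actions per student observation."""
    counts = defaultdict(Counter)
    for obs, action in demos:
        counts[obs][action] += 1
    return {
        obs: {a: c / sum(ctr.values()) for a, c in ctr.items()}
        for obs, ctr in counts.items()
    }


# The student always observes "start"; the goal is hidden from it.
demos = [("start", teacher_action(goal)) for goal in ["left", "right"] * 50]
student = fit_student(demos)

print(student["start"])  # {'left': 0.5, 'right': 0.5}
```

In this toy, the teacher reaches the goal every time, whereas the imitating student can only reproduce the teacher's action distribution averaged over the hidden goal, so it picks the correct side half the time. That difference is one simple instance of the kind of gap the abstract refers to.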

adaptive insubordination, imitation gap, name change, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.38)
