AITopics | imitator

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

Neural Information Processing Systems | Nov-19-2025, 23:34:15 GMT

Causal Imitation for Markov Decision Processes: a Partial Identification Approach

Children often learn how to behave in an unfamiliar environment by imitating adults. Imitation learning (IL) enables a learning agent to behave in an unknown environment by observing expert demonstrations.

imitator, machine learning, reinforcement learning, (20 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Instructional Material (0.65)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Neural Information Processing Systems | Nov-14-2025, 15:33:54 GMT

7b670d553471ad0fd7491c75bad587ff-Paper.pdf

imitator, machine learning, reinforcement learning, (17 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Instructional Material (0.46)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Neural Information Processing Systems | Oct-10-2025, 11:38:27 GMT

Causal Imitation for Markov Decision Processes: a Partial Identification Approach

imitator, machine learning, reinforcement learning, (20 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Instructional Material (0.65)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Neural Information Processing Systems | Aug-15-2025, 08:49:15 GMT

Sequential Causal Imitation Learning with Unobserved Confounders

"Monkey see monkey do" is an age-old adage, referring to naïve imitation without a deep understanding of a system's underlying mechanics.

imitator, machine learning, reinforcement learning, (17 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Instructional Material (0.46)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Shah, Rushit N., Agadakos, Nikolaos, Sasulski, Synthia, Farajzadeh, Ali, Choudhury, Sanjiban, Ziebart, Brian

Imitation Learning via Focused Satisficing

arXiv.org Artificial Intelligence | May-27-2025

Imitation learning often assumes that demonstrations are close to optimal according to some fixed, but unknown, cost function. However, according to satisficing theory, humans often choose acceptable behavior based on their personal (and potentially dynamic) levels of aspiration, rather than achieving (near-) optimality. For example, a lunar lander demonstration that successfully lands without crashing might be acceptable to a novice despite being slow or jerky. Using a margin-based objective to guide deep reinforcement learning, our focused satisficing approach to imitation learning seeks a policy that surpasses the demonstrator's aspiration levels -- defined over trajectories or portions of trajectories -- on unseen demonstrations without explicitly learning those aspirations. We show experimentally that this focuses the policy to imitate the highest quality (portions of) demonstrations better than existing imitation learning methods, providing much higher rates of guaranteed acceptability to the demonstrator, and competitive true returns on a range of environments.

demonstration, machine learning, reinforcement learning, (17 more...)

2505.1482

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Shao, Daqian, Buening, Thomas Kleine, Kwiatkowska, Marta

A Unifying Framework for Causal Imitation Learning with Hidden Confounders

arXiv.org Artificial Intelligence | Feb-11-2025

We propose a general and unifying framework for causal Imitation Learning (IL) with hidden confounders that subsumes several existing confounded IL settings from the literature. Our framework accounts for two types of hidden confounders: (a) those observed by the expert, which thus influence the expert's policy, and (b) confounding noise hidden to both the expert and the IL algorithm. For additional flexibility, we also introduce a confounding noise horizon and time-varying expert-observable hidden variables. We show that causal IL in our framework can be reduced to a set of Conditional Moment Restrictions (CMRs) by leveraging trajectory histories as instruments to learn a history-dependent policy. We propose DML-IL, a novel algorithm that uses instrumental variable regression to solve these CMRs and learn a policy. We provide a bound on the imitation gap for DML-IL, which recovers prior results as special cases. Empirical evaluation on a toy environment with continuous state-action spaces and multiple MuJoCo tasks demonstrates that DML-IL outperforms state-of-the-art causal IL algorithms.
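As a rough illustration of why instrumental variable regression helps under hidden confounding, the sketch below compares ordinary least squares with a single-instrument IV estimator on simulated scalar data. It is not DML-IL and does not use trajectory histories; the linear data-generating process, the function names, and all coefficients are assumptions made up for this example.

```python
import random


def covariance(a, b):
    """Population covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (biased if x is confounded)."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimator: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


def simulate(n, true_effect=2.0, seed=0):
    """Hypothetical linear model with a hidden confounder u and instrument z."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)   # hidden confounder, affects both x and y
        zi = rng.gauss(0.0, 1.0)  # instrument: affects x, independent of u
        xi = zi + u + rng.gauss(0.0, 0.1)
        yi = true_effect * xi + 3.0 * u + rng.gauss(0.0, 0.1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate(20000)
    print("OLS estimate:", ols_slope(x, y))   # pulled away from 2.0 by u
    print("IV estimate: ", iv_slope(z, x, y))  # close to the true effect 2.0
```

Because the instrument is independent of the confounder, the IV ratio isolates the effect of x on y, whereas the OLS slope absorbs the confounder's contribution.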

artificial intelligence, confounder, machine learning, (11 more...)

2502.07656

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Grislain, Clémence, Vuorio, Risto, Lu, Cong, Whiteson, Shimon

IGDrivSim: A Benchmark for the Imitation Gap in Autonomous Driving

arXiv.org Artificial Intelligence | Nov-7-2024

Developing autonomous vehicles that can navigate complex environments with human-level safety and efficiency is a central goal in self-driving research. A common approach to achieving this is imitation learning, where agents are trained to mimic human expert demonstrations collected from real-world driving scenarios. However, discrepancies between human perception and the self-driving car's sensors can introduce an imitation gap, leading to imitation learning failures. In this work, we introduce IGDrivSim, a benchmark built on top of the Waymax simulator, designed to investigate the effects of the imitation gap in learning autonomous driving policy from human expert demonstrations. Our experiments show that this perception gap between human experts and self-driving agents can hinder the learning of safe and effective driving behaviors. We further show that combining imitation with reinforcement learning, using a simple penalty reward for prohibited behaviors, effectively mitigates these failures. Our code is open-sourced at: https://github.com/clemgris/IGDrivSim.git.

artificial intelligence, imitator, machine learning, (18 more...)

2411.04653

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.92)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Neural Information Processing Systems | Oct-11-2024, 08:45:16 GMT

Sequential Causal Imitation Learning with Unobserved Confounders

"Monkey see monkey do" is an age-old adage, referring to naive imitation without a deep understanding of a system's underlying mechanics. Indeed, if a demonstrator has access to information unavailable to the imitator (monkey), such as a different set of sensors, then no matter how perfectly the imitator models its perceived environment (See), attempting to directly reproduce the demonstrator's behavior (Do) can lead to poor outcomes. Imitation learning in the presence of a mismatch between demonstrator and imitator has been studied in the literature under the rubric of causal imitation learning (Zhang et al.). This paper investigates the problem of causal imitation learning in sequential settings, where the imitator must make multiple decisions per episode. We develop a graphical criterion that is both necessary and sufficient for determining the feasibility of causal imitation, providing conditions when an imitator can match a demonstrator's performance despite differing capabilities.

artificial intelligence, machine learning, sequential causal imitation learning, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Cohen, Michael K., Hutter, Marcus, Bengio, Yoshua, Russell, Stuart

RL, but don't do anything I wouldn't do

arXiv.org Artificial Intelligence | Oct-8-2024

In reinforcement learning, if the agent's reward differs from the designers' true utility, even only rarely, the state distribution resulting from the agent's policy can be very bad, in theory and in practice. When RL policies would devolve into undesired behavior, a common countermeasure is KL regularization to a trusted policy ("Don't do anything I wouldn't do"). All current cutting-edge language models are RL agents that are KL-regularized to a "base policy" that is purely predictive. Unfortunately, we demonstrate that when this base policy is a Bayesian predictive model of a trusted policy, the KL constraint is no longer reliable for controlling the behavior of an advanced RL agent. We demonstrate this theoretically using algorithmic information theory, and while systems today are too weak to exhibit this theorized failure precisely, we RL-finetune a language model and find evidence that our formal results are plausibly relevant in practice. We also propose a theoretical alternative that avoids this problem by replacing the "Don't do anything I wouldn't do" principle with "Don't do anything I mightn't do".
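The KL regularization mentioned in this abstract can be sketched for a single decision over a small discrete action set. The example below is a minimal illustration, not the paper's construction: the function names, the one-step setting, and the numbers are assumptions, and the base policy is assumed to assign nonzero probability to every action.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_regularized_objective(policy, base_policy, rewards, beta):
    """Expected one-step reward minus beta * KL(policy || base_policy)."""
    expected_reward = sum(pi * r for pi, r in zip(policy, rewards))
    return expected_reward - beta * kl_divergence(policy, base_policy)


def optimal_policy(base_policy, rewards, beta):
    """Maximizer of the objective: pi(a) proportional to base(a) * exp(r(a) / beta)."""
    weights = [b * math.exp(r / beta) for b, r in zip(base_policy, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    base = [0.5, 0.3, 0.2]      # hypothetical trusted base policy
    rewards = [0.0, 1.0, 5.0]   # hypothetical (possibly misspecified) rewards
    for beta in (0.1, 1.0, 10.0):
        pi = optimal_policy(base, rewards, beta)
        print(beta, [round(p, 3) for p in pi])
```

A larger `beta` keeps the optimized policy closer to the base policy, while a smaller `beta` lets it concentrate on the highest-reward action.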

large language model, machine learning, reinforcement learning, (22 more...)

2410.06213

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)