AITopics | zero-shot imitation

Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models

Neural Information Processing SystemsDec-26-2025, 08:28:44 GMT

Unlike most reinforcement learning agents which require an unrealistic amount of environment interactions to learn a new behaviour, humans excel at learning quickly by merely observing and imitating others. This ability highly depends on the fact that humans have a model of their own embodiment that allows them to infer the most likely actions that led to the observed behaviour. In this paper, we propose Action Inference by Maximising Evidence (AIME) to replicate this behaviour using world models. AIME consists of two distinct phases. In the first phase, the agent learns a world model from its past experience to understand its own body by maximising the ELBO.

action inference, maximising evidence, zero-shot imitation, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

Add feedback

Consistent Zero-Shot Imitation with Contrastive Goal Inference

Wantlin, Kathryn, Zheng, Chongyi, Eysenbach, Benjamin

arXiv.org Artificial IntelligenceOct-21-2025

In the same way that generative models today conduct most of their training in a self-supervised fashion, how can agentic models conduct their training in a self-supervised fashion, interactively exploring, learning, and preparing to quickly adapt to new tasks? A prerequisite for embodied agents deployed in real world interactions ought to be training with interaction, yet today's most successful AI models (e.g., VLMs, LLMs) are trained without an explicit notion of action. The problem of pure exploration (which assumes no data as input) is well studied in the reinforcement learning literature and provides agents with a wide array of experiences, yet it fails to prepare them for rapid adaptation to new tasks. Today's language and vision models are trained on data provided by humans, which provides a strong inductive bias for the sorts of tasks that the model will have to solve (e.g., modeling chords in a song, phrases in a sonnet, sentences in a medical record). However, when they are prompted to solve a new task, there is a faulty tacit assumption that humans spend most of their time in the most rewarding states. The key contribution of our paper is a method for pre-training interactive agents in a self-supervised fashion, so that they can instantly mimic human demonstrations. Our method treats goals (i.e., observations) as the atomic construct. During training, our method automatically proposes goals and practices reaching them, building off prior work in reinforcement learning exploration. During evaluation, our method solves an (amortized) inverse reinforcement learning problem to explain demonstrations as optimal goal-reaching behavior. Experiments on standard benchmarks (not designed for goal-reaching) show that our approach outperforms prior methods for zero-shot imitation.

machine learning, natural language, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2510.17059

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models

Neural Information Processing SystemsJan-19-2025, 15:28:37 GMT

Unlike most reinforcement learning agents which require an unrealistic amount of environment interactions to learn a new behaviour, humans excel at learning quickly by merely observing and imitating others. This ability highly depends on the fact that humans have a model of their own embodiment that allows them to infer the most likely actions that led to the observed behaviour. In this paper, we propose Action Inference by Maximising Evidence (AIME) to replicate this behaviour using world models. AIME consists of two distinct phases. In the first phase, the agent learns a world model from its past experience to understand its own body by maximising the ELBO.

Add feedback

Filters

Collaborating Authors

zero-shot imitation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models

Consistent Zero-Shot Imitation with Contrastive Goal Inference

Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models