AITopics | lobsdice

ASimple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

Neural Information Processing SystemsApr-24-2026, 05:29:30 GMT

Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art'DIstribution Correction Estimation' (DICE) methods minimize divergence of state occupancy between expert and learner policies and retrieve a policy with weighted behavior cloning; however, their results are unstable when learning from incomplete trajectories, due to a non-robust optimization in the dual domain. To address the issue, in this paper, we propose Trajectory-Aware Imitation Learning from Observations (TAILO). TAILO uses a discounted sum along the future trajectory as the weight for weighted behavior cloning. The terms for the sum are scaled by the output of a discriminator, which aims to identify expert states. Despite simplicity, TAILO works well if there exist trajectories or segments of expert behavior in the task-agnostic data, a common assumption in prior work. In experiments across multiple testbeds, we find TAILO to be more robust and effective, particularly with incomplete trajectories.

machine learning, natural language, trajectory, (17 more...)

Neural Information Processing Systems

Genre:

Instructional Material (0.46)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

LobsDICE: OfflineLearningfromObservationvia StationaryDistributionCorrectionEstimation

Neural Information Processing SystemsFeb-19-2026, 00:52:29 GMT

We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities.

artificial intelligence, demonstration, machine learning, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

LobsDICE: OfflineLearningfromObservationvia StationaryDistributionCorrectionEstimation

Neural Information Processing SystemsFeb-8-2026, 07:44:12 GMT

We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities.

artificial intelligence, exp 1, machine learning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

Neural Information Processing SystemsDec-24-2025, 00:48:28 GMT

We consider the problem of learning from observation (LfO), in which the agent aims to mimic the expert's behavior from the state-only demonstrations by experts. We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities. This offline setting for LfO is appealing in many real-world scenarios where the ground-truth expert actions are inaccessible and the arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline LfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy. Through an extensive set of offline LfO tasks, we show that LobsDICE outperforms strong baseline methods.

lobsdice, offline learning, stationary distribution correction estimation, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

372593bd318ad8b34b3a8da77e20272b-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 05:41:29 GMT

demonstration, exp, lobsdice, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

Add feedback

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

Neural Information Processing SystemsAug-14-2025, 05:41:25 GMT

We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities.

algorithm, demonstration, stationary distribution, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

Neural Information Processing SystemsOct-10-2024, 15:25:44 GMT

We consider the problem of learning from observation (LfO), in which the agent aims to mimic the expert's behavior from the state-only demonstrations by experts. We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities. This offline setting for LfO is appealing in many real-world scenarios where the ground-truth expert actions are inaccessible and the arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline LfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy.

lobsdice, offline learning, stationary distribution correction estimation

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.45)

Add feedback

A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

Yan, Kai, Schwing, Alexander G., Wang, Yu-Xiong

arXiv.org Artificial IntelligenceNov-2-2023

Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art "DIstribution Correction Estimation" (DICE) methods minimize divergence of state occupancy between expert and learner policies and retrieve a policy with weighted behavior cloning; however, their results are unstable when learning from incomplete trajectories, due to a non-robust optimization in the dual domain. To address the issue, in this paper, we propose Trajectory-Aware Imitation Learning from Observations (TAILO). TAILO uses a discounted sum along the future trajectory as the weight for weighted behavior cloning. The terms for the sum are scaled by the output of a discriminator, which aims to identify expert states. Despite simplicity, TAILO works well if there exist trajectories or segments of expert behavior in the task-agnostic data, a common assumption in prior work. In experiments across multiple testbeds, we find TAILO to be more robust and effective, particularly with incomplete trajectories.

dataset, smodice, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2311.01329

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

Kim, Geon-Hyeong, Lee, Jongmin, Jang, Youngsoo, Yang, Hongseok, Kim, Kee-Eung

arXiv.org Artificial IntelligenceOct-17-2022

We consider the problem of learning from observation (LfO), in which the agent aims to mimic the expert's behavior from the state-only demonstrations by experts. We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities. This offline setting for LfO is appealing in many real-world scenarios where the ground-truth expert actions are inaccessible and the arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline LfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy. Through an extensive set of offline LfO tasks, we show that LobsDICE outperforms strong baseline methods.

demonstration, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2202.13536

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

Add feedback

Filters

Collaborating Authors

lobsdice

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

ASimple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

LobsDICE: OfflineLearningfromObservationvia StationaryDistributionCorrectionEstimation

LobsDICE: OfflineLearningfromObservationvia StationaryDistributionCorrectionEstimation

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

372593bd318ad8b34b3a8da77e20272b-Supplemental-Conference.pdf

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation