LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
–Neural Information Processing Systems
We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities.
Neural Information Processing Systems
Aug-14-2025, 05:41:25 GMT