LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

Neural Information Processing Systems 

We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found