Goto

Collaborating Authors

 Learning Graphical Models






LobsDICE: OfflineLearningfromObservationvia StationaryDistributionCorrectionEstimation

Neural Information Processing Systems

We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities.