LobsDICE: OfflineLearningfromObservationvia StationaryDistributionCorrectionEstimation
–Neural Information Processing Systems
We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities.
Neural Information Processing Systems
Feb-8-2026, 07:44:12 GMT
- Technology: