LobsDICE: OfflineLearningfromObservationvia StationaryDistributionCorrectionEstimation
–Neural Information Processing Systems
We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities.
Neural Information Processing Systems
Feb-19-2026, 00:52:29 GMT
- Technology: