Off-Policy Interval Estimation with Lipschitz Value Iteration
Neural Information Processing Systems
The current success of reinforcement learning (RL) relies heavily on large amounts of data. Such data is usually unavailable in many real-world tasks, where deploying a new policy is very costly or even risky.
Feb-8-2026, 12:55:15 GMT