Off-Policy Interval Estimation with Lipschitz Value Iteration

Neural Information Processing Systems 

The current success of RL relies heavily on an excessive amount of data, which, however, is usually not available in many real-world tasks where deploying a new policy is very costly or even risky.
