Minimax Optimal Online Imitation Learning via Replay Estimation

Neural Information Processing Systems 

In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O}\left(\min\left(H^{3/2}/N_{\mathrm{exp}},\ H/\sqrt{N_{\mathrm{exp}}}\right)\right)$ dependency, under significantly weaker assumptions compared to prior work.
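
To see which term of the minimum is active, a short worked comparison may help. This is only a sketch: it ignores the constants and logarithmic factors hidden by the $\widetilde{O}$ notation, and it reads $H$ as the horizon and $N_{\mathrm{exp}}$ as the number of expert demonstrations, which is an assumption since this excerpt does not define them.

```latex
\frac{H^{3/2}}{N_{\mathrm{exp}}} \;\le\; \frac{H}{\sqrt{N_{\mathrm{exp}}}}
\iff \sqrt{H} \;\le\; \sqrt{N_{\mathrm{exp}}}
\iff N_{\mathrm{exp}} \;\ge\; H .
```

Under that simplification, the $H^{3/2}/N_{\mathrm{exp}}$ term is the smaller one once $N_{\mathrm{exp}} \ge H$, while the $H/\sqrt{N_{\mathrm{exp}}}$ term governs when $N_{\mathrm{exp}} < H$.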
