Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

Neural Information Processing Systems 

After obtaining an optimistic and aligned critic, we perform constrained fine-tuning to combat distribution shift during online learning.
