Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL
–Neural Information Processing Systems
After obtaining an optimistic and aligned critic, we perform constrained fine-tuning to combat distribution shift during online learning.
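The summary does not say how the fine-tuning is constrained. The following is a minimal toy sketch that assumes one common choice: a quadratic penalty that keeps the online policy's actions close to those of the offline policy. This is not necessarily the paper's constraint. The 1-D linear policy, the fixed quadratic critic standing in for the reconstructed critic, and every name (`critic`, `constrained_loss`, `loss_grad`, `fine_tune`, `alpha`, `target_w`, `w_off`) are hypothetical and exist only for illustration.

```python
"""Toy sketch of constrained actor fine-tuning (hypothetical, not the paper's method).

Assumptions for illustration only:
- 1-D states and a deterministic linear policy a = w * s.
- A fixed critic Q(s, a) = -(a - target_w * s) ** 2 stands in for the
  optimistic, aligned critic obtained before fine-tuning.
- The constraint is an assumed quadratic penalty alpha * (a - a_off) ** 2 that
  keeps actions near the offline policy's actions a_off = w_off * s.
"""

from typing import Sequence


def critic(s: float, a: float, target_w: float) -> float:
    """Hypothetical critic: value is highest when a == target_w * s."""
    return -(a - target_w * s) ** 2


def constrained_loss(w: float, states: Sequence[float], target_w: float,
                     w_off: float, alpha: float) -> float:
    """Actor loss: maximize Q while penalizing deviation from the offline policy."""
    total = 0.0
    for s in states:
        a = w * s
        total += -critic(s, a, target_w) + alpha * (a - w_off * s) ** 2
    return total / len(states)


def loss_grad(w: float, states: Sequence[float], target_w: float,
              w_off: float, alpha: float) -> float:
    """Analytic d(loss)/dw: mean of 2 s^2 [(w - target_w) + alpha (w - w_off)]."""
    g = 0.0
    for s in states:
        g += 2.0 * s * s * ((w - target_w) + alpha * (w - w_off))
    return g / len(states)


def fine_tune(w_off: float, states: Sequence[float], target_w: float,
              alpha: float, lr: float = 0.05, steps: int = 500) -> float:
    """Gradient descent on the actor weight, initialized at the offline policy.

    The step size must stay small relative to (1 + alpha) * mean(s^2);
    very large alpha with this fixed lr would make the updates diverge.
    """
    w = w_off
    for _ in range(steps):
        w -= lr * loss_grad(w, states, target_w, w_off, alpha)
    return w


if __name__ == "__main__":
    states = [0.5, 1.0, 1.5]
    for alpha in (0.0, 1.0, 4.0):
        w = fine_tune(w_off=0.0, states=states, target_w=2.0, alpha=alpha)
        print(f"alpha={alpha}: fine-tuned w = {w:.3f}")
```

In this toy setting the fine-tuned weight converges to `(target_w + alpha * w_off) / (1 + alpha)`: with `w_off = 0.0` and `target_w = 2.0` that is 2.0 for `alpha = 0`, 1.0 for `alpha = 1`, and 0.4 for `alpha = 4`. A larger `alpha` keeps the policy closer to the offline one, trading critic-driven improvement for robustness to distribution shift.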
Feb-18-2026, 00:20:05 GMT
- Country:
- Asia
- China > Jiangsu Province
- Nanjing (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States
- Washington > King County > Seattle (0.04)
- Genre:
- Research Report (0.67)
- Industry:
- Education > Educational Setting > Online (0.34)
- Technology: