[Figure labels: Final policy; RL fine-tuning; Envir]

Neural Information Processing Systems 

Figure dynamics into predict learning from green.