Mixed Policy Gradient: off-policy reinforcement learning driven jointly by data and model

Open in new window