Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching
