Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching