Improved off-policy training of diffusion samplers