Variational Delayed Policy Optimization