Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions