Exploiting Estimation Bias in Deep Double Q-Learning for Actor-Critic Methods