Double Actor-Critic with TD Error-Driven Regularization in Reinforcement Learning