Exploiting the sign of the advantage function to learn deterministic policies in continuous domains