Exploiting the sign of the advantage function to learn deterministic policies in continuous domains
