An Actor/Critic Algorithm that is Equivalent to Q-Learning
