An Actor/Critic Algorithm that is Equivalent to Q-Learning
Crites, Robert H., Barto, Andrew G.
–Neural Information Processing Systems
We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using criteria that depend on the relative probability of the action that was executed.
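The idea of encoding Q-values in an actor and a critic can be illustrated with a minimal tabular sketch. The abstract does not give the encoding or the update rules, so everything here is an assumption made for illustration: the critic `V` is taken to hold the value of the greedy action, the actor `P` holds each action's preference relative to it, and the helper names (`encode`, `decode`, `actor_critic_step`) are hypothetical. The sketch only demonstrates equivalence to Q-learning under that assumed encoding; it does not reproduce the paper's specific critic and actor update criteria.

```python
import numpy as np

def encode(Q):
    """Split a Q-table into critic values V and actor preferences P (assumed encoding)."""
    V = Q.max(axis=1)        # critic: value of the greedy action in each state
    P = Q - V[:, None]       # actor: preference relative to the greedy action (<= 0)
    return V, P

def decode(V, P):
    """Recover the Q-table from the critic and actor tables."""
    return V[:, None] + P

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, returned as a new table."""
    Q = Q.copy()
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

def actor_critic_step(V, P, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """The same update expressed on (V, P); V[s_next] stands in for max_b Q(s_next, b)."""
    V, P = V.copy(), P.copy()
    delta = r + gamma * V[s_next] - (V[s] + P[s, a])  # TD error on the encoded Q-value
    q_row = V[s] + P[s]      # decode only the row being updated
    q_row[a] += alpha * delta
    V[s] = q_row.max()       # re-encode: critic tracks the (possibly new) greedy value
    P[s] = q_row - V[s]
    return V, P

# Run both learners on the same random transitions and compare.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 3))
V, P = encode(Q)
for _ in range(200):
    s, a, s_next = rng.integers(4), rng.integers(3), rng.integers(4)
    r = rng.normal()
    Q = q_learning_step(Q, s, a, r, s_next)
    V, P = actor_critic_step(V, P, s, a, r, s_next)
assert np.allclose(decode(V, P), Q)
```

In this sketch the critic entry `V[s]` changes only when the executed action is, or becomes, the greedy one; otherwise the update is absorbed entirely by the actor's preferences.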
Dec-31-1995
- Country:
- North America > United States > Massachusetts > Hampshire County > Amherst (0.15)
- Technology: