An Actor/Critic Algorithm that is Equivalent to Q-Learning

Crites, Robert H., Barto, Andrew G.

Neural Information Processing Systems 

We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using criteria that depend on the relative probability of the action that was executed.
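
As a rough illustration of what encoding Q-values in an actor and a critic could look like, the sketch below pairs the standard tabular Q-learning update with one hypothetical encoding: the critic stores V(s) = max_a Q(s, a) and the actor stores preferences p(s, a) = Q(s, a) - V(s). The function names, the step size `alpha`, the discount `gamma`, and the encoding itself are assumptions made for illustration; the abstract does not spell out the construction, so this is not necessarily the one used in the paper.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step, applied in place to the table Q."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


def encode(Q):
    """Hypothetical encoding (an assumption, not taken from the paper):
    critic value V(s) = max_a Q(s, a), actor preference p(s, a) = Q(s, a) - V(s)."""
    V = Q.max(axis=1)
    p = Q - V[:, None]
    return V, p


def decode(V, p):
    """Recover the Q-table from the critic values and actor preferences."""
    return V[:, None] + p


# Two states, two actions: one Q-learning step, then a round trip through the encoding.
Q = np.zeros((2, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
V, p = encode(Q)
Q_roundtrip = decode(V, p)
```

Under this assumed encoding the greedy action in each state has preference zero, and every other action has a negative preference equal to its gap from the greedy Q-value, so the pair (V, p) carries the same information as the Q-table.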
