Reviews: Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

Neural Information Processing Systems 

Summary:
- The main contribution of the paper is to write the TD update as a Markov jump linear system (MJLS) over an augmented parameter space, with one parameter vector for each pair of states in the underlying MDP.
- After presenting MJLS theory and the augmented parameter space, the authors first consider the IID case, where pairs of states are drawn IID, and give exact formulas for the expected error and its covariance. Under an additional ergodicity assumption, they establish a convergence rate to the limiting quantities. For small learning rates (it is not made precise how small in terms of the problem parameters), a perturbation analysis gives an estimate of this convergence rate (although the value of \lambda_{max real}(\bar A) remains unclear in terms of the problem parameters).

Pros:
- The originality of the connection between TD dynamics and MJLS is a good contribution that could increase the flow of ideas from control theory to RL. In addition, the augmented parameter space formulation appears to be a potentially useful analysis tool.
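To make the MJLS viewpoint concrete, here is a minimal sketch of how a single TD(0) update with linear function approximation is affine in the parameters given the sampled state pair, so that the pair (s, s') acts as the Markov-switching "mode". This is an illustration under assumed notation, not the paper's exact construction; all dimensions, names, and the random problem instance are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small MDP under a fixed policy (all values illustrative).
n, d, gamma, alpha = 4, 3, 0.9, 0.05
Phi = rng.normal(size=(n, d))          # feature matrix; row s is phi(s)
P = rng.dirichlet(np.ones(n), size=n)  # state transition matrix
r = rng.normal(size=n)                 # expected reward per state

def td0_step(theta, s, s_next):
    # TD(0) step is affine in theta given the sampled pair (s, s'):
    #   theta <- (I + alpha * A[s,s']) theta + alpha * b[s,s']
    # with A[s,s'] = phi(s) (gamma phi(s') - phi(s))^T, b[s,s'] = r(s) phi(s).
    A = np.outer(Phi[s], gamma * Phi[s_next] - Phi[s])
    b = r[s] * Phi[s]
    return theta + alpha * (A @ theta + b)

# Along a trajectory, the mode sequence (s_k, s_{k+1}) is itself a Markov
# chain, which is exactly the structure of a Markov jump linear system.
theta = np.zeros(d)
s = 0
for _ in range(1000):
    s_next = rng.choice(n, p=P[s])
    theta = td0_step(theta, s, s_next)
    s = s_next
```

In the IID case discussed in the summary, the pair (s, s') would instead be drawn independently at each step from a fixed distribution, which simplifies the mode process.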