TD(0) Learning converges for Polynomial mixing and non-linear functions

Open in new window