Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
