Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View