The Pitfalls of Regularization in Off-Policy TD Learning