Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)