A Theory on Adam Instability in Large-Scale Machine Learning