Surprising Instabilities in Training Deep Networks and a Theoretical Analysis