Diagonal Rescaling For Neural Networks