Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought