On the Training Convergence of Transformers for In-Context Classification