How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?