How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
