Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
