In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Neural Information Processing Systems
May-28-2025, 17:53:39 GMT