Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs
