Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs

Neural Information Processing Systems 
