$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers
