O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
