Attention Mechanism, Max-Affine Partition, and Universal Approximation
