Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers
