provides a theoretical basis for building the proposed Multi-linear attention