Myosotis: structured computation for attention-like layers

Evgenii Egorov, Hanno Ackermann, Markus Nagel, Hong Cai

arXiv.org Artificial Intelligence 

Attention layers apply a sequence-to-sequence mapping whose parameters depend on the pairwise interactions of the input elements. Without structural assumptions, however, memory and compute scale quadratically with the sequence length. The two main ways to mitigate this are to introduce sparsity by discarding a sufficient number of pairwise interactions, or to introduce a recurrent dependence along the sequence, as state-space models (SSMs) do. Both approaches are reasonable, but each has its disadvantages. We propose a novel algorithm that combines the advantages of both concepts. Our idea is based on the efficient inversion of tree-structured matrices.
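The abstract does not spell out the algorithm, so the sketch below is only an illustration of why tree structure makes inversion cheap, not the paper's (Myosotis) method. It solves L x = b for a lower-triangular matrix with exactly one off-diagonal entry per node, L[i, parent(i)], so forward substitution visits each node once and runs in O(n) along the tree. The function name solve_tree_lower and the parent/diag/weight encoding of the tree are assumptions made for this example.

```python
import numpy as np

def solve_tree_lower(parent, diag, weight, b):
    """Solve L x = b in O(n) for a tree-structured lower-triangular L.

    L[i, i] = diag[i]; L[i, parent[i]] = weight[i] (roots have parent -1).
    Nodes are assumed topologically ordered, i.e. parent[i] < i.
    """
    n = len(b)
    x = np.empty(n)
    for i in range(n):
        p = parent[i]
        # Each node only needs its parent's solution: one multiply per node.
        acc = b[i] - (weight[i] * x[p] if p >= 0 else 0.0)
        x[i] = acc / diag[i]
    return x

# Toy tree: node 0 is the root; 1, 2 are its children; 3, 4 are children of 1.
parent = [-1, 0, 0, 1, 1]
diag = np.array([2.0, 1.5, 1.0, 2.0, 1.0])
weight = np.array([0.0, 0.3, -0.5, 0.7, 0.2])
b = np.random.default_rng(0).normal(size=5)

# Dense reference: build L explicitly and check L @ x == b.
L = np.diag(diag)
for i, p in enumerate(parent):
    if p >= 0:
        L[i, p] = weight[i]

x = solve_tree_lower(parent, diag, weight, b)
assert np.allclose(L @ x, b)
```

A dense solve would cost O(n^2) memory and O(n^3) time here; the tree sparsity pattern reduces both to O(n), which is the kind of structural saving the abstract alludes to.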
