Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models

Neural Information Processing Systems 

Authors contributed equally to this work.
