Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Neural Information Processing Systems
A wide array of sequence models is built on a framework modeled after the Transformer, consisting of alternating sequence-mixer and channel-mixer layers.
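The layer pattern can be sketched as follows. This is a minimal illustration under stated assumptions. The sequence mixer is written as an explicit L×L mixing matrix `m` applied along the sequence axis (Y = M X), which follows the "matrix mixer" view named in the title. The channel mixer is a two-layer ReLU MLP applied to each token independently. The function names, the residual wiring, the omission of normalization, and the toy cumulative-sum (causal) matrix are all hypothetical choices for illustration, not Hydra's actual parameterization.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (a is n x k, b is k x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used for residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(a: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in a]


def sequence_mixer(x: Matrix, m: Matrix) -> Matrix:
    """Mix information ACROSS positions: Y = M X, where M is L x L.
    Each output token is a weighted combination of all input tokens."""
    return matmul(m, x)


def channel_mixer(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Mix information WITHIN each token: Y = relu(X W1) W2.
    Rows (positions) never interact here."""
    return matmul(relu(matmul(x, w1)), w2)


def block(x: Matrix, m: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """One sequence-mixer / channel-mixer pair with residual connections
    (hypothetical wiring; normalization layers omitted for brevity)."""
    x = add(x, sequence_mixer(x, m))
    x = add(x, channel_mixer(x, w1, w2))
    return x


# Toy example: L = 3 tokens, D = 2 channels.
x = [[1, 0], [0, 1], [1, 1]]
m = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]  # lower-triangular: a causal cumulative sum
identity = [[1, 0], [0, 1]]
out = block(x, m, identity, identity)
print(out)
```

Replacing `m` with a different L×L matrix, such as an attention matrix or another structured matrix, changes only the sequence mixer and leaves the channel mixer and the residual wiring untouched. That interchangeability is what makes the matrix-mixer view a convenient way to compare sequence models. A lower-triangular `m` like the one above is causal (token i sees only tokens 0..i), whereas a bidirectional mixer would also need nonzero entries above the diagonal.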
Mar-27-2025, 07:43:48 GMT