Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

Neural Information Processing Systems 

A wide array of sequence models are built on a framework modeled after Transformers, comprising alternating sequence mixer and channel mixer layers.
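
The alternating structure can be illustrated with a minimal sketch in plain Python. This is an illustration under simplifying assumptions, not the paper's implementation: the function names (`sequence_mixer`, `channel_mixer`, `stack`) are hypothetical, the sequence mixer is taken to be a fixed L x L matrix acting across positions, the channel mixer is a single linear map acting across features, and residual connections, normalization, and nonlinearities are omitted.

```python
# Hypothetical sketch: all names here are illustrative, not from the paper.
# An input x is a list of L positions, each a list of D channel values.

def sequence_mixer(x, M):
    """Mix across positions: y[i][d] = sum_j M[i][j] * x[j][d].

    M is an L x L mixing matrix (assumed fixed here for simplicity).
    """
    L, D = len(x), len(x[0])
    return [[sum(M[i][j] * x[j][d] for j in range(L)) for d in range(D)]
            for i in range(L)]

def channel_mixer(x, W):
    """Mix across channels at each position: y[i][d] = sum_k x[i][k] * W[k][d].

    W is a D x D_out weight matrix; every position is processed independently.
    """
    d_out = len(W[0])
    return [[sum(row[k] * W[k][d] for k in range(len(row))) for d in range(d_out)]
            for row in x]

def stack(x, layers):
    """Apply alternating sequence-mixer and channel-mixer layers.

    `layers` is a list of (M, W) pairs, one pair per block.
    """
    for M, W in layers:
        x = sequence_mixer(x, M)  # information flows between positions
        x = channel_mixer(x, W)   # information flows between channels
    return x

# Usage: a lower-triangular (causal) mixing matrix and a channel swap.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
M_causal = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
W_swap = [[0, 1], [1, 0]]
y = stack(x, [(M_causal, W_swap)])
```

In this sketch, the choice of `M` is what distinguishes one sequence mixer from another: a lower-triangular `M` lets each position see only earlier positions, while a dense `M` mixes in both directions.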