You Only Cache Once: Decoder-Decoder Architectures for Language Models

Yutao Sun, Li Dong

Neural Information Processing Systems 

The decoder-decoder architecture consists of two components: a cross-decoder stacked upon a self-decoder.
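To make the stacking concrete, the following is a minimal sketch of the data flow, not the authors' implementation. It assumes the self-decoder output is turned into one shared set of keys and values, which every cross-decoder layer then reads through cross-attention. For brevity it uses plain causal attention in the self-decoder, identity projections, and no feed-forward or normalization sublayers; all of these are simplifying assumptions, and every function name is hypothetical.

```python
import math


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position i sees only positions <= i."""
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs


def add(xs, ys):
    """Element-wise residual connection over a sequence of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def self_decoder(hidden, num_layers):
    """Lower half: each layer attends over its own input (self-attention)."""
    for _ in range(num_layers):
        hidden = add(hidden, causal_attention(hidden, hidden, hidden))
    return hidden


def cross_decoder(hidden, shared_keys, shared_values, num_layers):
    """Upper half: every layer attends to the same shared keys and values."""
    for _ in range(num_layers):
        attended = causal_attention(hidden, shared_keys, shared_values)
        hidden = add(hidden, attended)
    return hidden


def decoder_decoder(embeddings, self_layers=2, cross_layers=2):
    """Run the self-decoder, build the shared cache once, then cross-decode."""
    middle = self_decoder(embeddings, self_layers)
    # Assumption: identity key/value projections; a real model would learn them.
    shared_keys, shared_values = middle, middle
    return cross_decoder(middle, shared_keys, shared_values, cross_layers)


tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 2-dimensional embeddings
outputs = decoder_decoder(tokens)
```

In this sketch the keys and values are computed a single time after the self-decoder, so the number of cross-decoder layers does not change how much has to be cached.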