Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
