The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
