The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
