Inductive Biases and Variable Creation in Self-Attention Mechanisms
