Generalization vs. Memorization in the Presence of Statistical Biases in Transformers
