Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift
