Superiority of Multi-Head Attention in In-Context Linear Regression
