BERTs are Generative In-Context Learners
–Neural Information Processing Systems
While in-context learning is commonly associated with causal language models, such as GPT, we demonstrate that this capability also 'emerges' in masked language models. Through an embarrassingly simple inference technique, we enable an existing masked model, DeBERTa, to perform generative tasks without additional training or architectural changes. Our evaluation reveals that masked and causal language models behave very differently, each clearly outperforming the other on different categories of tasks. These complementary strengths suggest that the field's focus on causal models for in-context learning may be limiting: both architectures can develop these capabilities, but with distinct advantages, pointing toward promising hybrid approaches that combine the strengths of both objectives.
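The inference trick is, at a high level, to have the masked model fill in [MASK] tokens appended to the end of the prompt, one position at a time. The sketch below is a minimal illustration of that idea, assuming the HuggingFace transformers API and an MLM-pretrained DeBERTa checkpoint; the checkpoint name, greedy decoding, and the single appended [MASK] per step are assumptions for illustration, not necessarily the paper's exact procedure.

```python
# Minimal sketch: autoregressive generation with a masked language model.
# Assumptions: HuggingFace transformers, an MLM checkpoint (name is a guess),
# greedy decoding, one appended [MASK] per step.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "microsoft/deberta-v2-xxlarge"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

@torch.no_grad()
def generate(prompt: str, max_new_tokens: int = 20) -> str:
    # [CLS] prompt tokens; drop the trailing [SEP] so we can extend the text.
    ids = tokenizer(prompt, return_tensors="pt").input_ids[:, :-1]
    for _ in range(max_new_tokens):
        # Append [MASK] [SEP] and let the model predict the masked position.
        masked = torch.cat(
            [ids, torch.tensor([[tokenizer.mask_token_id,
                                 tokenizer.sep_token_id]])],
            dim=1,
        )
        logits = model(masked).logits
        next_id = logits[0, -2].argmax()  # prediction at the [MASK] slot
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```

Called as `generate("The capital of France is")`, this decodes left to right like a causal model, except that every step re-encodes the full sequence bidirectionally; no additional training or architectural change is involved, matching the paper's claim.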
May-28-2025, 07:48:01 GMT
- Country:
  - Europe (1.00)
  - North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Genre:
  - Research Report
    - Experimental Study (0.93)
    - New Finding (0.93)
- Industry:
  - Education > Curriculum > Subject-Specific Education (0.45)