Semi-Supervised In-Context Learning: A Baseline Study
Gu, Zhengyao, Zou, Henry Peng, Chen, Yankai, Liu, Aiwei, Zhang, Weizhi, Yu, Philip S.
arXiv.org Artificial Intelligence
Most existing work on data selection for In-Context Learning (ICL) has focused on constructing demonstrations from ground-truth annotations, with limited attention given to selecting reliable self-generated annotations. In this work, we propose a three-step semi-supervised ICL framework: annotation generation, demonstration selection, and semi-supervised inference. Our baseline, Naive-SemiICL, which selects high-confidence self-generated demonstrations for ICL prompting, outperforms a 16-shot baseline by an average of 9.94% across 16 datasets. We further introduce IterPSD, an annotation approach that refines pseudo-demonstrations iteratively, achieving up to 6.8% additional gains in classification tasks. Lastly, we reveal a scaling law for semi-supervised ICL, where models achieve optimal performance with over 1,000 demonstrations.
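The three steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `annotate` callable (assumed to wrap a language model and return a label with a confidence score), the prompt template, and the 0.9 threshold are all hypothetical choices made for the example.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (input text, label)
# Hypothetical model wrapper: takes a prompt, returns (label, confidence).
Annotator = Callable[[str], Tuple[str, float]]


def build_prompt(demos: List[Demo], query: str) -> str:
    """Format the demonstrations, then the query awaiting a label."""
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


def select_pseudo_demos(
    labeled: List[Demo],
    unlabeled: List[str],
    annotate: Annotator,
    threshold: float = 0.9,  # assumed cutoff, not taken from the paper
) -> List[Demo]:
    # Step 1: annotation generation. Label each unlabeled input using
    # the ground-truth demonstrations as context.
    pseudo = []
    for x in unlabeled:
        label, confidence = annotate(build_prompt(labeled, x))
        pseudo.append((x, label, confidence))
    # Step 2: demonstration selection. Keep only high-confidence labels.
    return [(x, y) for x, y, c in pseudo if c >= threshold]


def naive_semi_icl(
    labeled: List[Demo],
    unlabeled: List[str],
    query: str,
    annotate: Annotator,
    threshold: float = 0.9,
) -> str:
    selected = select_pseudo_demos(labeled, unlabeled, annotate, threshold)
    # Step 3: semi-supervised inference. Prompt with ground-truth and
    # selected pseudo-demonstrations together.
    answer, _ = annotate(build_prompt(labeled + selected, query))
    return answer
```

How the confidence score is obtained (for example, from token probabilities or a verbalized estimate) is left open here; the sketch only assumes that some score is available for filtering.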
Mar-4-2025
- Country:
  - North America > United States
    - New York > New York County
      - New York City (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
  - Europe > Portugal
  - Asia
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
- Genre:
  - Research Report (0.50)
- Technology: