FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation
Li, Bangzheng, Zhou, Ben, Fu, Xingyu, Wang, Fei, Roth, Dan, Chen, Muhao
arXiv.org Artificial Intelligence
Language models have shown impressive in-context-learning capabilities, which allow them to benefit from input prompts and perform better on downstream end tasks. Existing works investigate the mechanisms behind this observation and propose label-agnostic prompt metrics that can better estimate end-task performance. One popular approach is to use perplexity as a measure of a model's familiarity with the prompt. While such metrics show consistent improvements on in-domain tasks, we found that familiarity metrics such as perplexity cannot accurately estimate performance in complicated situations such as task- or domain-transfer scenarios. In this work, we propose a revised measure called FamiCom, which provides a more comprehensive, task-agnostic performance estimate. Specifically, FamiCom combines familiarity with *complexity* -- the inherent difficulty of the end task, an important factor missing from current metrics. Experiments show that FamiCom strongly correlates with end-task performance, achieving a Spearman's correlation of 0.85, versus 0.43 for familiarity-only metrics. We further apply FamiCom to automatic prompt and demonstration selection, outperforming existing methods and baselines by more than 7.0% in accuracy.
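As background for the familiarity metric the abstract discusses, perplexity over a prompt is conventionally computed from the model's per-token log-probabilities. A minimal sketch (the function name and toy inputs are illustrative, not taken from the paper):

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a token sequence given per-token log-probabilities.

    PPL = exp(-(1/N) * sum(log p_i)). Lower values mean the model
    assigns the text higher probability, i.e. finds it more "familiar".
    """
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Toy check: a model assigning probability 0.5 to each of 4 tokens
# yields perplexity exp(log 2) = 2.0.
print(perplexity([math.log(0.5)] * 4))  # → 2.0
```

In practice the log-probabilities would come from a language model scoring the prompt; the point of the paper is that this quantity alone, without accounting for task complexity, correlates only weakly with end-task performance.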
Jun-17-2024