Metric-Fair Prompting: Treating Similar Samples Similarly
Jing Wang, Jie Shen, Xing Niu, Tong Zhang, Jeremy Weiss
arXiv.org Artificial Intelligence
We introduce \emph{Metric-Fair Prompting}, a fairness-aware prompting framework that guides large language models (LLMs) to make decisions under metric-fairness constraints. In multiple-choice medical question answering, each (question, option) pair is treated as a binary instance with label $+1$ (correct) or $-1$ (incorrect). To promote \emph{individual fairness} -- treating similar instances similarly -- we compute question similarity from NLP embeddings and solve items in \emph{joint pairs of similar questions} rather than in isolation. The prompt enforces a global decision protocol: extract decisive clinical features, map each (question, option) pair to a score $f(x)$ that acts as a confidence, and impose a Lipschitz-style constraint so that similar inputs receive similar scores and hence consistent outputs. Evaluated on the MedQA (US) benchmark, Metric-Fair Prompting improves over standard single-item prompting, demonstrating that fairness-guided, confidence-oriented reasoning can enhance LLM accuracy on high-stakes clinical multiple-choice questions.
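The similarity-pairing step described above can be sketched as follows. This is a minimal illustration only: the toy bag-of-words embedding, the `pair_similar` helper, and the sample questions are hypothetical stand-ins for the paper's actual NLP embeddings and pairing procedure.

```python
import math
from collections import Counter
from itertools import combinations

def embed(text):
    # Toy bag-of-words embedding; a stand-in for the paper's NLP embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def pair_similar(questions):
    # Greedily group questions into joint pairs, most similar first,
    # so each pair can be solved together under the shared protocol.
    vecs = [embed(q) for q in questions]
    sims = sorted(((cosine(vecs[i], vecs[j]), i, j)
                   for i, j in combinations(range(len(questions)), 2)),
                  reverse=True)
    used, pairs = set(), []
    for _, i, j in sims:
        if i not in used and j not in used:
            used.update((i, j))
            pairs.append((i, j))
    return pairs

questions = [
    "Which drug treats hypertension in pregnancy?",
    "Which antihypertensive is safe in pregnancy?",
    "What is the first-line therapy for type 2 diabetes?",
    "Which agent is first-line for newly diagnosed diabetes?",
]
print(pair_similar(questions))  # pairs the two pregnancy items and the two diabetes items
```

Each resulting pair would then be answered in one joint prompt, with the Lipschitz-style constraint $|f(x) - f(x')| \le L \cdot d(x, x')$ encouraging consistent scores across the paired items.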
Dec-9-2025