- Law (0.92)
- Information Technology (0.92)
- Government (0.67)
- Education > Educational Setting (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Appendix A
Q: For what purpose was the dataset created?
Q: Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?
Q: Who funded the creation of the dataset?
Q: What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?
Q: How many instances are there in total (of each type, if appropriate)?
A: As shown in Table 1, the dataset statistics are as follows. Grounding Task: 111,770 samples for training, 21,616 samples for testing. For grounding, we use only one annotation per image.
Metric-Fair Prompting: Treating Similar Samples Similarly
Wang, Jing, Shen, Jie, Niu, Xing, Zhang, Tong, Weiss, Jeremy
We introduce \emph{Metric-Fair Prompting}, a fairness-aware prompting framework that guides large language models (LLMs) to make decisions under metric-fairness constraints. In multiple-choice medical question answering, each (question, option) pair is treated as a binary instance with label $+1$ (correct) or $-1$ (incorrect). To promote \emph{individual fairness} (treating similar instances similarly), we compute question similarity using NLP embeddings and solve items in \emph{joint pairs of similar questions} rather than in isolation. The prompt enforces a global decision protocol: extract decisive clinical features, map each (question, option) pair to a score $f(x)$ that acts as a confidence, and impose a Lipschitz-style constraint so that similar inputs receive similar scores and, hence, consistent outputs. Evaluated on the MedQA (US) benchmark, Metric-Fair Prompting improves over standard single-item prompting, demonstrating that fairness-guided, confidence-oriented reasoning can enhance LLM accuracy on high-stakes clinical multiple-choice questions.
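The joint-pair protocol described in the abstract can be sketched roughly as follows. This is an illustrative stand-in, not the authors' implementation: the toy bag-of-words embedding, the similarity threshold, and the greedy pairing heuristic are all assumptions made for the sake of a self-contained example; a real system would use a sentence encoder and LLM-produced scores $f(x)$.

```python
import math
from itertools import combinations

def embed(text: str) -> dict[str, float]:
    """Toy L2-normalized bag-of-words embedding (stand-in for a sentence encoder)."""
    counts: dict[str, float] = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse unit vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def pair_similar_questions(questions: list[str], threshold: float = 0.5):
    """Greedily group questions into joint pairs whose embedding similarity
    exceeds `threshold`; leftover questions are solved in isolation."""
    embs = [embed(q) for q in questions]
    used: set[int] = set()
    pairs: list[tuple[int, int]] = []
    # Consider the most similar candidate pairs first.
    scored = sorted(
        ((cosine(embs[i], embs[j]), i, j)
         for i, j in combinations(range(len(questions)), 2)),
        reverse=True,
    )
    for sim, i, j in scored:
        if sim < threshold:
            break
        if i in used or j in used:
            continue
        used.update((i, j))
        pairs.append((i, j))
    singles = [i for i in range(len(questions)) if i not in used]
    return pairs, singles

def lipschitz_violation(f_x: float, f_y: float, dist: float, L: float = 1.0) -> bool:
    """Flag score pairs that break the constraint |f(x) - f(y)| <= L * d(x, y)."""
    return abs(f_x - f_y) > L * dist
```

Under this sketch, two near-duplicate clinical questions would be paired and answered jointly, and a large confidence gap between them would be flagged as a fairness violation to be reconciled before the final answer is emitted.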
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > California (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Education (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.72)
- Asia > Singapore (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Monaco (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Education (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
EtiCor++: Towards Understanding Etiquettical Bias in LLMs
Dwivedi, Ashutosh, Singh, Siddhant Shivdutt, Modi, Ashutosh
In recent years, researchers have started analyzing the cultural sensitivity of LLMs, and etiquettes have been an active area of this research. Etiquettes are region-specific and an essential part of a region's culture; hence, it is imperative to make LLMs sensitive to them. However, there are few resources for evaluating LLMs' understanding of, and bias with regard to, etiquettes. In this resource paper, we introduce EtiCor++, a corpus of etiquettes from around the world. We introduce tasks for evaluating LLMs' knowledge of etiquettes across various regions, along with metrics for measuring bias in LLMs. Extensive experimentation with LLMs shows inherent bias towards certain regions.
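The abstract mentions metrics for measuring regional bias without detailing them here. One simple, hypothetical probe in this spirit (not necessarily one of EtiCor++'s actual metrics) is the gap between the best- and worst-served regions' accuracy on etiquette questions:

```python
from collections import defaultdict

def region_accuracy(records):
    """records: iterable of (region, correct) pairs from an etiquette probe set.
    Returns per-region accuracy."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for region, correct in records:
        totals[region] += 1
        hits[region] += int(correct)
    return {r: hits[r] / totals[r] for r in totals}

def max_accuracy_gap(records) -> float:
    """Largest accuracy gap between any two regions; 0.0 means the model
    answers equally well across all probed regions."""
    acc = region_accuracy(records)
    return max(acc.values()) - min(acc.values())
```

A nonzero gap on matched question sets is one coarse signal that a model serves some regions' etiquette norms better than others; finer-grained measures would also account for question difficulty and sample size per region.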
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > India (0.07)
- Asia > East Asia (0.04)