AITopics | preference bias

CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks

Wang, Hao, Pan, Licheng, Chen, Zhichao, Zheng, Chunyuan, Chu, Zhixuan, Li, Xiaoxi, Lu, Yuan, Liu, Xinggao, Li, Haoxuan, Lin, Zhouchen

arXiv.org Machine Learning, Mar-20-2026

Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models, current reward modeling heavily relies on experimental feedback data collected from human annotators under controlled and costly conditions. In this work, we introduce observational reward modeling -- learning reward models with observational user feedback (e.g., clicks, copies, and upvotes) -- as a scalable and cost-effective alternative. We identify two fundamental challenges in this setting: (1) observational feedback is noisy due to annotation errors, which makes it deviate from true user preference; (2) observational feedback is biased by user preference, where users preferentially provide feedback on responses they feel strongly about, which creates a distribution shift between training and inference data. To address these challenges, we propose CausalRM, a causal-theoretic reward modeling framework that aims to learn unbiased reward models from observational feedback. To tackle challenge (1), CausalRM introduces a noise-aware surrogate loss term that is provably equivalent to the primal loss under noise-free conditions by explicitly modeling the annotation error generation process. To tackle challenge (2), CausalRM uses propensity scores -- the probability of a user providing feedback for a given response -- to reweight training samples, yielding a loss function that eliminates user preference bias. Extensive experiments across diverse LLM backbones and benchmark datasets validate that CausalRM effectively learns accurate reward signals from noisy and biased observational feedback and delivers substantial performance improvements on downstream RLHF tasks -- including a 49.2% gain on WildGuardMix and a 32.7% improvement on HarmBench. Code is available on our project website.
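The propensity reweighting described in the abstract can be illustrated with a generic self-normalized inverse-propensity-weighted loss. This is a minimal sketch, not the CausalRM loss itself: the function name `snips_bce_loss`, the binary feedback labels, the clipping threshold, and the self-normalization are all assumptions made for illustration.

```python
import math


def snips_bce_loss(scores, labels, propensities, clip=0.05):
    """Self-normalized inverse-propensity-weighted binary cross-entropy.

    scores:       raw reward-model outputs (logits), one per observed response
    labels:       observed binary feedback (1 = positive, 0 = negative)
    propensities: estimated probability that the user gave feedback at all
    clip:         lower bound on propensities (an assumption, limits variance)
    """
    if not scores:
        raise ValueError("need at least one observed sample")
    weighted_sum = 0.0
    weight_total = 0.0
    for s, y, p in zip(scores, labels, propensities):
        # Numerically stable binary cross-entropy on a logit.
        loss = max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
        # Rarely observed samples get larger weights.
        w = 1.0 / max(p, clip)
        weighted_sum += w * loss
        weight_total += w
    return weighted_sum / weight_total


if __name__ == "__main__":
    # The second sample is rarely observed, so it dominates the weighted loss.
    print(snips_bce_loss([2.0, -2.0], [1, 1], [0.9, 0.1]))
```

Samples that users rarely give feedback on receive larger weights, so the weighted loss approximates the loss over all responses rather than only those that attracted feedback.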

large language model, machine learning, natural language, (21 more...)

2603.18736

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.50)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mitigating Judgment Preference Bias in Large Language Models through Group-Based Polling

Liu, Shuliang, Xu, Zhipeng, Liu, Zhenghao, Yan, Yukun, Yu, Minghe, Gu, Yu, Chen, Chong, Xie, Huiyuan, Yu, Ge

arXiv.org Artificial Intelligence, Oct-10-2025

Large Language Models (LLMs) as automatic evaluators, commonly referred to as LLM-as-a-Judge, have attracted growing attention. This approach plays a vital role in aligning LLMs with human judgments, providing accurate and reliable assessments. However, LLM-based judgment models often exhibit judgment preference bias during the evaluation phase, tending to favor responses generated by themselves, which undermines the reliability of their judgments. This paper introduces Group-Based Polling Optimization (Genii), an unsupervised multi-agent collaborative optimization framework that mitigates the inherent judgment preference bias of judgment models. Specifically, Genii integrates various LLM-based judgment models into a multi-agent system and simulates an interactive client-server polling mechanism to optimize each client agent without supervision. Our experiments demonstrate that Genii outperforms supervised models trained on annotated judgment data, while requiring no human-labeled annotations. Genii consistently improves performance across different client agents during the polling, even when weaker models act as server agents. Further analysis reveals that Genii effectively mitigates the judgment preference bias of LLM-based judgment models. All code is available at https://github.com/NEUIR/Genii.

large language model, machine learning, natural language, (18 more...)

2510.08145

Country: Asia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

PRINCIPLES: Synthetic Strategy Memory for Proactive Dialogue Agents

Kim, Namyoung, Ong, Kai Tzu-iunn, Hwang, Yeonjun, Kang, Minseok, Jihn, Iiseo, Kim, Gayoung, Kim, Minju, Yeo, Jinyoung

arXiv.org Artificial Intelligence, Sep-23-2025

Dialogue agents based on large language models (LLMs) have shown promising performance in proactive dialogue, which requires effective strategy planning. However, existing approaches to strategy planning for proactive dialogue face several limitations: limited strategy coverage, preference bias in planning, and reliance on costly additional training. To address these, we propose PRINCIPLES: a synthetic strategy memory for proactive dialogue agents. PRINCIPLES is derived through offline self-play simulations and serves as reusable knowledge that guides strategy planning during inference, eliminating the need for additional training and data annotation. We evaluate PRINCIPLES in both emotional support and persuasion domains, demonstrating consistent improvements over strong baselines. Furthermore, PRINCIPLES maintains its robustness across extended and more diverse evaluation settings. See our project page at https://huggingface.co/spaces/kimnamssya/Principles.

large language model, machine learning, natural language, (20 more...)

2509.17459

Country:

North America (0.46)
Asia (0.28)

Genre:

Research Report (1.00)
Personal > Interview (0.46)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Mitigating Strategy Preference Bias in Emotional Support Conversation via Uncertainty Estimations

Zhou, Yougen, Chen, Qin, Zhou, Ningning, Zhou, Jie, Wu, Xingjiao, He, Liang

arXiv.org Artificial Intelligence, Sep-17-2025

Emotional support conversation (ESC) aims to alleviate distress through empathetic dialogue, yet large language models (LLMs) face persistent challenges in delivering effective ESC due to low accuracy in strategy planning. Moreover, there is a considerable preference bias towards specific strategies. Prior methods using fine-tuned strategy planners have shown potential in reducing such bias, but the underlying causes of the preference bias in LLMs have not been well studied. To address these issues, we first reveal the fundamental causes of the bias by identifying the knowledge boundaries of LLMs in strategy planning. Then, we propose an approach to mitigate the bias by reinforcement learning with a dual reward function, which optimizes strategy planning via both accuracy and entropy-based confidence for each region according to the knowledge boundaries. Experiments on the ESCov and ExTES datasets with multiple LLM backbones show that our approach outperforms the baselines, confirming its effectiveness.
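To make the idea of an entropy-based confidence term concrete, the sketch below scores a strategy distribution by its normalized entropy and adds it to a correctness reward. The abstract does not give the paper's actual reward, so the names `entropy_confidence` and `dual_reward`, the weight `beta`, and the additive combination are hypothetical, and the region-specific treatment by knowledge boundary is not modeled here.

```python
import math


def entropy_confidence(probs):
    """Return 1 - H(p) / log(K): 1.0 for a one-hot distribution, 0.0 for uniform."""
    k = len(probs)
    if k < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(k)


def dual_reward(probs, predicted, gold, beta=0.5):
    """Hypothetical combination: correctness plus beta times confidence."""
    accuracy = 1.0 if predicted == gold else 0.0
    return accuracy + beta * entropy_confidence(probs)


if __name__ == "__main__":
    # Example distribution over four support strategies (labels are illustrative).
    probs = [0.7, 0.1, 0.1, 0.1]
    print(dual_reward(probs, predicted="question", gold="question"))
```

A planner that always concentrates probability on one strategy scores high on confidence regardless of correctness, which is why a confidence term of this kind only makes sense alongside an accuracy term.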

computational linguistic, large language model, natural language, (14 more...)

2509.12661

Country:

Europe > Austria (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control

Xiao, Yuxin, Wan, Chaoqun, Zhang, Yonggang, Wang, Wenxiao, Lin, Binbin, He, Xiaofei, Shen, Xu, Ye, Jieping

arXiv.org Artificial Intelligence, Nov-4-2024

As the development and application of Large Language Models (LLMs) continue to advance rapidly, enhancing their trustworthiness and aligning them with human preferences has become a critical area of research. Traditional methods rely heavily on extensive data for Reinforcement Learning from Human Feedback (RLHF), but representation engineering offers a new, training-free approach. This technique leverages semantic features to control the representation of an LLM's intermediate hidden states, enabling the model to meet specific requirements such as increased honesty or heightened safety awareness. However, a significant challenge arises when attempting to fulfill multiple requirements simultaneously. It proves difficult to encode various semantic contents, like honesty and safety, into a single semantic feature, which restricts the technique's practicality. In this work, we address this issue through "Sparse Activation Control". By delving into the intrinsic mechanisms of LLMs, we identify the components within the model that are closely related to specific tasks, i.e., attention heads. These heads display sparse characteristics that allow for near-independent control over different tasks. Our experiments, conducted on the open-source Llama series models, have yielded encouraging results. The models were able to align with human preferences on issues of safety, factuality, and bias concurrently.

large language model, machine learning, natural language, (19 more...)

2411.02461

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(16 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

Kang, Dongjin, Kim, Sunghwan, Kwon, Taeyoon, Moon, Seungjun, Cho, Hyunsouk, Yu, Youngjae, Lee, Dongha, Yeo, Jinyoung

arXiv.org Artificial Intelligence, Jun-5-2024

Emotional Support Conversation (ESC) is a task aimed at alleviating individuals' emotional distress through daily conversation. Given its inherent complexity and non-intuitive nature, the ESConv dataset incorporates support strategies to facilitate the generation of appropriate responses. Recently, despite the remarkable conversational ability of large language models (LLMs), previous studies have suggested that they often struggle with providing useful emotional support. Hence, this work initially analyzes the results of LLMs on ESConv, revealing challenges in selecting the correct strategy and a notable preference for a specific strategy. Motivated by these, we explore the impact of the inherent preference in LLMs on providing emotional support, and consequently, we observe that exhibiting high preference for specific strategies hinders effective emotional support and degrades robustness in predicting the appropriate strategy. Moreover, we conduct a methodological study to offer insights into the necessary approaches for LLMs to serve as proficient emotional supporters. Our findings emphasize that (1) low preference for specific strategies hinders the progress of emotional support, (2) external assistance helps reduce preference bias, and (3) existing LLMs alone cannot become good emotional supporters. These insights suggest promising avenues for future research to enhance the emotional intelligence of LLMs.

emotional support, llm, preference bias, (17 more...)

2402.13211

Country:

Asia > Middle East > Israel (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Crank up the volume: preference bias amplification in collaborative recommendation

Lin, Kun, Sonboli, Nasim, Mobasher, Bamshad, Burke, Robin

arXiv.org Machine Learning, Sep-12-2019

Recommender systems are personalized: we expect the results given to a particular user to reflect that user's preferences. Some researchers have studied the notion of calibration, how well recommendations match users' stated preferences, and bias disparity, the extent to which mis-calibration affects different user groups. In this paper, we examine bias disparity over a range of different algorithms and for different item categories and demonstrate significant differences between model-based and memory-based algorithms.
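Bias disparity for a user group and an item category can be computed from preference ratios. The sketch below uses one common formulation, the relative change in a category's share between the group's input profiles and its recommendation lists; it is an assumption that this matches the paper's exact definition, and the function names, item names, and toy data are hypothetical.

```python
def preference_ratio(interactions, item_category, category):
    """Share of a group's interactions that fall in the given item category.

    interactions:  dict mapping user -> list of item ids (one user group)
    item_category: dict mapping item id -> category label
    """
    total = 0
    in_category = 0
    for items in interactions.values():
        for item in items:
            total += 1
            if item_category[item] == category:
                in_category += 1
    return in_category / total if total else 0.0


def bias_disparity(profiles, recommendations, item_category, category):
    """Relative change in preference ratio from input profiles to recommendations."""
    pr_input = preference_ratio(profiles, item_category, category)
    pr_output = preference_ratio(recommendations, item_category, category)
    if pr_input == 0.0:
        raise ValueError("category absent from input profiles; disparity undefined")
    return (pr_output - pr_input) / pr_input


if __name__ == "__main__":
    # Toy data: two users in one group, four items in two categories.
    item_category = {"a": "action", "b": "romance", "c": "romance", "d": "action"}
    profiles = {"u1": ["a", "b"], "u2": ["a", "c"]}
    recommendations = {"u1": ["a", "d"], "u2": ["d", "b"]}
    print(bias_disparity(profiles, recommendations, item_category, "action"))
```

A positive value means the recommender amplifies the group's existing preference for the category, a negative value means it dampens that preference, and zero means the recommendations are calibrated for that group and category.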

algorithm, bias disparity, preference ratio, (14 more...)

1909.06362

Country:

Europe > Denmark > Capital Region > Copenhagen (0.06)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Law (1.00)
Media > Film (0.94)
Leisure & Entertainment (0.94)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
