 preference


Preference learning along multiple criteria: A game-theoretic perspective

Neural Information Processing Systems

The literature on ranking from ordinal data is vast, and there are several ways to aggregate overall preferences from pairwise comparisons between objects. In particular, it is well-known that any Nash equilibrium of the zero-sum game induced by the preference matrix defines a natural solution concept (winning distribution over objects) known as a von Neumann winner. Many real-world problems, however, are inevitably multi-criteria, with different pairwise preferences governing the different criteria. In this work, we generalize the notion of a von Neumann winner to the multi-criteria setting by taking inspiration from Blackwell's approachability. Our framework allows for non-linear aggregation of preferences across criteria, and generalizes the linearization-based approach from multi-objective optimization.
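As a rough illustration of the single-criterion solution concept mentioned above, the sketch below checks whether a given distribution over objects is a von Neumann winner. It assumes the usual convention that entry `pref[i][j]` is the probability that object `i` beats object `j` (so `pref[i][j] + pref[j][i] = 1`), and that a winning distribution must beat every object with probability at least 1/2. The function name, the tolerance, and the example matrix are hypothetical and are not taken from the paper; the multi-criteria generalization is not covered here.

```python
from typing import Sequence


def is_von_neumann_winner(
    pref: Sequence[Sequence[float]],
    dist: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Return True if `dist` beats every object with probability >= 1/2.

    Assumption: pref[i][j] is the probability that object i is preferred
    to object j, and `dist` is a probability distribution over objects.
    """
    n = len(pref)
    if len(dist) != n:
        raise ValueError("dist must have one weight per object")
    if abs(sum(dist) - 1.0) > tol or any(w < -tol for w in dist):
        raise ValueError("dist must be a probability distribution")

    for j in range(n):
        # Probability that an object drawn from `dist` beats object j.
        win_prob = sum(dist[i] * pref[i][j] for i in range(n))
        if win_prob < 0.5 - tol:
            return False
    return True


# Hypothetical cyclic preferences (0 beats 1, 1 beats 2, 2 beats 0):
# no single object wins, but the uniform distribution does.
CYCLIC = [
    [0.5, 0.8, 0.2],
    [0.2, 0.5, 0.8],
    [0.8, 0.2, 0.5],
]

uniform_wins = is_von_neumann_winner(CYCLIC, [1 / 3, 1 / 3, 1 / 3])
point_mass_wins = is_von_neumann_winner(CYCLIC, [1.0, 0.0, 0.0])
```

Finding such a distribution, rather than checking one, amounts to solving the zero-sum game, for example with a linear program; that step is left out of this sketch.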


From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment

Li, Jia-Nan, Guan, Jian, Wu, Songhao, Wu, Wei, Yan, Rui

arXiv.org Artificial Intelligence

Large language models (LLMs) have traditionally been aligned through one-size-fits-all approaches that assume uniform human preferences, fundamentally overlooking the diversity in user values and needs. This paper introduces a comprehensive framework for scalable personalized alignment of LLMs. We establish a systematic preference space characterizing psychological and behavioral dimensions, alongside diverse persona representations for robust preference inference in real-world scenarios. Building upon this foundation, we introduce AlignX, a large-scale dataset of over 1.3 million personalized preference examples, and develop two complementary alignment approaches: in-context alignment directly conditioning on persona representations and preference-bridged alignment modeling intermediate preference distributions. Extensive experiments demonstrate substantial improvements over existing methods, with an average 17.06% accuracy gain across four benchmarks while exhibiting a strong adaptation capability to novel preferences, robustness to limited user data, and precise preference controllability. These results validate our framework's effectiveness, advancing toward truly user-adaptive AI systems.


The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents

Tang, Yihong, Chen, Kehai, Bai, Xuefeng, Niu, Zhengyu, Wang, Bo, Liu, Jie, Zhang, Min

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have made remarkable advances in role-playing dialogue agents, demonstrating their utility in character simulations. However, it remains challenging for these agents to balance character portrayal utility with content safety because this essential character simulation often comes with the risk of generating unsafe content. To address this issue, we first conduct a systematic exploration of the safety-utility trade-off across multiple LLMs. Our analysis reveals that risk scenarios created by villain characters and user queries (referred to as risk coupling) contribute to this trade-off. Building on this, we propose a novel Adaptive Dynamic Multi-Preference (ADMP) method, which dynamically adjusts safety-utility preferences based on the degree of risk coupling and guides the model to generate responses biased toward utility or safety. We further introduce Coupling Margin Sampling (CMS) into coupling detection to enhance the model's ability to handle high-risk scenarios. Experimental results demonstrate that our approach improves safety metrics while maintaining utility.


Preference learning made easy: Everything should be understood through win rate

Zhang, Lily H., Ranganath, Rajesh

arXiv.org Machine Learning

Preference learning, or the task of aligning generative models to preference comparison data, has yet to reach the conceptual maturity of classification, density estimation, etc. To close this gap, this work presents a framework to understand preference learning starting from the sampling distribution of pairwise preference data. First, we prove that the only evaluation of a generative model that respects both preferences and prevalences in the data distribution is a form of win rate, justifying win rate as the focal point to understand preference learning. We then analyze preference learning methods as win rate optimization (WRO) or non-WRO. We present novel instances of WRO beyond existing examples (RLHF, NLHF) and identify two key theoretical benefits of all such methods. We prove that common non-WRO methods like DPO and SFT on preferred samples lack these properties and suggest ways to mitigate such theoretical limitations. We also show that WRO underperforms in practice due to optimization difficulties and that optimization success predicts performance better than choices which affect the objective's solution. Our analysis highlights best practices for existing methods and provides recommendations for future research, guided by the principle that one should either align non-WRO methods more closely with WRO or improve the optimization of WRO objectives.
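To make the notion of win rate concrete, here is a minimal sketch of one common reading: the average probability that a sample from one model is preferred to a sample from a rival, estimated over all sample pairs. The abstract does not give the paper's exact definition, so the function names, the pairing scheme, and the toy preference function below are assumptions for illustration only.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def win_rate(
    model_samples: Sequence[T],
    rival_samples: Sequence[T],
    prefer: Callable[[T, T], float],
) -> float:
    """Estimate how often model samples are preferred to rival samples.

    Assumption: prefer(a, b) returns the probability in [0, 1] that `a`
    is preferred to `b`; every model sample is compared with every rival
    sample and the results are averaged.
    """
    if not model_samples or not rival_samples:
        raise ValueError("both sample sets must be non-empty")
    total = sum(prefer(a, b) for a in model_samples for b in rival_samples)
    return total / (len(model_samples) * len(rival_samples))


def prefer_higher(a: float, b: float) -> float:
    """Toy preference (hypothetical): larger scores win, ties split evenly."""
    if a > b:
        return 1.0
    if a < b:
        return 0.0
    return 0.5


# Samples are stand-in quality scores: the model wins 3 of the 4 pairings.
example_rate = win_rate([3, 5], [1, 4], prefer_higher)  # 0.75
```

In practice the preference function would come from human annotators or a learned preference model rather than a numeric comparison.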


What Does it Mean to Give Someone What They Want? The Nature of Preferences in Recommender Systems

#artificialintelligence

A central goal of recommender systems is to select items according to the "preferences" of their users. "Preferences" is a complicated word that has been used across many disciplines to mean, roughly, "what people want." This has been justified by the assumption that people always choose what they want, an idea from 20th-century economics called revealed preference. However, this approach to preferences can lead to a variety of unwanted outcomes including clickbait, addiction, or algorithmic manipulation. Doing better requires both a change in thinking and a change in approach.


Truth and Preferences -- A Game Approach for Qualitative Choice Logic

Freiman, Robert, Bernreiter, Michael

arXiv.org Artificial Intelligence

In this paper, we introduce game-theoretic semantics (GTS) for Qualitative Choice Logic (QCL), which, in order to express preferences, extends classical propositional logic with an additional connective called ordered disjunction. Firstly, we demonstrate that game semantics can capture existing degree-based semantics for QCL in a natural way. Secondly, we show that game semantics can be leveraged to derive new semantics for the language of QCL. In particular, we present a new semantics that makes use of GTS negation and, by doing so, avoids problems with negation in existing QCL-semantics.


Multi-task Learning for Concurrent Prediction of Thermal Comfort, Sensation, and Preference

#artificialintelligence

Researchers and engineers have proposed numerous computational models to estimate thermal comfort (TC). Given the impetus toward energy efficiency, the current focus is on data-driven TC prediction solutions that leverage state-of-the-art machine learning (ML) algorithms. However, an indoor occupant's perception of TC is subjective and multi-dimensional. Different aspects of TC are represented by various standard metrics and scales, namely thermal sensation (TSV), thermal comfort (TCV), and thermal preference (TPV). Current ML-based TC prediction solutions adopt the single-task learning approach, i.e., one prediction model per metric. Consequently, solutions often focus on only one TC metric.


Expert Systems: Techniques, Tools, and Applications

AI Magazine

The book is edited by Philip Klahr and the late Donald A. Waterman, both of the RAND Corporation. The papers are selected from RAND technical reports published from 1977 to 1985. The book is most valuable to people learning knowledge engineering. Four of the papers provide interesting glimpses at the problems involved in transforming knowledge about a domain into computer representations. In addition, the book contains one or two interesting papers for researchers in each of the areas of knowledge acquisition, reasoning with uncertainty, and distributed problem solving.


Personalized Electronic Program Guides for Digital TV

AI Magazine

Although today's world offers us unprecedented access to greater and greater amounts of electronic information, we are faced with significant problems when it comes to finding the right information at the right time--the essence of the information-overload problem. One of the proposed solutions to this problem is to develop technologies for automatically learning about the implicit and explicit preferences of individual users to customize and personalize the search for relevant information. For example, modern search engines provide only a first cut through the information space, leaving the user with a significant search task to locate individual information items. This information overload is beginning to cause problems on the internet and is seen as a serious barrier to its future success. This problem takes on even more significance when one considers the new generation of mobile phones, which offer users an alternative internet access route through the wireless application protocol (WAP).

