Preference learning along multiple criteria: A game-theoretic perspective
The literature on ranking from ordinal data is vast, and there are several ways to aggregate overall preferences from pairwise comparisons between objects. In particular, it is well-known that any Nash equilibrium of the zero-sum game induced by the preference matrix defines a natural solution concept (winning distribution over objects) known as a von Neumann winner. Many real-world problems, however, are inevitably multi-criteria, with different pairwise preferences governing the different criteria. In this work, we generalize the notion of a von Neumann winner to the multi-criteria setting by taking inspiration from Blackwell's approachability. Our framework allows for non-linear aggregation of preferences across criteria, and generalizes the linearization-based approach from multi-objective optimization. From a theoretical standpoint, we show that the Blackwell winner of a multi-criteria problem instance can be computed as the solution to a convex optimization problem. Furthermore, given random samples of pairwise comparisons, we show that a simple, plug-in estimator achieves (near-)optimal minimax sample complexity. Finally, we showcase the practical utility of our framework in a user study on autonomous driving, where we find that the Blackwell winner outperforms the von Neumann winner for the overall preferences.
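The abstract notes that a von Neumann winner is a Nash equilibrium of the zero-sum game induced by the preference matrix. As a minimal sketch (not the paper's algorithm), the maximin distribution of such a game can be approximated by multiplicative-weights self-play on a small preference matrix; the cyclic 3-object example below is made up for illustration, and its von Neumann winner is uniform by symmetry.

```python
import math

def von_neumann_winner(P, iters=20000, eta=0.02):
    """Approximate the von Neumann winner of the symmetric zero-sum game
    with payoffs P[i][j] - 0.5 via multiplicative-weights self-play.
    The time-averaged strategy approaches a maximin distribution."""
    n = len(P)
    w = [1.0 / n] * n
    avg = [0.0] * n
    for _ in range(iters):
        total = sum(w)
        p = [x / total for x in w]
        for i in range(n):
            avg[i] += p[i]
        # Expected payoff of each pure strategy against the current mixture.
        gains = [sum((P[i][j] - 0.5) * p[j] for j in range(n)) for i in range(n)]
        # Multiplicative update, renormalized via p to avoid overflow.
        w = [p[i] * math.exp(eta * gains[i]) for i in range(n)]
    return [a / iters for a in avg]

# Cyclic pairwise preferences (rock-paper-scissors style): object 0 beats 1,
# 1 beats 2, 2 beats 0, so the von Neumann winner is the uniform distribution.
P = [[0.5, 1.0, 0.0],
     [0.0, 0.5, 1.0],
     [1.0, 0.0, 0.5]]
winner = von_neumann_winner(P)
```

In practice the paper computes the (Blackwell) winner via convex optimization; the iterative scheme above is only a dependency-free way to see the equilibrium concept on a toy instance.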
Aligning to Thousands of Preferences via System Message Generalization
Although humans inherently have diverse values, current large language model (LLM) alignment methods often assume that aligning LLMs with the general public's preferences is optimal. A major challenge in adopting a more individualized approach to LLM alignment is its lack of scalability, as it involves repeatedly acquiring preference data and training new reward models and LLMs for each individual's preferences. To address this challenge, we propose a new paradigm where users specify what they value most within the system message, steering the LLM's generation behavior to better align with the user's intentions. However, a naive application of such an approach is non-trivial, since LLMs are typically trained on a uniform system message (e.g., "You are a helpful assistant"), which limits their ability to generalize to diverse, unseen system messages. To improve this generalization, we create Multifaceted Collection, augmenting 66k user instructions into 197k system messages through hierarchical user value combinations.
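The idea of expanding instructions into many system messages through value combinations can be sketched as a simple cross-product over a value hierarchy. Everything below (the dimension names, the candidate values, the message template) is a hypothetical illustration of the combinatorial mechanism, not the Multifaceted Collection pipeline itself.

```python
from itertools import product

# Hypothetical value hierarchy: each dimension lists candidate user values.
VALUE_DIMENSIONS = {
    "style": ["concise", "detailed"],
    "tone": ["formal", "friendly"],
    "priority": ["safety first", "maximal helpfulness"],
}

def generate_system_messages(dimensions):
    """Enumerate one system message per combination of user values."""
    keys = sorted(dimensions)
    messages = []
    for combo in product(*(dimensions[k] for k in keys)):
        prefs = "; ".join(f"{k}: {v}" for k, v in zip(keys, combo))
        messages.append(f"You are an assistant. User preferences -> {prefs}.")
    return messages
```

Even three binary-or-ternary dimensions multiply quickly, which is how a comparatively small instruction set can fan out into a much larger system-message set.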
The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents
Yihong Tang, Kehai Chen, Xuefeng Bai, Zhengyu Niu, Bo Wang, Jie Liu, Min Zhang
Large Language Models (LLMs) have made remarkable advances in role-playing dialogue agents, demonstrating their utility in character simulations. However, it remains challenging for these agents to balance character portrayal utility with content safety because this essential character simulation often comes with the risk of generating unsafe content. To address this issue, we first conduct a systematic exploration of the safety-utility trade-off across multiple LLMs. Our analysis reveals that risk scenarios created by villain characters and user queries (referred to as risk coupling) contribute to this trade-off. Building on this, we propose a novel Adaptive Dynamic Multi-Preference (ADMP) method, which dynamically adjusts safety-utility preferences based on the degree of risk coupling and guides the model to generate responses biased toward utility or safety. We further introduce Coupling Margin Sampling (CMS) into coupling detection to enhance the model's ability to handle high-risk scenarios. Experimental results demonstrate that our approach improves safety metrics while maintaining utility.
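The core move in ADMP is to adjust the safety-utility preference as a function of the measured risk coupling. As a minimal sketch under assumed details (the linear interpolation, the weight bounds, and the function name are all illustrative, not the paper's formula):

```python
def adaptive_preference(risk_coupling, w_min=0.2, w_max=0.9):
    """Map a risk-coupling score in [0, 1] to a safety/utility weighting:
    higher coupling (e.g., villain character + risky query) biases the
    preference toward safety. Illustrative only, not the ADMP formula."""
    r = min(max(risk_coupling, 0.0), 1.0)  # clamp to [0, 1]
    safety_w = w_min + (w_max - w_min) * r
    return {"safety": safety_w, "utility": 1.0 - safety_w}
```

The resulting weights could then condition response generation or reward aggregation, so that low-risk role-play keeps its character-portrayal utility while high-risk couplings shift toward safety.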
Reviews: Preference Based Adaptation for Learning Objectives
Summary: The authors consider the problem of optimizing a linear combination of multiple objective functions, where these objectives are typically surrogate loss functions for machine learning tasks. In this setting, the decision maker explores while exploiting the linear combination in a dueling bandit framework: in each time step, the decision maker tests two hypotheses generated from two linear combinations and receives feedback on which of the two hypotheses is better. The main contribution of the paper is a set of online algorithms for this dueling bandit problem, where the preference between the two tested hypotheses is modeled by a binary logistic choice model. To avoid retraining the hypothesis for every different linear combination, the authors adapt a boosting algorithm that optimizes a mixture of K hypotheses, each of which stems from optimizing one surrogate function. Major Comment: I find the paper quite interesting in terms of both the problem model and the analysis, and I am more inclined toward acceptance than rejection.
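The binary logistic choice model underlying the duel feedback is easy to state concretely. This is a generic sketch of that feedback mechanism (the scores and the simulation loop are illustrative assumptions, not the paper's algorithm):

```python
import math
import random

def logistic_preference(score_a, score_b, rng):
    """Binary logistic choice model: hypothesis A wins the duel with
    probability sigmoid(score_a - score_b)."""
    p_a = 1.0 / (1.0 + math.exp(-(score_a - score_b)))
    return rng.random() < p_a

# Simulate repeated duels between a stronger and a weaker hypothesis.
rng = random.Random(0)
wins_for_a = sum(logistic_preference(2.0, 0.0, rng) for _ in range(1000))
```

With a score gap of 2, hypothesis A should win roughly sigmoid(2) ≈ 88% of duels; an online algorithm only ever observes these noisy binary outcomes, never the scores themselves.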
Multi-task Learning for Concurrent Prediction of Thermal Comfort, Sensation, and Preference
Therefore, researchers and engineers have proposed numerous computational models to estimate thermal comfort (TC). Given the impetus toward energy efficiency, the current focus is on data-driven TC prediction solutions that leverage state-of-the-art machine learning (ML) algorithms. However, an occupant's perception of indoor TC is subjective and multi-dimensional. Different aspects of TC are represented by various standard metrics/scales, viz., thermal sensation (TSV), thermal comfort (TCV), and thermal preference (TPV). Current ML-based TC prediction solutions adopt a single-task learning approach, i.e., one prediction model per metric; consequently, such solutions often focus on only one TC metric.
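The structural difference between single-task and multi-task prediction can be sketched as one shared trunk feeding three metric-specific heads. The feature map, the head weights, and the input variables below are all invented for illustration; a real model would learn them from data.

```python
def shared_features(x):
    """Shared trunk: a hand-rolled feature map over raw indoor inputs
    (air temperature, relative humidity, airflow) -- purely illustrative."""
    temp, humidity, airflow = x
    return [temp, humidity, airflow, temp * humidity, temp - 10.0 * airflow]

# One lightweight linear head per TC metric (weights are made up).
HEADS = {
    "TSV": [0.3, 0.0, 0.0, 0.01, 0.1],   # thermal sensation
    "TCV": [-0.1, 0.2, 0.0, 0.0, 0.05],  # thermal comfort
    "TPV": [0.2, 0.1, -0.3, 0.0, 0.0],   # thermal preference
}

def predict_all(x):
    """Concurrent prediction of all three TC metrics from one shared trunk,
    in contrast to training one independent model per metric."""
    f = shared_features(x)
    return {m: sum(w * v for w, v in zip(weights, f))
            for m, weights in HEADS.items()}
```

Sharing the trunk is what lets the three related metrics regularize each other, which is the usual argument for multi-task over single-task learning here.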
Expert Systems: Techniques, Tools, and Applications
The book is edited by Philip Klahr and the late Donald A. Waterman, both of Rand Corporation. The papers are selected from RAND technical reports published from 1977 to 1985. The book is most valuable to people learning knowledge engineering. Four of the papers provide interesting glimpses at the problems involved in transforming knowledge about a domain into computer representations. In addition, the book contains one or two interesting papers for researchers in each of the areas of knowledge acquisition, reasoning with uncertainty, and distributed problem solving.
Personalized Electronic Program Guides for Digital TV
Although today's world offers us unprecedented access to greater and greater amounts of electronic information, we are faced with significant problems when it comes to finding the right information at the right time--the essence of the information-overload problem. One of the proposed solutions to this problem is to develop technologies for automatically learning about the implicit and explicit preferences of individual users to customize and personalize the search for relevant information. For example, modern search engines provide only a first cut through the information space, leaving the user with a significant search task to locate individual information items. This information overload is beginning to cause problems on the internet and is seen as a serious barrier to its future success. This problem takes on even more significance when one considers the new generation of mobile phones, which offer users an alternative internet access route through the wireless application protocol (WAP).
Logical and Decision-Theoretic Methods for Planning under Uncertainty
Decision theory and nonmonotonic logics are formalisms that can be employed to represent and solve problems of planning under uncertainty. We analyze the usefulness of these two approaches by establishing a simple correspondence between the two formalisms. The analysis indicates that planning using nonmonotonic logic comprises two decision-theoretic concepts: probabilities (degrees of belief in planning hypotheses) and utilities (degrees of preference for planning outcomes). We present and discuss examples of the following lessons from this decision-theoretic view of nonmonotonic reasoning: (1) decision theory and nonmonotonic logics are intended to solve different components of the planning problem; (2) when considered in the context of planning under uncertainty, nonmonotonic logics do not retain the domain-independent characteristics of classical (monotonic) logic; and (3) because certain nonmonotonic programming paradigms (for example, frame-based inheritance, nonmonotonic logics) are inherently problem specific, they might be inappropriate for use in solving certain types of planning problems. We discuss how these conclusions affect several current AI research issues.
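The two decision-theoretic concepts the abstract identifies, probabilities (degrees of belief) and utilities (degrees of preference), combine in the standard expected-utility criterion for choosing among plans. A minimal worked example, with an invented umbrella-planning scenario standing in for the paper's discussion:

```python
def expected_utility(plan):
    """Expected utility of a plan: sum over outcomes of P(outcome) * U(outcome)."""
    return sum(p * u for p, u in plan["outcomes"])

# Each outcome is a (probability, utility) pair; P(rain) = 0.3 in both plans.
plans = [
    {"name": "take_umbrella", "outcomes": [(0.3, 8.0), (0.7, 9.0)]},
    {"name": "no_umbrella",   "outcomes": [(0.3, 0.0), (0.7, 10.0)]},
]

best = max(plans, key=expected_utility)  # 8.7 vs 7.0 -> take_umbrella
```

A nonmonotonic default such as "normally it does not rain, so leave the umbrella" collapses this calculation into a fixed rule, which is precisely why the abstract argues such logics encode problem-specific commitments about both belief and preference.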
Making Better Recommendations with Online Profiling Agents
In recent years, we have witnessed the success of autonomous agents applying machine-learning techniques across a wide range of applications. However, agents applying the same machine-learning techniques in online applications have not been so successful. Even agent-based hybrid recommender systems that combine information filtering techniques with collaborative filtering techniques have been applied with considerable success only to simple consumer goods such as movies, books, clothing, and food. Yet complex, adaptive autonomous agent systems that can handle complex goods such as real estate, vacation plans, insurance, mutual funds, and mortgages have emerged. To a large extent, the reinforcement learning methods developed to aid agents in learning have been more successfully deployed in offline applications.