Preference learning along multiple criteria: A game-theoretic perspective
The literature on ranking from ordinal data is vast, and there are several ways to aggregate overall preferences from pairwise comparisons between objects. In particular, it is well-known that any Nash equilibrium of the zero-sum game induced by the preference matrix defines a natural solution concept (winning distribution over objects) known as a von Neumann winner. Many real-world problems, however, are inevitably multi-criteria, with different pairwise preferences governing the different criteria. In this work, we generalize the notion of a von Neumann winner to the multi-criteria setting by taking inspiration from Blackwell's approachability. Our framework allows for non-linear aggregation of preferences across criteria, and generalizes the linearization-based approach from multi-objective optimization. From a theoretical standpoint, we show that the Blackwell winner of a multi-criteria problem instance can be computed as the solution to a convex optimization problem. Furthermore, given random samples of pairwise comparisons, we show that a simple, plug-in estimator achieves (near-)optimal minimax sample complexity. Finally, we showcase the practical utility of our framework in a user study on autonomous driving, where we find that the Blackwell winner outperforms the von Neumann winner for the overall preferences.
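A von Neumann winner is a maximin strategy of the zero-sum game whose payoff matrix is the preference matrix shifted to be skew-symmetric, and it can be computed by linear programming. The sketch below is illustrative (the function name and the rock-paper-scissors-style preference matrix are not from the paper); it assumes `P[i, j]` is the probability that object `i` is preferred to object `j`.

```python
import numpy as np
from scipy.optimize import linprog

def von_neumann_winner(P):
    """Maximin distribution of the zero-sum game with payoff M = P - 1/2.

    P[i, j] = probability that object i beats object j, so M is
    skew-symmetric and the game value is 0. Solves:
        max v  s.t.  (x^T M)_j >= v for all j,  x in the simplex.
    """
    n = P.shape[0]
    M = P - 0.5
    # Decision variables: (x_1, ..., x_n, v); maximize v <=> minimize -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # For each column j: -(M^T x)_j + v <= 0.
    A_ub = np.hstack([-M.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n]
```

For a cyclic preference matrix (A beats B, B beats C, C beats A with certainty), the winner is the uniform distribution, mirroring the mixed equilibrium of rock-paper-scissors.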
Aligning to Thousands of Preferences via System Message Generalization
Although humans inherently have diverse values, current large language model (LLM) alignment methods often assume that aligning LLMs with the general public's preferences is optimal. A major challenge in adopting a more individualized approach to LLM alignment is its lack of scalability, as it involves repeatedly acquiring preference data and training new reward models and LLMs for each individual's preferences. To address these challenges, we propose a new paradigm where users specify what they value most within the system message, steering the LLM's generation behavior to better align with the user's intentions. However, a naive application of such an approach is non-trivial since LLMs are typically trained on a uniform system message (e.g., "You are a helpful assistant"), which limits their ability to generalize to diverse, unseen system messages. To improve this generalization, we create Multifaceted Collection, augmenting 66k user instructions into 197k system messages through hierarchical user value combinations.
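The proposed paradigm can be illustrated with a minimal sketch: instead of a uniform system message, each user's stated values are encoded into the system message of a standard chat-format request. The helper name and message wording below are hypothetical, not from the paper's released code.

```python
def build_messages(user_values, instruction):
    """Encode an individual's values in the system message (illustrative).

    user_values: list of strings describing what the user values most,
    e.g. ["conciseness", "formal tone"].
    instruction: the user's actual request.
    """
    system = (
        "You are a helpful assistant. When responding, prioritize the "
        "following user values: " + "; ".join(user_values) + "."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction},
    ]
```

A model trained only on the uniform system message would see the first element above as out-of-distribution, which is the generalization gap the Multifaceted Collection augmentation targets.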
Reviews: Preference Based Adaptation for Learning Objectives
Summary: The authors consider the problem of optimizing a linear combination of multiple objective functions, where these objective functions are typically surrogate loss functions for machine learning tasks. In this setting, the decision maker explores while exploiting the linear combination in a dueling-bandit setting: in each time step, the decision maker tests two hypotheses generated from two linear combinations and then receives feedback on whether the first hypothesis or the second is better. The main contribution of the paper is the proposal of online algorithms for this dueling-bandit problem, where the preference between the two tested hypotheses is modeled by a binary logistic choice model. To avoid retraining the hypothesis for every different linear combination, the authors adapt a boosting algorithm that optimizes a mixture of K different hypotheses, where each hypothesis stems from optimizing one surrogate function. Major Comment: I find the paper quite interesting in terms of its problem model and analysis, and I am more inclined towards acceptance than rejection.
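For readers unfamiliar with the feedback model the review mentions: a binary logistic choice model assigns the probability of preferring one hypothesis over another via a sigmoid of their utility gap. The sketch below is a generic illustration of that model, not the paper's specific parameterization; the function name and `scale` parameter are assumptions.

```python
import math

def preference_prob(utility_a, utility_b, scale=1.0):
    """P(hypothesis A is preferred over hypothesis B) under a binary
    logistic choice model: a sigmoid of the scaled utility gap.

    Equal utilities give probability 0.5 (a coin flip); a large positive
    gap pushes the probability toward 1.
    """
    return 1.0 / (1.0 + math.exp(-scale * (utility_a - utility_b)))
```

The dueling-bandit learner only observes the binary outcome of each comparison, and must infer the underlying linear combination from these noisy wins and losses.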
Expert Systems: Techniques, Tools, and Applications
The book is edited by Philip Klahr and the late Donald A. Waterman, both of Rand Corporation. The papers are selected from RAND technical reports published from 1977 to 1985. The book is most valuable to people learning knowledge engineering. Four of the papers provide interesting glimpses at the problems involved in transforming knowledge about a domain into computer representations. In addition, the book contains one or two interesting papers for researchers in each of the areas of knowledge acquisition, reasoning with uncertainty, and distributed problem solving.
Personalized Electronic Program Guides for Digital TV
Although today's world offers us unprecedented access to greater and greater amounts of electronic information, we are faced with significant problems when it comes to finding the right information at the right time--the essence of the information-overload problem. One of the proposed solutions to this problem is to develop technologies for automatically learning about the implicit and explicit preferences of individual users to customize and personalize the search for relevant information. For example, modern search engines provide only a first cut through the information space, leaving the user with a significant search task to locate individual information items. This information overload is beginning to cause problems on the internet and is seen as a serious barrier to its future success. This problem takes on even more significance when one considers the new generation of mobile phones, which offer users an alternative internet access route through the wireless application protocol (WAP).
Logical and Decision-Theoretic Methods for Planning under Uncertainty
Decision theory and nonmonotonic logics are formalisms that can be employed to represent and solve problems of planning under uncertainty. We analyze the usefulness of these two approaches by establishing a simple correspondence between the two formalisms. The analysis indicates that planning using nonmonotonic logic comprises two decision-theoretic concepts: probabilities (degrees of belief in planning hypotheses) and utilities (degrees of preference for planning outcomes). We present and discuss examples of the following lessons from this decision-theoretic view of nonmonotonic reasoning: (1) decision theory and nonmonotonic logics are intended to solve different components of the planning problem; (2) when considered in the context of planning under uncertainty, nonmonotonic logics do not retain the domain-independent characteristics of classical (monotonic) logic; and (3) because certain nonmonotonic programming paradigms (for example, frame-based inheritance, nonmonotonic logics) are inherently problem specific, they might be inappropriate for use in solving certain types of planning problems. We discuss how these conclusions affect several current AI research issues.
Making Better Recommendations with Online Profiling Agents
In recent years, we have witnessed the success of autonomous agents applying machine-learning techniques across a wide range of applications. However, agents applying the same machine-learning techniques in online applications have not been so successful. Even agent-based hybrid recommender systems that combine information filtering techniques with collaborative filtering techniques have been applied with considerable success only to simple consumer goods such as movies, books, clothing, and food. Yet complex, adaptive autonomous agent systems that can handle complex goods such as real estate, vacation plans, insurance, mutual funds, and mortgages have emerged. To a large extent, the reinforcement learning methods developed to aid agents in learning have been more successfully deployed in offline applications.
Representing and Reasoning with Preferences
I consider how to represent and reason with users' preferences. While areas of economics like social choice and game theory have traditionally considered such topics, I will argue that computer science and artificial intelligence bring some fresh perspectives to the study of representing and reasoning with preferences. For instance, I consider how we can elicit preferences efficiently and effectively. Even with a single agent, the agent's desired goal may not be feasible: the agent wants a cheap, low-mileage Ferrari, but no such car exists.
User-Involved Preference Elicitation for Product Search and Recommender Systems
As such systems must crucially rely on an accurate and complete model of user preferences, the acquisition of this model becomes the central subject of this article. Many tools used today do not satisfactorily assist users in establishing this model because they do not adequately focus on fundamental decision objectives, help them reveal hidden preferences, revise conflicting preferences, or explicitly reason about tradeoffs. As a result, users fail to find the outcomes that best satisfy their needs and preferences. In this article, we provide some analyses of common areas of design pitfalls and derive a set of design guidelines that assist the user in avoiding these problems in three important areas: user preference elicitation, preference revision, and explanation interfaces. For each area, we describe the state of the art of the developed techniques and discuss concrete scenarios where they have been applied and tested. However, automated decision systems cannot effectively search the space of possible solutions without an accurate model of a user's preferences. Preference acquisition is therefore a fundamental problem of growing importance. Without an adequate interaction model and system guidance, it is difficult for users to establish a complete and accurate model of their preferences. More specifically, we face the following difficulties: First, inadequate elicitation tools can easily mislead users to focus on means objectives rather than fundamental decision objectives and force them to state preferences in the wrong order. For example, a user who commits to the choice of minivans (means objective) for spacious baggage space (fundamental objective) is not focusing on the values and could risk missing alternatives offered by station wagons. In value-focused thinking, Keeney (1992) suggests that the specification and clarification of values should not be overtaken by the set of alternatives too rapidly.
This theory has a direct implication for the order in which the system initially elicits user preferences. Second, users are not aware of all their preferences until they see them violated. For example, a user does not think of stating a preference about the intermediate airport until a solution proposes an airplane change in a place the user dislikes. This observation sheds light on the interaction design guideline for helping users discover their hidden preferences. Finally, preferences can be inconsistent.