preference parameter
Mutual Adaptation in Human-Robot Co-Transportation with Human Preference Uncertainty
Mahmud, Al Jaber, Li, Weizi, Wang, Xuan
Mutual adaptation can significantly enhance overall task performance in human-robot co-transportation by integrating both the robot's and human's understanding of the environment. While human modeling helps capture humans' subjective preferences, two challenges persist: (i) the uncertainty of human preference parameters and (ii) the need to balance adaptation strategies that benefit both humans and robots. In this paper, we propose a unified framework to address these challenges and improve task performance through mutual adaptation. First, instead of relying on fixed parameters, we model a probability distribution of human choices by incorporating a range of uncertain human parameters. Next, we introduce a time-varying stubbornness measure and a coordination mode transition model, which allows either the robot to lead the team's trajectory or, if a human's preferred path conflicts with the robot's plan and their stubbornness exceeds a threshold, the robot to transition to following the human. Finally, we introduce a pose optimization strategy to mitigate the uncertain human behaviors when they are leading. To validate the framework, we design and perform experiments with real human feedback. We then demonstrate, through simulations, the effectiveness of our models in enhancing task performance with mutual adaptation and pose optimization.
Preference-based opponent shaping in differentiable games
Qiao, Xinyu, Hu, Yudong, Han, Congying, Wu, Weiyan, Guo, Tiande
Multi-agent reinforcement learning (MARL), as a theoretical framework for modeling agent behavior in complex game environments, has become a significant area of research [42, 37]. Unlike traditional game theory, MARL typically allows agents to learn strategies through repeated interactions to achieve equilibrium [34]. By relaxing the assumptions of agent rationality and independence, MARL can learn strategies efficiently with arbitrary environments and opponents [10, 20, 17]. Current applications of MARL in game environments are primarily focused on zero-sum games (fully competitive) [10, 41] and fully cooperative games [12, 38], since the behavioral preferences of opponent agents in these environments are relatively easy to predict. Nevertheless, the environments in practical applications, e.g., economic markets, robotics and distributed control, may have multiple equilibria [16, 40], and opponent agents may not exhibit clear preferences for different strategies, so agents need to learn strategies in general-sum games [8, 7]. The Prisoner's dilemma [3, 14] is a classic example of the tension between mutual cooperation leading to a win-win situation and focusing solely on self-interest leading to a lose-lose situation. Therefore, modeling and shaping the behavior of opponent agents is the main challenge for the application of MARL in these environments [11]. Recent advancements in MARL have introduced opponent modeling and shaping techniques that allow agents to learn not just their own strategies, but also to predict and influence the strategies of the opponent, such as [10, 20, 36]. These methods show promise in improving the efficiency of strategy learning by incorporating the behavior of other agents into the learning process.
The Nah Bandit: Modeling User Non-compliance in Recommendation Systems
Zhou, Tianyue, Cho, Jung-Hoon, Wu, Cathy
Recommendation systems now pervade the digital world, ranging from advertising to entertainment. However, it remains challenging to implement effective recommendation systems in the physical world, such as in mobility or health. This work focuses on a key challenge: in the physical world, it is often easy for the user to opt out of taking any recommendation if it is not to her liking, and to fall back to her baseline behavior. It is thus crucial in cyber-physical recommendation systems to operate with an interaction model that is aware of such user behavior, lest the user abandon the recommendations altogether. This paper thus introduces the Nah Bandit, a tongue-in-cheek reference to describe a Bandit problem where users can say `nah' to the recommendation and opt for their preferred option instead. As such, this problem lies in between a typical bandit setup and supervised learning. We model the user non-compliance by parameterizing an anchoring effect of recommendations on users. We then propose the Expert with Clustering (EWC) algorithm, a hierarchical approach that incorporates feedback from both recommended and non-recommended options to accelerate user preference learning. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ clusters, EWC achieves a regret bound of $O(N\sqrt{T\log K} + NT)$, achieving superior theoretical performance in the short term compared to the LinUCB algorithm. Experimental results also highlight that EWC outperforms both supervised learning and traditional contextual bandit approaches. This advancement reveals that effective use of non-compliance feedback can accelerate preference learning and improve recommendation accuracy. This work lays the foundation for future research in Nah Bandit, providing a robust framework for more effective recommendation systems.
Context-aware Bayesian Mixed Multinomial Logit Model
Łukawska, Mirosława, Jensen, Anders Fjendbo, Rodrigues, Filipe
The mixed multinomial logit model assumes constant preference parameters of a decision-maker throughout different choice situations, which may be considered too strong for certain choice modelling applications. This paper proposes an effective approach to model context-dependent intra-respondent heterogeneity, thereby introducing the concept of the Context-aware Bayesian mixed multinomial logit model, where a neural network maps contextual information to interpretable shifts in the preference parameters of each individual in each choice occasion. The proposed model offers several key advantages. First, it supports both continuous and discrete variables, as well as complex non-linear interactions between both types of variables. Secondly, each context specification is considered jointly as a whole by the neural network rather than each variable being considered independently. Finally, since the neural network parameters are shared across all decision-makers, it can leverage information from other decision-makers to infer the effect of a particular context on a particular decision-maker. Even though the context-aware Bayesian mixed multinomial logit model allows for flexible interactions between attributes, the increase in computational complexity is minor, compared to the mixed multinomial logit model. We illustrate the concept and interpretation of the proposed model in a simulation study. We furthermore present a real-world case study from the travel behaviour domain - a bicycle route choice model, based on a large-scale, crowdsourced dataset of GPS trajectories including 119,448 trips made by 8,555 cyclists.
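The core mechanism can be illustrated with a minimal sketch, which is not the authors' model: multinomial logit choice probabilities are a softmax over linear utilities, and a context-dependent shift is added to the preference parameters before the utilities are computed. The function names and the `shift` argument are hypothetical; here the shift is passed in directly, whereas the abstract describes it as the output of a neural network applied to contextual information.

```python
import math

def mnl_probabilities(attributes, beta):
    """Multinomial logit: softmax over linear utilities u_j = beta . x_j."""
    utilities = [sum(b * x for b, x in zip(beta, xs)) for xs in attributes]
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def context_shifted_probabilities(attributes, beta, shift):
    """Apply a context-dependent shift to beta, then compute MNL probabilities.

    `shift` is a hypothetical stand-in for the network output described
    in the abstract; it is an assumption of this sketch.
    """
    shifted_beta = [b + d for b, d in zip(beta, shift)]
    return mnl_probabilities(attributes, shifted_beta)

# Two routes described by (length, share of bike lanes); values are made up.
routes = [[1.0, 0.2], [1.2, 0.8]]
base = mnl_probabilities(routes, beta=[-1.0, 0.5])
rainy = context_shifted_probabilities(routes, beta=[-1.0, 0.5], shift=[-0.5, 0.0])
```

In this toy setting a negative shift on the length coefficient makes the longer route less attractive, which is the kind of interpretable effect the abstract refers to.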
Benabbou
In this paper, we develop a general interactive method to solve multi-objective combinatorial optimization problems with imprecise preferences. Assuming that preferences can be represented by a parameterized scalarizing function, we iteratively ask preference queries to the decision maker in order to reduce the uncertainty over the preference parameters until we can determine her preferred solution. To produce informative preference queries at each step, we generate promising solutions using the extreme points of the polyhedron representing the admissible preference parameters and then we ask the decision maker to compare two of these solutions (we propose different selection strategies). These extreme points are also used to provide a stopping criterion guaranteeing that the returned solution is optimal (or near-optimal) according to the decision maker's preferences. For the multi-objective spanning tree problem with a linear aggregation function, we provide numerical results to demonstrate the practical efficiency of our approach and we compare our results to a recent approach based on minimax regret, where preferences are asked during the construction of a solution. We show that better results are achieved by our method both in terms of running time and number of questions.
A Gaussian Process Model of Cross-Category Dynamics in Brand Choice
Understanding individual customers' sensitivities to prices, promotions, brand, and other aspects of the marketing mix is fundamental to a wide swath of marketing problems, including targeting and pricing. Companies that operate across many product categories have a unique opportunity, insofar as they can use purchasing data from one category to augment their insights in another. Such cross-category insights are especially crucial in situations where purchasing data may be rich in one category, and scarce in another. An important aspect of how consumers behave across categories is dynamics: preferences are not stable over time, and changes in individual-level preference parameters in one category may be indicative of changes in other categories, especially if those changes are driven by external factors. Yet, despite the rich history of modeling cross-category preferences, the marketing literature lacks a framework that flexibly accounts for \textit{correlated dynamics}, or the cross-category interlinkages of individual-level sensitivity dynamics. In this work, we propose such a framework, leveraging individual-level, latent, multi-output Gaussian processes to build a nonparametric Bayesian choice model that allows information sharing of preference parameters across customers, time, and categories. We apply our model to grocery purchase data, and show that our model detects interesting dynamics of customers' price sensitivities across multiple categories. Managerially, we show that capturing correlated dynamics yields substantial predictive gains, relative to benchmarks. Moreover, we find that capturing correlated dynamics can have implications for understanding changes in consumers' preferences over time, and developing targeted marketing strategies based on those dynamics.
Near-Optimal MNL Bandits Under Risk Criteria
Xi, Guangyu, Tao, Chao, Zhou, Yuan
We study MNL bandits, a variant of the traditional multi-armed bandit problem, under risk criteria. Unlike the ordinary expected revenue, risk criteria are more general goals widely used in industry and business. We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio and entropy risk, and prove that they achieve near-optimal regret. As a complement, we also conduct experiments with both synthetic and real data to show the empirical performance of our proposed algorithms.
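For readers unfamiliar with the three risk criteria named above, the following sketch computes empirical versions of them from a list of revenue samples. The definitions are common textbook conventions chosen for illustration (lower-tail CVaR, mean over population standard deviation, and entropic risk with parameter `theta`); the paper's exact definitions may differ, and the function names are hypothetical.

```python
import math

def cvar(samples, alpha):
    """Lower-tail CVaR: mean of the worst ceil(alpha * n) revenue outcomes."""
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k

def sharpe_ratio(samples):
    """Mean divided by the (population) standard deviation."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean / math.sqrt(variance)

def entropic_risk(samples, theta):
    """Certainty equivalent -(1/theta) * log E[exp(-theta * X)], theta > 0."""
    n = len(samples)
    return -math.log(sum(math.exp(-theta * x) for x in samples) / n) / theta

revenues = [1.0, 2.0, 3.0, 4.0]
worst_half = cvar(revenues, alpha=0.5)  # average of the two lowest revenues
```

All three reduce to plain expected revenue only in special cases (for example, entropic risk as `theta` approaches zero), which is why they call for dedicated algorithms.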
Bayesian Automatic Relevance Determination for Utility Function Specification in Discrete Choice Models
Rodrigues, Filipe, Ortelli, Nicola, Bierlaire, Michel, Pereira, Francisco
Specifying utility functions is a key step towards applying the discrete choice framework for understanding the behaviour processes that govern user choices. However, identifying the utility function specifications that best model and explain the observed choices can be a very challenging and time-consuming task. This paper seeks to help modellers by leveraging the Bayesian framework and the concept of automatic relevance determination (ARD), in order to automatically determine an optimal utility function specification from an exponentially large set of possible specifications in a purely data-driven manner. Based on recent advances in approximate Bayesian inference, a doubly stochastic variational inference is developed, which allows the proposed DCM-ARD model to scale to very large and high-dimensional datasets. Using semi-artificial choice data, the proposed approach is shown to very accurately recover the true utility function specifications that govern the observed choices. Moreover, when applied to real choice data, DCM-ARD is shown to be able to discover high-quality specifications that can outperform previous ones from the literature according to multiple criteria, thereby demonstrating its practical applicability.
Privacy-preserving Crowd-guided AI Decision-making in Ethical Dilemmas
Wang, Teng, Zhao, Jun, Yu, Han, Liu, Jinyan, Yang, Xinyu, Ren, Xuebin, Shi, Shuyu
With the rapid development of artificial intelligence (AI), ethical issues surrounding AI have attracted increasing attention. In particular, autonomous vehicles may face moral dilemmas in accident scenarios, such as staying the course resulting in hurting pedestrians or swerving leading to hurting passengers. To investigate such ethical dilemmas, recent studies have adopted preference aggregation, in which each voter expresses her/his preferences over decisions for the possible ethical dilemma scenarios, and a centralized system aggregates these preferences to obtain the winning decision. Although a useful methodology for building ethical AI systems, such an approach can potentially violate the privacy of voters since moral preferences are sensitive information and their disclosure can be exploited by malicious parties. In this paper, we report a first-of-its-kind privacy-preserving crowd-guided AI decision-making approach in ethical dilemmas. We adopt the notion of differential privacy to quantify privacy and consider four granularities of privacy protection by taking voter-/record-level privacy protection and centralized/distributed perturbation into account, resulting in four approaches: VLCP, RLCP, VLDP, and RLDP. Moreover, we propose different algorithms to achieve these privacy protection granularities, while retaining the accuracy of the learned moral preference model. Specifically, VLCP and RLCP are implemented with the data aggregator setting a universal privacy parameter and perturbing the averaged moral preference to protect the privacy of voters' data. VLDP and RLDP are implemented in such a way that each voter perturbs her/his local moral preference with a personalized privacy parameter. Extensive experiments on both synthetic and real data demonstrate that the proposed approach can achieve high accuracy of preference aggregation while protecting individual voters' privacy.
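The centralized setting described above, in which an aggregator perturbs an averaged preference vector, can be sketched with the standard Laplace mechanism. This is a generic illustration and not the VLCP or RLCP algorithm itself: the clipping range `[lo, hi]`, the L1 sensitivity bound `d * (hi - lo) / n` for replacing one voter's vector, and all function names are assumptions of this sketch.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(vectors, epsilon, lo=-1.0, hi=1.0, rng=None):
    """Average clipped preference vectors and add Laplace noise.

    Assumes each voter contributes one vector; replacing one voter changes
    the mean by at most d * (hi - lo) / n in L1 norm.
    """
    if rng is None:
        rng = random.Random()
    n, d = len(vectors), len(vectors[0])
    clipped = [[min(hi, max(lo, v)) for v in vec] for vec in vectors]
    mean = [sum(vec[j] for vec in clipped) / n for j in range(d)]
    scale = d * (hi - lo) / (n * epsilon)  # sensitivity / epsilon
    return [m + laplace_noise(scale, rng) for m in mean]

votes = [[0.2, -0.4], [0.6, 0.0]]
noisy = private_mean(votes, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives stronger privacy but a noisier aggregate, which is the accuracy trade-off the abstract refers to.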
Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
Wang, Yining, Chen, Xi, Zhou, Yuan
In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection-based algorithm achieves an item-independent regret bound of $O(\sqrt{T \log\log T})$, which matches information-theoretic lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
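To illustrate the trisection idea in its simplest form, the sketch below maximizes a unimodal function on an interval by repeatedly discarding one third of the search range. It assumes exact, noiseless function evaluations; the algorithm in the paper works with noisy bandit feedback and confidence parameters, which this sketch does not model. The function name and tolerance are hypothetical.

```python
def trisection_maximize(f, lo, hi, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    while hi - lo > tol:
        third = (hi - lo) / 3.0
        left, right = lo + third, hi - third
        if f(left) < f(right):
            lo = left   # the maximizer cannot lie in [lo, left]
        else:
            hi = right  # the maximizer cannot lie in [right, hi]
    return (lo + hi) / 2.0

# A made-up concave "revenue potential" peaking at 2.
best = trisection_maximize(lambda x: -(x - 2.0) ** 2, 0.0, 5.0)
```

Each iteration shrinks the interval by a factor of 2/3, so the number of evaluations grows only logarithmically in the required precision.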