risk preference
Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context
When making decisions under uncertainty, individuals often deviate from rational behavior, which can be evaluated across three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in supporting decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potential biases. Although several empirical studies have investigated the rationality and social behavior performance of LLMs, their internal decision-making tendencies and capabilities remain inadequately understood. This paper proposes a framework, grounded in behavioral economics theories, to evaluate the decision-making behaviors of LLMs. With a multiple-choice-list experiment, we initially estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo,
- North America > United States > Illinois (0.04)
- Asia > Vietnam (0.04)
- Asia > India (0.04)
- Asia > China (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > Illinois (0.04)
- Asia > Vietnam (0.04)
- Asia > India (0.04)
- Asia > China (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Can Risk-taking AI-Assistants suitably represent entities
Mazyaki, Ali, Naghizadeh, Mohammad, Zonouzaghi, Samaneh Ranjkhah, Sotoudeh, Amirhossein Farshi
Responsible AI demands systems whose behavioral tendencies can be effectively measured, audited, and adjusted to prevent inadvertently nudging users toward risky decisions or embedding hidden biases in risk aversion. As language models (LMs) are increasingly incorporated into AI-driven decision support systems, understanding their risk behaviors is crucial for their responsible deployment. This study investigates the manipulability of risk aversion (MoRA) in LMs, examining their ability to replicate human risk preferences across diverse economic scenarios, with a focus on gender-specific attitudes, uncertainty, role-based decision-making, and the manipulability of risk aversion. The results indicate that while LMs such as DeepSeek Reasoner and Gemini-2.0-flash-lite exhibit some alignment with human behaviors, notable discrepancies highlight the need to refine bio-centric measures of manipulability. These findings suggest directions for refining AI design to better align human and AI risk preferences and enhance ethical decision-making. The study calls for further advancements in model design to ensure that AI systems more accurately replicate human risk preferences, thereby improving their effectiveness in risk management contexts. This approach could enhance the applicability of AI assistants in managing risk.
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Europe (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- (4 more...)
Risk Profiling and Modulation for LLMs
Wang, Yikai, Li, Xiaocheng, Chen, Guanting
Large language models (LLMs) are increasingly used for decision-making tasks under uncertainty; however, their risk profiles and how they are influenced by prompting and alignment methods remain underexplored. Existing studies have primarily examined personality prompting or multi-agent interactions, leaving open the question of how post-training influences the risk behavior of LLMs. In this work, we propose a new pipeline for eliciting, steering, and modulating LLMs' risk profiles, drawing on tools from behavioral economics and finance. Using utility-theoretic models, we compare pre-trained, instruction-tuned, and RLHF-aligned LLMs, and find that while instruction-tuned models exhibit behaviors consistent with some standard utility formulations, pre-trained and RLHF-aligned models deviate more from any utility models fitted. We further evaluate modulation strategies, including prompt engineering, in-context learning, and post-training, and show that post-training provides the most stable and effective modulation of risk preference. Our findings provide insights into the risk profiles of different classes and stages of LLMs and demonstrate how post-training modulates these profiles, laying the groundwork for future research on behavioral alignment and risk-aware LLM design.
- Health & Medicine (1.00)
- Banking & Finance (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Downside Risk-Aware Equilibria for Strategic Decision-Making
Slumbers, Oliver, Evans, Benjamin Patrick, Ganesh, Sumitra, Ardon, Leo
Game theory has traditionally had a relatively limited view of risk based on how a player's expected reward is impacted by the uncertainty of the actions of other players. Recently, a new game-theoretic approach provides a more holistic view of risk also considering the reward-variance. However, these variance-based approaches measure variance of the reward on both the upside and downside. In many domains, such as finance, downside risk only is of key importance, as this represents the potential losses associated with a decision. In contrast, large upside "risk" (e.g. profits) are not an issue. To address this restrictive view of risk, we propose a novel solution concept, downside risk aware equilibria (DRAE) based on lower partial moments. DRAE restricts downside risk, while placing no restrictions on upside risk, and additionally, models higher-order risk preferences. We demonstrate the applicability of DRAE on several games, successfully finding equilibria which balance downside risk with expected reward, and prove the existence and optimality of this equilibria.
- Europe > United Kingdom > England > Greater London > London (0.04)
- North America > United States > New York (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Banking & Finance (1.00)
- Leisure & Entertainment > Games (0.67)
Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty
Wang, Rui, Lin, Qihan, Liu, Jiayu, Zong, Qing, Zheng, Tianshi, Wang, Weiqi, Song, Yangqiu
Prospect Theory (PT) models human decision-making under uncertainty, while epistemic markers (e.g., maybe) serve to express uncertainty in language. However, it remains largely unexplored whether Prospect Theory applies to contemporary Large Language Models and whether epistemic markers, which express human uncertainty, affect their decision-making behaviour. To address these research gaps, we design a three-stage experiment based on economic questionnaires. We propose a more general and precise evaluation framework to model LLMs' decision-making behaviour under PT, introducing uncertainty through the empirical probability values associated with commonly used epistemic markers in comparable contexts. We then incorporate epistemic markers into the evaluation framework based on their corresponding probability values to examine their influence on LLM decision-making behaviours. Our findings suggest that modelling LLMs' decision-making with PT is not consistently reliable, particularly when uncertainty is expressed in diverse linguistic forms. Our code is released in https://github.com/HKUST-KnowComp/MarPT.
- Europe > Austria > Vienna (0.14)
- Asia > China > Hong Kong (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (3 more...)
Risk-sensitive Actor-Critic with Static Spectral Risk Measures for Online and Offline Reinforcement Learning
The development of Distributional Reinforcement Learning (DRL) has introduced a natural way to incorporate risk sensitivity into value-based and actor-critic methods by employing risk measures other than expectation in the value function. While this approach is widely adopted in many online and offline RL algorithms due to its simplicity, the naive integration of risk measures often results in suboptimal policies. This limitation can be particularly harmful in scenarios where the need for effective risk-sensitive policies is critical and worst-case outcomes carry severe consequences. To address this challenge, we propose a novel framework for optimizing static Spectral Risk Measures (SRM), a flexible family of risk measures that generalizes objectives such as CVaR and Mean-CVaR, and enables the tailoring of risk preferences. Our method is applicable to both online and offline RL algorithms. We establish theoretical guarantees by proving convergence in the finite state-action setting. Moreover, through extensive empirical evaluations, we demonstrate that our algorithms consistently outperform existing risk-sensitive methods in both online and offline environments across diverse domains.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (3 more...)
- Banking & Finance > Trading (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.96)
- Health & Medicine > Therapeutic Area > Immunology > HIV (0.47)
Can Large Language Models Capture Human Risk Preferences? A Cross-Cultural Study
Song, Bing, Liu, Jianing, Jian, Sisi, Wu, Chenyang, Dixit, Vinayak
Large language models (LLMs) have made significant strides, extending their applications to dialogue systems, automated content creation, and domain-specific advisory tasks. However, as their use grows, concerns have emerged regarding their reliability in simulating complex decision-making behavior, such as risky decision-making, where a single choice can lead to multiple outcomes. This study investigates the ability of LLMs to simulate risky decision-making scenarios. We compare model-generated decisions with actual human responses in a series of lottery-based tasks, using transportation stated preference survey data from participants in Sydney, Dhaka, Hong Kong, and Nanjing. Demographic inputs were provided to two LLMs -- ChatGPT 4o and ChatGPT o1-mini -- which were tasked with predicting individual choices. Risk preferences were analyzed using the Constant Relative Risk Aversion (CRRA) framework. Results show that both models exhibit more risk-averse behavior than human participants, with o1-mini aligning more closely with observed human decisions. Further analysis of multilingual data from Nanjing and Hong Kong indicates that model predictions in Chinese deviate more from actual responses compared to English, suggesting that prompt language may influence simulation performance. These findings highlight both the promise and the current limitations of LLMs in replicating human-like risk behavior, particularly in linguistic and cultural settings.
- Asia > China > Jiangsu Province > Nanjing (0.46)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.26)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Law (0.93)
- Government (0.68)
- Banking & Finance > Financial Services (0.46)
Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context
When making decisions under uncertainty, individuals often deviate from rational behavior, which can be evaluated across three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in supporting decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potential biases. Although several empirical studies have investigated the rationality and social behavior performance of LLMs, their internal decision-making tendencies and capabilities remain inadequately understood. This paper proposes a framework, grounded in behavioral economics theories, to evaluate the decision-making behaviors of LLMs. With a multiple-choice-list experiment, we initially estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo,