Goto

Collaborating Authors

 risk sensitivity



Two steps to risk sensitivity

Neural Information Processing Systems

Distributional reinforcement learning (RL) - in which agents learn about all the possible long-term consequences of their actions, and not just the expected value - is of great recent interest. One of the most important affordances of a distributional view is facilitating a modern, measured, approach to risk when outcomes are not completely certain. By contrast, psychological and neuroscientific investigations into decision making under risk have utilized a variety of more venerable theoretical models such as prospect theory that lack axiomatically desirable properties such as coherence. Here, we consider a particularly relevant risk measure for modeling human and animal planning, called conditional value-at-risk (CVaR), which quantifies worst-case outcomes (e.g., vehicle accidents or predation). We first adopt a conventional distributional approach to CVaR in a sequential setting and reanalyze the choices of human decision-makers in the well-known two-step task, revealing substantial risk aversion that had been lurking under stickiness and perseveration. We then consider a further critical property of risk sensitivity, namely time consistency, showing alternatives to this form of CVaR that enjoy this desirable characteristic. We use simulations to examine settings in which the various forms differ in ways that have implications for human and animal planning and behavior.


Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Neural Information Processing Systems

We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility. We propose two provably efficient model-free algorithms, Risk-Sensitive V alue Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ). These algorithms implement a form of risk-sensitive optimism in the face of uncertainty, which adapts to both risk-seeking and risk-averse modes of exploration.



Understanding Driver Cognition and Decision-Making Behaviors in High-Risk Scenarios: A Drift Diffusion Perspective

arXiv.org Artificial Intelligence

Ensuring safe interactions between autonomous vehicles (AVs) and human drivers in mixed traffic systems remains a major challenge, particularly in complex, high-risk scenarios. This paper presents a cognition-decision framework that integrates individual variability and commonalities in driver behavior to quantify risk cognition and model dynamic decision-making. First, a risk sensitivity model based on a multivariate Gaussian distribution is developed to characterize individual differences in risk cognition. Then, a cognitive decision-making model based on the drift diffusion model (DDM) is introduced to capture common decision-making mechanisms in highrisk environments. The DDM dynamically adjusts decision thresholds by integrating initial bias, drift rate, and boundary parameters, adapting to variations in speed, relative distance, and risk sensitivity to reflect diverse driving styles and risk preferences. By simulating high-risk scenarios with lateral, longitudinal, and multidimensional risk sources in a driving simulator, the proposed model accurately predicts cognitive responses and decision behaviors during emergency maneuvers. Specifically, by incorporating driver-specific risk sensitivity, the model enables dynamic adjustments of key DDM parameters, allowing for personalized decision-making representations in diverse scenarios. Comparative analysis with IDM, Gipps, and MOBIL demonstrates that DDM more precisely captures human cognitive processes and adaptive decision-making in high-risk scenarios. These findings provide a theoretical basis for modeling human driving behavior and offer critical insights for enhancing AV-human interaction in real-world traffic environments. Introduction Driving safety is directly influenced by drivers' risk cognition and collision avoidance decisionmaking abilities in high-risk scenarios. In real-world driving, risk cognition generally involves complex interactions among multiple co-existing risk factors rather than being limited to a single risk source (Crosato et al., 2024; Huang et al., 2022).


Two steps to risk sensitivity

Neural Information Processing Systems

Distributional reinforcement learning (RL) – in which agents learn about all the possible long-term consequences of their actions, and not just the expected value – is of great recent interest. One of the most important affordances of a distributional view is facilitating a modern, measured, approach to risk when outcomes are not completely certain. By contrast, psychological and neuroscientific investigations into decision making under risk have utilized a variety of more venerable theoretical models such as prospect theory that lack axiomatically desirable properties such as coherence. Here, we consider a particularly relevant risk measure for modeling human and animal planning, called conditional value-at-risk (CVaR), which quantifies worst-case outcomes (e.g., vehicle accidents or predation). We first adopt a conventional distributional approach to CVaR in a sequential setting and reanalyze the choices of human decision-makers in the well-known two-step task, revealing substantial risk aversion that had been lurking under stickiness and perseveration.


Learning Risk-Aware Quadrupedal Locomotion using Distributional Reinforcement Learning

arXiv.org Artificial Intelligence

Deployment in hazardous environments requires robots to understand the risks associated with their actions and movements to prevent accidents. Despite its importance, these risks are not explicitly modeled by currently deployed locomotion controllers for legged robots. In this work, we propose a risk sensitive locomotion training method employing distributional reinforcement learning to consider safety explicitly. Instead of relying on a value expectation, we estimate the complete value distribution to account for uncertainty in the robot's interaction with the environment. The value distribution is consumed by a risk metric to extract risk sensitive value estimates. These are integrated into Proximal Policy Optimization (PPO) to derive our method, Distributional Proximal Policy Optimization (DPPO). The risk preference, ranging from risk-averse to risk-seeking, can be controlled by a single parameter, which enables to adjust the robot's behavior dynamically. Importantly, our approach removes the need for additional reward function tuning to achieve risk sensitivity. We show emergent risk sensitive locomotion behavior in simulation and on the quadrupedal robot ANYmal.


Catastrophe, Compounding & Consistency in Choice

arXiv.org Artificial Intelligence

Conditional value-at-risk (CVaR) precisely characterizes the influence that rare, catastrophic events can exert over decisions. Such characterizations are important for both normal decision-making and for psychiatric conditions such as anxiety disorders - especially for sequences of decisions that might ultimately lead to disaster. CVaR, like other well-founded risk measures, compounds in complex ways over such sequences - and we recently formalized three structurally different forms in which risk either averages out or multiplies. Unfortunately, existing cognitive tasks fail to discriminate these approaches well; here, we provide examples that highlight their unique characteristics, and make formal links to temporal discounting for the two of the approaches that are time consistent. These examples can ground future experiments with the broader aim of characterizing risk attitudes, especially for longer horizon problems and in psychopathological populations. Introduction Given the many uncertainties in the present and future, we had to evolve sophisticated ways of handling risk. Individual appetites or aversion for risk differ substantially, with various forms of psychopathology arising at the extremes of these preferences. Psychology and neuroscience have focused on single risky decisions (typically just one spin of the wheel of outrageous fortune). Historically, heuristics dominated [1]; however, recently, axiomatically justifiable forms of risk sensitivity from the finance industry are starting to permeate.


Two steps to risk sensitivity

arXiv.org Artificial Intelligence

Distributional reinforcement learning (RL) -- in which agents learn about all the possible long-term consequences of their actions, and not just the expected value -- is of great recent interest. One of the most important affordances of a distributional view is facilitating a modern, measured, approach to risk when outcomes are not completely certain. By contrast, psychological and neuroscientific investigations into decision making under risk have utilized a variety of more venerable theoretical models such as prospect theory that lack axiomatically desirable properties such as coherence. Here, we consider a particularly relevant risk measure for modeling human and animal planning, called conditional value-at-risk (CVaR), which quantifies worst-case outcomes (e.g., vehicle accidents or predation). We first adopt a conventional distributional approach to CVaR in a sequential setting and reanalyze the choices of human decision-makers in the well-known two-step task, revealing substantial risk aversion that had been lurking under stickiness and perseveration. We then consider a further critical property of risk sensitivity, namely time consistency, showing alternatives to this form of CVaR that enjoy this desirable characteristic. We use simulations to examine settings in which the various forms differ in ways that have implications for human and animal planning and behavior.