
Distributional Reinforcement Learning for Risk-Sensitive Policies

Neural Information Processing Systems

We address the problem of learning a risk-sensitive policy based on the CVaR risk measure using distributional reinforcement learning. In particular, we show that the standard action-selection strategy when applying the distributional Bellman optimality operator can result in convergence to neither the dynamic, Markovian CVaR nor the static, non-Markovian CVaR. We propose modifications to the existing algorithms that include a new distributional Bellman operator and show that the proposed strategy greatly expands the utility of distributional RL in learning and representing CVaR-optimized policies. Our proposed approach is a simple extension of standard distributional RL algorithms and can therefore take advantage of many of the recent advances in deep RL. On both synthetic and real data, we empirically show that our proposed algorithm is able to learn better CVaR-optimized policies.
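To make the CVaR objective discussed above concrete, here is a minimal sketch of CVaR-based action selection over a quantile representation of the return distribution (as in QR-DQN-style algorithms). The function names and the toy two-action example are illustrative, not taken from the paper.

```python
import numpy as np

def cvar_from_quantiles(quantiles, alpha):
    """CVaR_alpha of a return distribution represented by N equally
    weighted quantile estimates: the mean of the lowest
    alpha-fraction of the sorted quantiles."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))
    return q[:k].mean()

def cvar_greedy_action(quantiles_per_action, alpha=0.25):
    """Pick the action maximising CVaR_alpha rather than the mean --
    the standard risk-sensitive action-selection strategy that the
    paper analyses."""
    return int(np.argmax([cvar_from_quantiles(q, alpha)
                          for q in quantiles_per_action]))

# Two actions: a safe one and a risky one with a higher mean return.
safe  = np.full(8, 1.0)                       # always returns 1
risky = np.array([-3., -3., 2., 2., 2., 2., 2., 2.])  # mean 0.75, heavy left tail
print(cvar_greedy_action([risky, safe], alpha=0.25))  # -> 1 (the safe action)
```

At alpha = 0.25 the risky action's CVaR is -3 while the safe action's is 1, so the CVaR-greedy rule prefers safety even though the risky action has the higher mean.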


Risk-averse policies for natural gas futures trading using distributional reinforcement learning

Hêche, Félicien, Nigro, Biagio, Barakat, Oussama, Robert-Nicoud, Stephan

arXiv.org Artificial Intelligence

Financial markets have experienced significant instabilities in recent years, creating unique challenges for trading and increasing interest in risk-averse strategies. Distributional Reinforcement Learning (RL) algorithms, which model the full distribution of returns rather than just expected values, offer a promising approach to managing market uncertainty. This paper investigates this potential by studying the effectiveness of three distributional RL algorithms for natural gas futures trading and exploring their capacity to develop risk-averse policies. Specifically, we analyze the performance and behavior of Categorical Deep Q-Network (C51), Quantile Regression Deep Q-Network (QR-DQN), and Implicit Quantile Network (IQN). To the best of our knowledge, these algorithms have never been applied in a trading context. These policies are compared against five Machine Learning (ML) baselines, using a detailed dataset provided by Predictive Layer SA, a company supplying ML-based strategies for energy trading. The main contributions of this study are as follows. (1) We demonstrate that distributional RL algorithms significantly outperform classical RL methods, with C51 achieving a performance improvement of more than 32%. (2) We show that training C51 and IQN to maximize CVaR produces risk-sensitive policies with adjustable risk aversion. Specifically, our ablation studies reveal that lower CVaR confidence levels increase risk aversion, while higher levels decrease it, offering flexible risk management options. In contrast, QR-DQN shows less predictable behavior. These findings emphasize the potential of distributional RL for developing adaptable, risk-averse trading strategies in volatile markets.


Risk-Sensitive Policy with Distributional Reinforcement Learning

Théate, Thibaut, Ernst, Damien

arXiv.org Artificial Intelligence

Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the potential risk associated with the actions taken, which may be critical in certain applications. To address that issue, the present research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to the risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the $Q$ function, which generally stands at the core of learning schemes in RL, with another function that takes into account both the expected return and the risk. Named the risk-based utility function $U$, it can be extracted from the random return distribution $Z$ naturally learnt by any distributional RL algorithm. This makes it possible to span the complete potential trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a truly practical and accessible solution for learning risk-sensitive policies with minimal modification to the distributional RL algorithm, and with an emphasis on the interpretability of the resulting decision-making process.
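One simple way to realise a utility of this kind is a convex combination of the expected return and the left tail (CVaR) of the learnt distribution $Z$. The sketch below assumes a quantile representation of $Z$; the exact functional form of $U$ in the paper may differ, and the names here are illustrative.

```python
import numpy as np

def utility(quantiles, lam, alpha=0.25):
    """Hypothetical risk-based utility U: a convex combination of the
    expected return and the CVaR_alpha tail of the learnt return
    distribution Z (given as equally weighted quantile estimates).
    lam = 0 recovers the usual Q function; lam = 1 is fully
    risk-averse."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))
    expected = q.mean()     # risk-neutral value
    tail = q[:k].mean()     # CVaR_alpha: mean of the worst outcomes
    return (1.0 - lam) * expected + lam * tail

# Sweeping lam spans the trade-off between expected return and risk.
z = np.array([-3., -3., 2., 2., 2., 2., 2., 2.])
for lam in (0.0, 0.5, 1.0):
    print(lam, utility(z, lam))
# lam=0 -> 0.75 (expected return); lam=1 -> -3.0 (pure CVaR)
```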


Risk-Conditioned Distributional Soft Actor-Critic for Risk-Sensitive Navigation

Choi, Jinyoung, Dance, Christopher R., Kim, Jung-eun, Hwang, Seulbin, Park, Kyung-sik

arXiv.org Artificial Intelligence

Modern navigation algorithms based on deep reinforcement learning (RL) show promising efficiency and robustness. However, most deep RL algorithms operate in a risk-neutral manner, making no special attempt to shield users from relatively rare but serious outcomes, even if such shielding might cause little loss of performance. Furthermore, such algorithms typically make no provisions to ensure safety in the presence of inaccuracies in the models on which they were trained, beyond adding a cost-of-collision and some domain randomization while training, in spite of the formidable complexity of the environments in which they operate. In this paper, we present a novel distributional RL algorithm that not only learns an uncertainty-aware policy, but can also change its risk measure without expensive fine-tuning or retraining. Our method shows superior performance and safety over baselines in partially-observed navigation tasks. We also demonstrate that agents trained using our method can adapt their policies to a wide range of risk measures at run-time.


Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping

Bodnar, Cristian, Li, Adrian, Hausman, Karol, Pastor, Peter, Kalakrishnan, Mrinal

arXiv.org Machine Learning

The distributional perspective on reinforcement learning (RL) has given rise to a series of successful Q-learning algorithms, resulting in state-of-the-art performance in arcade game environments. However, it has not yet been analyzed how these findings from a discrete setting translate to complex practical applications characterized by noisy, high-dimensional and continuous state-action spaces. In this work, we propose Quantile QT-Opt (Q2-Opt), a distributional variant of the recently introduced distributed Q-learning algorithm [11] for continuous domains, and examine its behaviour in a series of simulated and real vision-based robotic grasping tasks. The absence of an actor in Q2-Opt allows us to directly draw a parallel to the previous discrete experiments in the literature without the additional complexities induced by an actor-critic architecture. We demonstrate that Q2-Opt achieves a superior vision-based object grasping success rate, while also being more sample efficient. The distributional formulation also allows us to experiment with various risk-distortion metrics that give us an indication of how robots can concretely manage risk in practice using a deep RL control policy. As an additional contribution, we perform experiments on offline datasets and compare them with the latest findings from discrete settings. Surprisingly, we find that there is a discrepancy between our results and the previous batch RL findings from the literature obtained on arcade game environments.


Implicit Quantile Networks for Distributional Reinforcement Learning

Dabney, Will, Ostrovski, Georg, Silver, David, Munos, Rémi

arXiv.org Artificial Intelligence

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm's implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
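The risk-sensitive policies described above arise from distorting the quantile fractions the network is evaluated at. The following sketch illustrates the idea with a CVaR-style distortion and a toy closed-form quantile function standing in for the implicit quantile network; the helper names are assumptions for illustration.

```python
import numpy as np

def cvar_distortion(tau, eta=0.25):
    """CVaR-style risk distortion: beta(tau) = eta * tau confines the
    sampled quantile fractions to the lower tail of the distribution."""
    return eta * tau

def distorted_value(quantile_fn, distortion=lambda t: t, n=64, seed=0):
    """Monte-Carlo estimate of the distorted expectation
    E_{tau ~ U(0,1)}[ Z(beta(tau)) ], the quantity a risk-sensitive
    IQN-style policy maximises.  `quantile_fn` is a toy stand-in for
    the implicit quantile network."""
    tau = np.random.default_rng(seed).uniform(size=n)
    return float(quantile_fn(distortion(tau)).mean())

# Toy return distribution: -1 or +1 with equal probability,
# expressed through its quantile function.
qf = lambda t: np.where(t < 0.5, -1.0, 1.0)

print(distorted_value(qf))                   # risk-neutral: near 0
print(distorted_value(qf, cvar_distortion))  # risk-averse: -1.0
```

With the CVaR distortion at eta = 0.25, every sampled fraction lands below the median, so the agent values this action by its worst outcome alone.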


Risk-Variant Policy Switching to Exceed Reward Thresholds

Kane, Breelyn Melissa (Carnegie Mellon University) | Simmons, Reid (Carnegie Mellon University)

AAAI Conferences

This paper presents a decision-theoretic planning approach for probabilistic environments where the agent's goal is to win, which we model as maximizing the probability of being above a given reward threshold. In competitive domains, second is as good as last, and it is often desirable to take risks if one is in danger of losing, even if the risk does not pay off very often. Our algorithm maximizes the probability of being above a particular reward threshold by dynamically switching between a suite of policies, each of which encodes a different level of risk. This method does not explicitly encode time or reward into the state space, and decides when to switch between policies during each execution step. We compare a risk-neutral policy to switching among different risk-sensitive policies, and show that our approach improves the agent's probability of winning.


Risk-Sensitive Policies for Sustainable Renewable Resource Allocation

Ermon, Stefano (Cornell University) | Conrad, Jon (Cornell University) | Gomes, Carla (Cornell University) | Selman, Bart (Cornell University)

AAAI Conferences

Markov Decision Processes arise as a natural model for many renewable resource allocation problems. In many such problems, high-stakes decisions with potentially catastrophic outcomes (such as the collapse of an entire ecosystem) need to be taken by carefully balancing social, economic, and ecological goals. We introduce a broad class of such MDP models with a risk-averse attitude of the decision maker, in order to obtain policies that are more balanced with respect to the welfare of future generations. We prove that they admit a closed-form solution that can be efficiently computed. We show an application of the proposed framework to the Pacific Halibut marine fishery, obtaining new and more cautious policies. Our results strengthen findings on related policies from the literature by providing new evidence that a policy based on periodic closures of the fishery should be employed, in place of the traditional policy of harvesting a constant proportion of the stock every year.