AITopics

Country: Europe (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Neural Information Processing SystemsFeb-7-2026, 16:15:17 GMT

178b306c7ee66a66db2171646e17da36-Paper-Conference.pdf

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Neural Information Processing SystemsDec-23-2025, 20:07:40 GMT

Reinforcement Learning with Non-Exponential Discounting

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover properties of the discount function given decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.

name change, non-exponential discounting, reinforcement learning, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Neural Information Processing SystemsMay-26-2025, 18:03:32 GMT

Reinforcement Learning with Non-Exponential Discounting

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton–Jacobi–Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation.

machine learning, non-exponential discounting, reinforcement learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceFeb-22-2025

Function-Coherent Gambles

Wheeler, Gregory

The desirable gambles framework provides a foundational approach to imprecise probability theory but relies heavily on linear utility assumptions. This paper introduces {\em function-coherent gambles}, a generalization that accommodates non-linear utility while preserving essential rationality properties. We establish core axioms for function-coherence and prove a representation theorem that characterizes acceptable gambles through continuous linear functionals. The framework is then applied to analyze various forms of discounting in intertemporal choice, including hyperbolic, quasi-hyperbolic, scale-dependent, and state-dependent discounting. We demonstrate how these alternatives to constant-rate exponential discounting can be integrated within the function-coherent framework. This unified treatment provides theoretical foundations for modeling sophisticated patterns of time preference within the desirability paradigm, bridging a gap between normative theory and observed behavior in intertemporal decision-making under genuine uncertainty.

application, discounting, imprecise probability, (16 more...)

2503.01855

Country:

Europe > United Kingdom > England > West Sussex (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (0.46)

Technology: Information Technology > Artificial Intelligence (0.69)

Skalse, Joar, Abate, Alessandro

Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting

arXiv.org Artificial IntelligenceDec-15-2024

The aim of inverse reinforcement learning (IRL) is to infer an agent's preferences from observing their behaviour. Usually, preferences are modelled as a reward function, $R$, and behaviour is modelled as a policy, $\pi$. One of the central difficulties in IRL is that multiple preferences may lead to the same observed behaviour. That is, $R$ is typically underdetermined by $\pi$, which means that $R$ is only partially identifiable. Recent work has characterised the extent of this partial identifiability for different types of agents, including optimal and Boltzmann-rational agents. However, work so far has only considered agents that discount future reward exponentially: this is a serious limitation, especially given that extensive work in the behavioural sciences suggests that humans are better modelled as discounting hyperbolically. In this work, we newly characterise partial identifiability in IRL for agents with non-exponential discounting: our results are in particular relevant for hyperbolical discounting, but they also more generally apply to agents that use other types of (non-exponential) discounting. We significantly show that generally IRL is unable to infer enough information about $R$ to identify the correct optimal policy, which entails that IRL alone can be insufficient to adequately characterise the preferences of such agents.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2412.11155

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
(4 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.31)

Neural Information Processing SystemsOct-9-2024, 23:56:09 GMT

Reinforcement Learning with Non-Exponential Discounting

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton–Jacobi–Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation.

discount function, non-exponential discounting, reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Gan, Jiarui, Hennes, Annika, Majumdar, Rupak, Mandal, Debmalya, Radanovic, Goran

Markov Decision Processes with Time-Varying Geometric Discounting

arXiv.org Artificial IntelligenceJul-19-2023

Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective -- whereby each time step is treated as an independent decision maker with their own (fixed) discount factor -- and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of $\epsilon$-SPE and show that an $\epsilon$-SPE exists under milder assumptions. An algorithm is presented to compute an $\epsilon$-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided.

artificial intelligence, discounting, machine learning, (19 more...)

doi: 10.1609/aaai.v37i10.26413

2307.10491

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Schultheis, Matthias, Rothkopf, Constantin A., Koeppl, Heinz

Reinforcement Learning with Non-Exponential Discounting

arXiv.org Artificial IntelligenceDec-7-2022

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover properties of the discount function given decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2209.13413

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)