Teaching Precommitted Agents: Model-Free Policy Evaluation and Control in Quasi-Hyperbolic Discounted MDPs
Time-inconsistent preferences, where agents favor smaller-sooner over larger-later rewards, are a key feature of human and animal decision-making. Quasi-Hyperbolic (QH) discounting provides a simple yet powerful model for this behavior, but its integration into the reinforcement learning (RL) framework has been limited. We make two primary contributions: (i) we formally characterize the structure of the optimal policy, proving for the first time that it reduces to a simple one-step non-stationary form; and (ii) we design the first practical, model-free algorithms for both policy evaluation and Q-learning in this setting, both with provable convergence guarantees. Our results provide foundational insights for incorporating QH preferences in RL.

Reinforcement learning (RL) [12] provides a powerful framework for sequential decision-making, where an agent interacts with an environment to maximize long-term cumulative rewards.
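QH (beta-delta) discounting weights an immediate reward fully and every later reward by an extra present-bias factor. A minimal sketch of the discount model itself (parameter values are illustrative, not taken from the paper):

```python
def qh_weights(beta, gamma, horizon):
    """Quasi-hyperbolic discount weights: 1, beta*gamma, beta*gamma^2, ..."""
    return [1.0 if k == 0 else beta * gamma ** k for k in range(horizon)]

def qh_return(rewards, beta=0.6, gamma=0.95):
    """QH-discounted value of a finite reward sequence, evaluated from step 0."""
    weights = qh_weights(beta, gamma, len(rewards))
    return sum(w * r for w, r in zip(weights, rewards))

# Present bias: an immediate reward of 10 beats a reward of 14 delayed by
# three steps under QH discounting, even though plain exponential
# discounting with the same gamma prefers the delayed 14.
immediate = [10.0]
delayed = [0.0, 0.0, 0.0, 14.0]
```

With these parameters, `qh_return(immediate)` is 10 while `qh_return(delayed)` is roughly 7.2, whereas exponential discounting values the delayed option at `0.95**3 * 14`, roughly 12.0; this is the smaller-sooner-over-larger-later preference the abstract describes.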
The Paradox of Doom: Acknowledging Extinction Risk Reduces the Incentive to Prevent It
Growiec, Jakub, Prettner, Klaus
We investigate the salience of extinction risk as a source of impatience. Our framework distinguishes between human extinction risk and individual mortality risk while allowing for various degrees of intergenerational altruism. Additionally, we consider the evolutionarily motivated "selfish gene" perspective. We find that the risk of human extinction is an indispensable component of the discount rate, whereas individual mortality risk can be hedged against - partially or fully, depending on the setup - through human reproduction. Overall, we show that in the face of extinction risk, people become more impatient rather than more farsighted. Thus, the greater the threat of extinction, the less incentive there is to invest in avoiding it. Our framework can help explain why humanity consistently underinvests in the mitigation of catastrophic risks, from climate change, through pandemic prevention, to the emerging risks of transformative artificial intelligence.
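The mechanism can be sketched with a standard hazard-augmented discount rate (an illustrative decomposition, not the authors' exact model): the effective rate adds the extinction hazard to pure time preference, while individual mortality enters only to the extent it cannot be hedged through reproduction:

```latex
\rho \;=\; \delta \;+\; h_{\mathrm{ext}} \;+\; (1-\alpha)\, h_{\mathrm{mort}},
\qquad 0 \le \alpha \le 1,
```

where $\delta$ is pure time preference, $h_{\mathrm{ext}}$ is the extinction hazard rate, $h_{\mathrm{mort}}$ is the individual mortality hazard, and $\alpha$ captures the degree to which mortality risk is hedged via offspring. Because $h_{\mathrm{ext}}$ raises $\rho$ one-for-one, a greater extinction threat makes agents discount the future more heavily, shrinking the present value of any investment in mitigation.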
Adaptive Traffic-Following Scheme for Orderly Distributed Control of Multi-Vehicle Systems
Jain, Anahita, Idris, Husni, Clarke, John-Paul, Delahaye, Daniel
We present an adaptive control scheme to enable the emergence of order within distributed, autonomous multi-agent systems. Past studies showed that under high-density conditions, order generated from traffic-following behavior reduces travel times, while under low densities, choosing direct paths is more beneficial. In this paper, we leveraged those findings to allow aircraft to independently and dynamically adjust their degree of traffic-following behavior based on the current state of the airspace. This enables aircraft to follow other traffic only when beneficial. Quantitative analyses revealed that dynamic traffic-following behavior results in lower aircraft travel times at the cost of minimal levels of additional disorder to the airspace. The sensitivity of these benefits to temporal and spatial horizons was also investigated. Overall, this work highlights the benefits, and potential necessity, of incorporating self-organizing behavior in making distributed, autonomous multi-agent systems scalable.

Autonomous vehicle operations are expected to increase in the airspace over the coming decades. Initially, applications will include non-passenger operations, such as fire fighting or cargo delivery using uncrewed aerial vehicles of different sizes. Eventually, the scope will expand to passenger-carrying vehicles for urban or regional air mobility. These vehicles are expected to interact and integrate with other traffic within the same airspace. For scalability, this airspace of the future will be a collective system of autonomous vehicles, where each vehicle makes increasingly independent decisions.
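One way to realize density-dependent traffic-following is to blend each vehicle's direct-to-goal heading with the local mean traffic heading, weighting the latter more as local density rises. A toy sketch of such a rule (the blending function and the `density_ref` parameter are illustrative assumptions, not the paper's controller):

```python
import math

def blend_heading(goal_heading, traffic_heading, density, density_ref=1.0):
    """Steer between the direct-to-goal heading and the local mean traffic
    heading, following traffic more strongly as local density rises.
    Headings are in radians; density is vehicles per unit area."""
    w = density / (density + density_ref)  # in [0, 1), grows with density
    # Interpolate on the circle via unit vectors to handle angle wrap-around.
    gx, gy = math.cos(goal_heading), math.sin(goal_heading)
    tx, ty = math.cos(traffic_heading), math.sin(traffic_heading)
    return math.atan2((1 - w) * gy + w * ty, (1 - w) * gx + w * tx)
```

At zero density the vehicle flies straight at its goal; as density grows, the commanded heading approaches the local traffic heading, which is the "follow other traffic only when beneficial" behavior described above.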
Information-Seeking Decision Strategies Mitigate Risk in Dynamic, Uncertain Environments
Barendregt, Nicholas W., Gold, Joshua I., Josić, Krešimir, Kilpatrick, Zachary P.
To survive in dynamic and uncertain environments, individuals must develop effective decision strategies that balance information gathering and decision commitment. Models of such strategies often prioritize either optimizing tangible payoffs, like reward rate, or gathering information to support a diversity of (possibly unknown) objectives. However, our understanding of the relative merits of these two approaches remains incomplete, in part because direct comparisons have been limited to idealized, static environments that lack the dynamic complexity of the real world. Here we compared the performance of normative reward- and information-seeking strategies in a dynamic foraging task. Both strategies show similar transitions between exploratory and exploitative behaviors as environmental uncertainty changes. However, we find subtle disparities in the actions they take, resulting in meaningful performance differences: whereas reward-seeking strategies generate slightly more reward on average, information-seeking strategies provide more consistent and predictable outcomes. Our findings support the adaptive value of information-seeking behaviors that can mitigate risk with minimal reward loss.
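Information-seeking strategies of the kind compared here choose actions that maximally reduce uncertainty about the environment's state. A minimal ingredient of such a strategy is the expected information gain (expected entropy reduction) of an observation; a sketch under a simple discrete belief model (not the authors' foraging task):

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def expected_info_gain(prior, likelihood):
    """Expected entropy reduction of the belief after one observation.
    prior[s] = P(state s); likelihood[s][o] = P(obs o | state s)."""
    n_obs = len(likelihood[0])
    expected_posterior_entropy = 0.0
    for o in range(n_obs):
        p_o = sum(prior[s] * likelihood[s][o] for s in range(len(prior)))
        if p_o == 0.0:
            continue
        posterior = [prior[s] * likelihood[s][o] / p_o for s in range(len(prior))]
        expected_posterior_entropy += p_o * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy
```

A perfectly informative observation recovers the full prior entropy (log 2 nats for a uniform two-state belief) and an uninformative one gains nothing; an information-seeking agent picks the action maximizing this quantity, whereas a reward-seeking agent maximizes expected reward rate.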
Function-Coherent Gambles
The desirable gambles framework provides a foundational approach to imprecise probability theory but relies heavily on linear utility assumptions. This paper introduces function-coherent gambles, a generalization that accommodates non-linear utility while preserving essential rationality properties. We establish core axioms for function-coherence and prove a representation theorem that characterizes acceptable gambles through continuous linear functionals. The framework is then applied to analyze various forms of discounting in intertemporal choice, including hyperbolic, quasi-hyperbolic, scale-dependent, and state-dependent discounting. We demonstrate how these alternatives to constant-rate exponential discounting can be integrated within the function-coherent framework. This unified treatment provides theoretical foundations for modeling sophisticated patterns of time preference within the desirability paradigm, bridging a gap between normative theory and observed behavior in intertemporal decision-making under genuine uncertainty.
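The discount families mentioned above can be compared directly through their weight curves; a short sketch (parameter values are illustrative):

```python
def exponential(t, gamma=0.95):
    """Constant-rate discounting: the weight ratio between adjacent steps is fixed."""
    return gamma ** t

def hyperbolic(t, k=1.0):
    """Hyperbolic discounting: measured impatience falls as delay grows."""
    return 1.0 / (1.0 + k * t)

def quasi_hyperbolic(t, beta=0.6, gamma=0.95):
    """Beta-delta discounting: a one-off present-bias factor, then exponential decay."""
    return 1.0 if t == 0 else beta * gamma ** t

# The one-step weight ratio w(t+1)/w(t) is constant under exponential
# discounting but increases with delay under hyperbolic discounting --
# the source of time-inconsistent preference reversals.
```

This delay-dependence of the hyperbolic weight ratio is precisely the kind of observed behavior that constant-rate exponential discounting cannot capture and that motivates the more general framework.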
Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting
Skalse, Joar, Abate, Alessandro
The aim of inverse reinforcement learning (IRL) is to infer an agent's preferences from observing their behaviour. Usually, preferences are modelled as a reward function, $R$, and behaviour is modelled as a policy, $\pi$. One of the central difficulties in IRL is that multiple preferences may lead to the same observed behaviour. That is, $R$ is typically underdetermined by $\pi$, which means that $R$ is only partially identifiable. Recent work has characterised the extent of this partial identifiability for different types of agents, including optimal and Boltzmann-rational agents. However, work so far has only considered agents that discount future reward exponentially: this is a serious limitation, especially given that extensive work in the behavioural sciences suggests that humans are better modelled as discounting hyperbolically. In this work, we characterise partial identifiability in IRL for agents with non-exponential discounting: our results apply in particular to hyperbolic discounting, but they also extend to agents that use other types of (non-exponential) discounting. Significantly, we show that IRL is generally unable to infer enough information about $R$ to identify the correct optimal policy, which entails that IRL alone can be insufficient to adequately characterise the preferences of such agents.
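The underdetermination of $R$ by $\pi$ already shows up in a one-step example: a Boltzmann-rational policy is invariant to a constant shift of the action values, so two distinct reward functions produce identical observed behaviour. A toy illustration of this ambiguity (not the paper's general characterisation):

```python
import math

def boltzmann(qvals, temp=1.0):
    """Boltzmann-rational action distribution: softmax of action values."""
    exps = [math.exp(q / temp) for q in qvals]
    z = sum(exps)
    return [e / z for e in exps]

# Two different reward vectors, one a constant shift of the other,
# induce the same policy -- behaviour alone cannot tell them apart.
policy_a = boltzmann([1.0, 2.0, 3.0])
policy_b = boltzmann([11.0, 12.0, 13.0])
```

The two policies agree (up to floating-point error), so an observer of behaviour cannot distinguish the underlying rewards; the paper's contribution is to characterise how much worse this ambiguity becomes once discounting is non-exponential.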