backward induction
From Restless to Contextual: A Thresholding Bandit Approach to Improve Finite-horizon Performance
Xu, Jiamin, Nazarov, Ivan, Rastogi, Aditya, Periáñez, África, Gan, Kyra
Online restless bandits extend classic contextual bandits by incorporating state transitions and budget constraints, representing each agent as a Markov Decision Process (MDP). This framework is crucial for finite-horizon strategic resource allocation, optimizing limited costly interventions for long-term benefits. However, learning the underlying MDP for each agent poses a major challenge in finite-horizon settings. To facilitate learning, we reformulate the problem as a scalable budgeted thresholding contextual bandit problem, carefully integrating the state transitions into the reward design and focusing on identifying agents with action benefits exceeding a threshold. We establish the optimality of an oracle greedy solution in a simple two-state setting, and propose an algorithm that achieves minimax optimal constant regret in the online multi-state setting with heterogeneous agents and knowledge of outcomes under no intervention. We numerically show that our algorithm outperforms existing online restless bandit methods, offering significant improvements in finite-horizon performance.
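The thresholding idea is concrete enough to sketch. Below is a minimal illustration, not the paper's algorithm: per-agent action benefits are estimated online, and a limited per-round budget is spent on the agents whose optimistic estimates exceed the threshold. The agent count, the UCB-style bonus, and the noise model are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, horizon, budget, threshold = 50, 200, 5, 0.3   # illustrative sizes

true_benefit = rng.uniform(0.0, 1.0, n_agents)   # unknown to the learner
sums = np.zeros(n_agents)                        # summed observed benefits
counts = np.ones(n_agents)                       # pull counts (1 avoids /0)

for t in range(horizon):
    est = sums / counts
    # optimistic bonus steers exploration toward under-sampled agents
    ucb = est + np.sqrt(2.0 * np.log(t + 2) / counts)
    # spend the budget on the agents most plausibly above the threshold
    candidates = np.where(ucb >= threshold)[0]
    chosen = candidates[np.argsort(-ucb[candidates])][:budget]
    for i in chosen:
        sums[i] += true_benefit[i] + rng.normal(0.0, 0.1)  # noisy benefit
        counts[i] += 1

flagged = np.where(sums / counts >= threshold)[0]
print("agents estimated above threshold:", flagged)
```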
Learning the Optimal Stopping for Early Classification within Finite Horizons via Sequential Probability Ratio Test
Ebihara, Akinori F., Miyagawa, Taiki, Sakurai, Kazuyuki, Imaoka, Hitoshi
Time-sensitive machine learning benefits from the Sequential Probability Ratio Test (SPRT), which provides an optimal stopping time for early classification of time series. However, in finite-horizon scenarios, where input lengths are finite, determining the optimal stopping rule becomes computationally intensive due to the need for backward induction, limiting practical applicability. We thus introduce FIRMBOUND, an SPRT-based framework that efficiently estimates the solution to backward induction from training data, bridging the gap between optimal stopping theory and real-world deployment. It employs density ratio estimation and convex function learning to provide statistically consistent estimators for the sufficient statistic and the conditional expectation, both essential for solving backward induction; consequently, FIRMBOUND minimizes Bayes risk to reach optimality. Additionally, we present a faster alternative using Gaussian process regression, which significantly reduces training time while retaining low deployment overhead, albeit with a potential compromise in statistical consistency. Experiments across independent and identically distributed (i.i.d.), non-i.i.d., binary, multiclass, synthetic, and real-world datasets show that FIRMBOUND achieves optimality in terms of both Bayes risk and the speed-accuracy tradeoff. Furthermore, it advances the tradeoff boundary toward optimality when possible and reduces decision-time variance, ensuring reliable decision-making. Code is publicly available at https://github.com/Akinori-F-Ebihara/FIRMBOUND
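To make the backward-induction bottleneck concrete, here is a minimal sketch of the dynamic program FIRMBOUND approximates: finite-horizon Bayes-optimal stopping for binary classification, with the posterior as the sufficient statistic. The grid resolution, sampling cost, and Bernoulli observation model are illustrative assumptions, not the paper's setup.

```python
import numpy as np

T, c = 20, 0.01                      # horizon, per-sample cost (assumed)
grid = np.linspace(0.0, 1.0, 201)    # discretized posterior P(H1 | data)

def stop_risk(p):
    # Bayes risk of stopping now: pick the likelier class, err otherwise
    return np.minimum(p, 1.0 - p)

q0, q1 = 0.4, 0.6                    # P(x=1 | H0), P(x=1 | H1); illustrative

V = stop_risk(grid)                  # at t = T we are forced to stop
for t in range(T - 1, -1, -1):
    cont = np.zeros_like(grid)
    for l0, l1 in [(1.0 - q0, 1.0 - q1), (q0, q1)]:  # x = 0, then x = 1
        px = (1.0 - grid) * l0 + grid * l1           # predictive prob of x
        post = grid * l1 / px                        # Bayes posterior update
        cont += px * np.interp(post, grid, V)        # E[V_{t+1} | posterior]
    V = np.minimum(stop_risk(grid), c + cont)        # stop vs. continue

region = grid[(c + cont) < stop_risk(grid)]
print(f"continue region at t = 0: [{region[0]:.2f}, {region[-1]:.2f}]")
```

The continue/stop boundary computed this way is exactly what makes naive backward induction expensive in richer models: the conditional expectation must be evaluated over the sufficient-statistic space at every step, which is what FIRMBOUND learns from data instead.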
Reviews: Multi-View Decision Processes: The Helper-AI Problem
The paper introduces multi-view decision processes (MVDP) as a cooperative game between two agents, where one agent may have an imperfect model of the transition probabilities of the other agent. Generally, the work is well motivated, and the problem is interesting and clearly relevant to AI and the NIPS community. The theoretical results are nicely complemented by experiments. Presentation is mostly smooth: the main goals are consistent with the narrative throughout the paper, but some details of the experiments seem less clear (see points below). Some points are confusing and need further clarification: 1.
On the Structure of Game Provenance and its Applications
Bowers, Shawn, Xia, Yilin, Ludäscher, Bertram
Provenance in databases has been thoroughly studied for positive and for recursive queries, then for first-order (FO) queries, i.e., queries having negation but no recursion. Query evaluation can be understood as a two-player game where the opponents argue whether or not a tuple is in the query answer. This game-theoretic approach yields a natural provenance model for FO queries, unifying how- and why-not provenance. Here, we study the fine-grain structure of game provenance. A game $G=(V,E)$ consists of positions $V$ and moves $E$ and can be solved by computing the well-founded model of a single, unstratifiable rule: \[ \text{win}(X) \leftarrow \text{move}(X, Y), \neg \, \text{win}(Y). \] In the solved game $G^{\lambda}$, the value of a position $x\,{\in}\,V$ is either won, lost, or drawn. This value is explained by the provenance $\mathscr{P}(x)$, i.e., certain (annotated) edges reachable from $x$. We identify seven edge types that give rise to new kinds of provenance, i.e., potential, actual, and primary, and demonstrate that "not all moves are created equal". We describe the new provenance types, show how they can be computed while solving games, and discuss applications, e.g., for abstract argumentation frameworks.
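The solved-game semantics is easy to make concrete. The sketch below (my own illustration, not the authors' code) solves a small game graph under the rule above by standard backward propagation: a position with no moves is lost for the player to move, a move to a lost position makes a position won, and anything unresolved at the fixpoint is drawn. The example graph is made up.

```python
from collections import deque

moves = {"a": ["b", "c"], "b": ["d"], "c": ["c"], "d": []}  # d is terminal

preds = {v: [] for v in moves}
outdeg = {v: len(ws) for v, ws in moves.items()}
for v, ws in moves.items():
    for w in ws:
        preds[w].append(v)

value = {v: "drawn" for v in moves}
queue = deque(v for v, d in outdeg.items() if d == 0)  # no moves => lost
for v in queue:
    value[v] = "lost"

while queue:
    w = queue.popleft()
    for v in preds[w]:
        if value[v] != "drawn":
            continue
        if value[w] == "lost":          # v can move to a lost position
            value[v] = "won"
            queue.append(v)
        else:                           # w is won: one fewer escape for v
            outdeg[v] -= 1
            if outdeg[v] == 0:          # every move of v leads to a won position
                value[v] = "lost"
                queue.append(v)

print(value)  # expected: d lost, b won, a drawn, c drawn
```

The edges touched during this propagation are precisely the raw material for the provenance $\mathscr{P}(x)$: which moves actually justified a value versus merely being present is what the paper's edge-type classification refines.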
Metareasoning in uncertain environments: a meta-BAMDP framework
Godara, Prakhar, Aléman, Tilman Diego, Yu, Angela J.
In decision-making scenarios, \textit{reasoning} can be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome such as maximizing the value function of a Markov decision process (MDP). However, executing $P$ itself may incur costs (time, energy, limited capacity, etc.) that need to be weighed against the explicit utility obtained by making the choice in the underlying decision problem. Such costs must be taken into account both to accurately model human behavior and to optimize AI planning, since all physical systems are bound to face resource constraints. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to two-armed Bernoulli bandit (TABB) tasks, which have often been used to study human decision making. Owing to the meta-problem's complexity, our solutions are necessarily approximate, but nevertheless robust within a range of assumptions that are arguably realistic for human decision-making scenarios. These results offer a normative framework for understanding human exploration under cognitive constraints. This integration of Bayesian adaptive strategies with metareasoning enriches both the theoretical landscape of decision-making research and practical applications in designing AI systems that plan under uncertainty and resource constraints.
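For orientation, the sketch below solves the object-level problem the meta-BAMDP builds on: the exact Bayes-optimal value of a TABB under Beta(1,1) priors, computed by backward induction over posterior counts. It deliberately ignores reasoning costs, which is precisely the gap metareasoning addresses; the horizon and priors are illustrative choices, not the paper's.

```python
from functools import lru_cache

HORIZON = 15  # illustrative; state space grows polynomially in the horizon

@lru_cache(maxsize=None)
def value(s0, f0, s1, f1, t):
    """Bayes-optimal remaining reward given successes/failures per arm."""
    if t == HORIZON:
        return 0.0
    best = 0.0
    for arm, (s, f) in enumerate([(s0, f0), (s1, f1)]):
        p = (s + 1) / (s + f + 2)          # posterior mean under Beta(1,1)
        if arm == 0:
            q = p * (1 + value(s0 + 1, f0, s1, f1, t + 1)) \
                + (1 - p) * value(s0, f0 + 1, s1, f1, t + 1)
        else:
            q = p * (1 + value(s0, f0, s1 + 1, f1, t + 1)) \
                + (1 - p) * value(s0, f0, s1, f1 + 1, t + 1)
        best = max(best, q)                # free maximization: no thinking cost
    return best

print("Bayes-optimal expected reward:", round(value(0, 0, 0, 0, 0), 4))
```

The meta-level problem then asks which (approximate) version of this computation is worth running once each recursive evaluation itself carries a cost.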
Active Inference and Intentional Behaviour
Friston, Karl J., Salvatori, Tommaso, Isomura, Takuya, Tschantz, Alexander, Kiefer, Alex, Verbelen, Tim, Koudahl, Magnus, Paul, Aswin, Parr, Thomas, Razi, Adeel, Kagan, Brett, Buckley, Christopher L., Ramstead, Maxwell J. D.
Recent advances in theoretical biology suggest that basal cognition and sentient behaviour are emergent properties of in vitro cell cultures and neuronal networks, respectively. Such neuronal networks spontaneously learn structured behaviours in the absence of reward or reinforcement. In this paper, we characterise this kind of self-organisation through the lens of the free energy principle, i.e., as self-evidencing. We do this by first discussing the definitions of reactive and sentient behaviour in the setting of active inference, which describes the behaviour of agents that model the consequences of their actions. We then introduce a formal account of intentional behaviour, which describes agents as driven by a preferred endpoint or goal in latent state-spaces. We then investigate these forms of (reactive, sentient, and intentional) behaviour using simulations. First, we simulate the aforementioned in vitro experiments, in which neuronal cultures spontaneously learn to play Pong, by implementing nested, free energy minimising processes. The simulations are then used to deconstruct the ensuing predictive behaviour, leading to the distinction between merely reactive, sentient, and intentional behaviour, with the latter formalised in terms of inductive planning. This distinction is further studied using simple machine learning benchmarks (navigation in a grid world and the Tower of Hanoi problem), which show how quickly and efficiently adaptive behaviour emerges under an inductive form of active inference.
A game that stymies AI
Artificial intelligence (AI) has succeeded spectacularly in certain kinds of tasks. These include playing specific games, such as Chess or Go, and finding patterns in images, such as identifying when a human organ is diseased or otherwise abnormal. It has done much less well in situations requiring more generalised learning, such as understanding a text or translating it from one language into another. Worse, acting like a human by expressing, and especially feeling, appropriate emotions seems well beyond AI's capacity.
Reinforced optimal control
Bayer, Christian, Belomestny, Denis, Hager, Paul, Pigato, Paolo, Schoenmakers, John, Spokoiny, Vladimir
Least squares Monte Carlo methods are a popular numerical approximation method for solving stochastic control problems. Based on dynamic programming, their key feature is the approximation of the conditional expectation of future rewards by linear least squares regression. Hence, the choice of basis functions is crucial for the accuracy of the method. Earlier work by some of us [Belomestny, Schoenmakers, Spokoiny, Zharkynbay. Commun.~Math.~Sci., 18(1):109-121, 2020] proposes to \emph{reinforce} the basis functions in the case of optimal stopping problems by already computed value functions for later times, thereby considerably improving the accuracy with limited additional computational cost. We extend the reinforced regression method to a general class of stochastic control problems, while considerably improving the method's efficiency, as demonstrated by substantial numerical examples as well as theoretical analysis.
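A minimal sketch of the reinforced-regression idea in a Longstaff-Schwartz-style stopping problem: at each step, the value function already fitted for time t+1 is appended to the polynomial basis used at time t. The dynamics (driftless geometric Brownian motion), the put-style payoff, and the low-order basis are illustrative assumptions, not the authors' examples.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, K = 20_000, 50, 1.0
dt, sigma = 1.0 / n_steps, 0.2

# simulate driftless geometric Brownian motion paths (illustrative dynamics)
z = rng.normal(size=(n_paths, n_steps))
X = np.empty((n_paths, n_steps + 1)); X[:, 0] = 1.0
for t in range(n_steps):
    X[:, t + 1] = X[:, t] * np.exp(-0.5 * sigma**2 * dt
                                   + sigma * np.sqrt(dt) * z[:, t])

payoff = lambda x: np.maximum(K - x, 0.0)       # Bermudan-put-style reward

def reinforced_value_fn(coef, prev_v):
    # v_t(y) = max(payoff, continuation fitted on the reinforced basis)
    def v(y):
        b = np.column_stack([np.ones_like(y), y, y * y, prev_v(y)])
        return np.maximum(payoff(y), b @ coef)
    return v

V, prev_v = payoff(X[:, -1]), payoff            # at maturity, value = payoff
for t in range(n_steps - 1, 0, -1):
    x = X[:, t]
    # basis: low-order polynomials *reinforced* with v_{t+1} evaluated at x
    basis = np.column_stack([np.ones_like(x), x, x * x, prev_v(x)])
    coef, *_ = np.linalg.lstsq(basis, V, rcond=None)
    exercise = payoff(x) > basis @ coef         # stop if payoff beats continuation
    V = np.where(exercise, payoff(x), V)
    prev_v = reinforced_value_fn(coef, prev_v)  # freeze v_t for use at t-1

print("reinforced-LSMC value estimate:", round(V.mean(), 4))
```

The design choice the reinforcement exploits is that the previously computed value function is usually an excellent regressor for the current one, so a handful of cheap polynomial terms plus one reinforced term can match a much larger plain basis.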
Comment: Entropy Learning for Dynamic Treatment Regimes
I congratulate Profs. Binyan Jiang, Rui Song, Jialiang Li, and Donglin Zeng (JSLZ) for an exciting development in conducting inferences on optimal dynamic treatment regimes (DTRs) learned via empirical risk minimization using the entropy loss as a surrogate. JSLZ's approach leverages a rejection-and-importance-sampling estimate of the value of a given decision rule based on inverse probability weighting (IPW) and its interpretation as a weighted (or cost-sensitive) classification. Their use of smooth classification surrogates enables their careful approach to analyzing asymptotic distributions. However, even for evaluation purposes, the IPW estimate is problematic as it leads to weights that discard most of the data and are extremely variable on whatever remains. In this comment, I discuss an optimization-based alternative to evaluating DTRs, review several connections, and suggest directions forward. This extends the balanced policy evaluation approach of Kallus (2018a) to the longitudinal setting.
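The variance complaint is easy to see in a single-stage sketch of the IPW value estimate (fabricated data, not JSLZ's setting): trajectories whose logged action disagrees with the candidate rule get weight zero, so most of the sample is discarded, and the surviving weights are volatile wherever the logging propensity is small.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)                        # patient covariate
propensity = 1 / (1 + np.exp(-x))             # logged P(A=1 | x)
a = rng.binomial(1, propensity)               # logged treatment
y = x * (2 * a - 1) + rng.normal(size=n)      # outcome: treating helps iff x > 0

rule = (x > 0).astype(int)                    # candidate decision rule d(x)

match = (a == rule)                           # only matching actions count
weights = match / np.where(rule == 1, propensity, 1 - propensity)
ipw_value = np.mean(weights * y)              # Horvitz-Thompson value estimate

print(f"kept {match.mean():.0%} of the data; IPW value = {ipw_value:.3f}")
print(f"weight std (instability): {weights.std():.2f}")
```

Roughly half the sample receives zero weight here even with a well-behaved propensity; in longitudinal DTRs the match indicator multiplies across stages, so the retained fraction shrinks geometrically, which is the instability the balanced-policy-evaluation alternative targets.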
Inferential Induction: Joint Bayesian Estimation of MDPs and Value Functions
Dimitrakakis, Christos, Eriksson, Hannes, Jorge, Emilio, Grover, Divya, Basu, Debabrota
Bayesian reinforcement learning (BRL) offers a decision-theoretic solution to the problem of reinforcement learning. However, typical model-based BRL algorithms have focused either on maintaining a posterior distribution on models or value functions and combining this with approximate dynamic programming or tree search. This paper describes a novel backwards induction principle for performing joint Bayesian estimation of models and value functions, from which many new BRL algorithms can be obtained. We demonstrate this idea with algorithms and experiments in discrete state spaces.
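A minimal sketch of the general principle as I read it (a simplification, not the paper's algorithm): a posterior over MDP models induces a distribution over value functions, obtained here by running backward induction on each sampled model. The visit counts, rewards, and 2-state/2-action MDP are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
nS, nA, H, n_samples = 2, 2, 10, 1000

# Dirichlet posterior over transitions, from fabricated visit counts
counts = rng.integers(1, 10, size=(nS, nA, nS)).astype(float)
R = np.array([[0.0, 0.0], [1.0, 0.5]])       # known rewards R[s, a] (assumed)

values_at_s0 = []
for _ in range(n_samples):
    P = np.stack([[rng.dirichlet(counts[s, a]) for a in range(nA)]
                  for s in range(nS)])        # sampled model P[s, a, s']
    V = np.zeros(nS)
    for _ in range(H):                        # backward induction on the sample
        Q = R + P @ V                         # Q[s, a] = R + sum_s' P[s,a,s'] V[s']
        V = Q.max(axis=1)
    values_at_s0.append(V[0])

print(f"posterior over V(s0): mean={np.mean(values_at_s0):.3f}, "
      f"std={np.std(values_at_s0):.3f}")
```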