AITopics | exogenous state

exogenous state

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning

Wan, Jia, Sinclair, Sean R., Shah, Devavrat, Wainwright, Martin J.

arXiv.org Machine Learning, Oct-14-2024

We study a class of structured Markov Decision Processes (MDPs) known as Exo-MDPs. They are characterized by a partition of the state space into two components: the exogenous states evolve stochastically in a manner not affected by the agent's actions, whereas the endogenous states can be affected by actions and evolve according to deterministic dynamics involving both the endogenous and exogenous states. Exo-MDPs provide a natural model for various applications, including inventory control, portfolio management, power systems, and ride-sharing, among others. While seemingly restrictive on the surface, our first result establishes that any discrete MDP can be represented as an Exo-MDP. The underlying argument reveals how transition and reward dynamics can be written as linear functions of the exogenous state distribution, so that Exo-MDPs are instances of linear mixture MDPs; this establishes a representational equivalence between discrete MDPs, Exo-MDPs, and linear mixture MDPs. The connection between Exo-MDPs and linear mixture MDPs leads to algorithms that are near sample-optimal, with regret guarantees scaling with the (effective) size of the exogenous state space $d$, independent of the sizes of the endogenous state and action spaces, even when the exogenous state is unobserved. When the exogenous state is unobserved, we establish a regret upper bound of $O(H^{3/2}d\sqrt{K})$ with $K$ trajectories of horizon $H$ and unobserved exogenous state of dimension $d$. We also establish a matching regret lower bound of $\Omega(H^{3/2}d\sqrt{K})$ for non-stationary Exo-MDPs and a lower bound of $\Omega(Hd\sqrt{K})$ for stationary Exo-MDPs. We complement our theoretical findings with an experimental study on inventory control problems.
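To make the Exo-MDP structure concrete, here is a minimal sketch of a toy inventory-control instance under illustrative assumptions: the exogenous state is a scalar demand drawn i.i.d. from a known distribution (the stationary, observed case), the endogenous state is the inventory level, and the names `step`, `transition_kernel`, `optimal_value` as well as all prices and capacities are hypothetical. It is not the paper's learning algorithm; it only shows that the endogenous dynamics are deterministic once the demand is known, and that each one-step transition kernel is a linear function of the demand distribution.

```python
from functools import lru_cache

# Hypothetical toy parameters, chosen only for illustration.
CAPACITY = 5      # endogenous state: inventory level in 0..CAPACITY
HORIZON = 3       # H, number of decision periods
PRICE, ORDER_COST, HOLDING_COST = 4.0, 1.0, 0.5


def step(inventory, order, demand):
    """Deterministic endogenous dynamics f(x, a, xi): next inventory and reward."""
    stock = min(inventory + order, CAPACITY)
    sold = min(stock, demand)
    next_inventory = stock - sold
    reward = PRICE * sold - ORDER_COST * order - HOLDING_COST * next_inventory
    return next_inventory, reward


def transition_kernel(inventory, order, demand_dist):
    """P(x' | x, a) = sum_xi p(xi) * 1[f(x, a, xi) = x'], a linear function of p."""
    kernel = {}
    for demand, p in demand_dist.items():
        next_inventory, _ = step(inventory, order, demand)
        kernel[next_inventory] = kernel.get(next_inventory, 0.0) + p
    return kernel


def optimal_value(demand_dist, start_inventory=0):
    """Finite-horizon backward induction. demand_dist maps demand -> probability
    and does not depend on the action taken (the exogenous assumption)."""

    @lru_cache(maxsize=None)
    def value(t, inventory):
        if t == HORIZON:
            return 0.0
        best = float("-inf")
        for order in range(CAPACITY - inventory + 1):
            # For a fixed continuation value, this expectation is linear in demand_dist.
            q = 0.0
            for demand, p in demand_dist.items():
                nxt, r = step(inventory, order, demand)
                q += p * (r + value(t + 1, nxt))
            best = max(best, q)
        return best

    return value(0, start_inventory)


demand_dist = {0: 0.2, 1: 0.5, 2: 0.3}
print(transition_kernel(0, 2, demand_dist))  # {2: 0.2, 1: 0.5, 0: 0.3}
print(optimal_value(demand_dist))
```

Because the only randomness enters through `demand_dist`, every (inventory, order) pair shares the same unknown object; this is the intuition behind regret bounds that scale with the exogenous dimension rather than with the endogenous state and action spaces.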

algorithm, exo-mdp, linear mixture mdp

arXiv.org Machine Learning

arXiv:2409.14557

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Industry:

Transportation > Passenger (0.34)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Robust Data-Driven Dynamic Programming

Neural Information Processing Systems, Mar-13-2024, 23:17:49 GMT

In stochastic optimal control, the distribution of the exogenous noise is typically unknown and must be inferred from limited data before dynamic programming (DP)-based solution schemes can be applied. If the conditional expectations in the DP recursions are estimated via kernel regression, however, the historical sample paths enter the solution procedure directly, as they determine the evaluation points of the cost-to-go functions. The resulting data-driven DP scheme is asymptotically consistent and admits an efficient computational solution when combined with parametric value function approximations. If training data is sparse, however, the estimated cost-to-go functions display high variability and an optimistic bias, while the corresponding control policies perform poorly in out-of-sample tests. To mitigate these small-sample effects, we propose a robust data-driven DP scheme, which replaces the expectations in the DP recursions with worst-case expectations over a set of distributions close to the best estimate. We show that the resulting min-max problems in the DP recursions reduce to tractable conic programs. We also demonstrate that the proposed robust DP algorithm dominates various non-robust schemes in out-of-sample tests across several application domains.
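The kernel-regression step described above can be sketched as follows, assuming a scalar exogenous state, a Gaussian kernel, and a user-supplied bandwidth; the function names and the kernel choice are illustrative and not taken from the paper. It estimates the conditional expectation of a cost-to-go function from N historical transitions, and the cost-to-go is only ever evaluated at the successor states observed on those sample paths.

```python
import math


def kernel_weights(x, current_samples, bandwidth):
    """Nadaraya-Watson weights w_i(x) from a Gaussian kernel on scalar states."""
    exponents = [-0.5 * ((x - s) / bandwidth) ** 2 for s in current_samples]
    shift = max(exponents)  # subtract the max so the largest raw weight is exp(0) = 1
    raw = [math.exp(e - shift) for e in exponents]
    total = sum(raw)
    return [r / total for r in raw]


def estimated_conditional_expectation(x, current_samples, next_samples,
                                      cost_to_go, bandwidth=1.0):
    """Estimate E[cost_to_go(xi_{t+1}) | xi_t = x] from the historical pairs
    (current_samples[i], next_samples[i]).

    cost_to_go is evaluated only at next_samples, i.e. at points on the historical paths.
    """
    weights = kernel_weights(x, current_samples, bandwidth)
    return sum(w * cost_to_go(s) for w, s in zip(weights, next_samples))


# Two hypothetical historical transitions: 0.0 -> 2.0 and 1.0 -> 4.0.
print(estimated_conditional_expectation(0.5, [0.0, 1.0], [2.0, 4.0], lambda s: s))  # 3.0
```

Subtracting the largest exponent before exponentiating keeps the weights well defined even when `x` is far from every historical state. A smaller bandwidth makes the estimate more local but, with few samples, more variable, which is the kind of small-sample effect the abstract sets out to mitigate.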

dynamic programming, exogenous state, probability

Neural Information Processing Systems

Country:

North America > United States > North Carolina (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > Ohio (0.04)
Europe > United Kingdom (0.04)

Industry:

Energy (0.69)
Banking & Finance > Trading (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Robust Data-Driven Dynamic Programming

Hanasusanto, Grani Adiwena, Kuhn, Daniel

Neural Information Processing Systems, Dec-31-2013

In stochastic optimal control, the distribution of the exogenous noise is typically unknown and must be inferred from limited data before dynamic programming (DP)-based solution schemes can be applied. If the conditional expectations in the DP recursions are estimated via kernel regression, however, the historical sample paths enter the solution procedure directly, as they determine the evaluation points of the cost-to-go functions. The resulting data-driven DP scheme is asymptotically consistent and admits an efficient computational solution when combined with parametric value function approximations. If training data is sparse, however, the estimated cost-to-go functions display high variability and an optimistic bias, while the corresponding control policies perform poorly in out-of-sample tests. To mitigate these small-sample effects, we propose a robust data-driven DP scheme, which replaces the expectations in the DP recursions with worst-case expectations over a set of distributions close to the best estimate. We show that the resulting min-max problems in the DP recursions reduce to tractable conic programs. We also demonstrate that this robust algorithm dominates state-of-the-art benchmark algorithms in out-of-sample tests across several application domains.
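The worst-case expectation that replaces the nominal one can be illustrated with a minimal sketch. The abstract does not specify the set of nearby distributions, so this example assumes, purely for illustration, an L1 (total-variation) ball of a given radius around the nominal weights (for instance, kernel-regression weights over historical outcomes). For that particular set the inner maximization is a small linear program with a greedy solution: move up to half the radius of probability mass from the cheapest outcomes onto the costliest one. This is not the conic reformulation the authors derive, and the function name and the choice of ball are hypothetical.

```python
def worst_case_expectation(costs, weights, radius):
    """Return the max over q of sum_i q[i] * costs[i], where q is a probability
    vector with sum_i |q[i] - weights[i]| <= radius (assumed L1 ambiguity ball)."""
    q = list(weights)
    n = len(costs)
    worst = max(range(n), key=lambda i: costs[i])
    # Half the L1 budget can be moved; it is also capped by the mass held elsewhere.
    shift = min(radius / 2.0, 1.0 - q[worst])
    q[worst] += shift
    # Remove the same amount of mass from the cheapest outcomes first.
    for i in sorted(range(n), key=lambda j: costs[j]):
        if shift <= 0.0:
            break
        if i == worst:
            continue
        take = min(q[i], shift)
        q[i] -= take
        shift -= take
    return sum(qi * c for qi, c in zip(q, costs))


costs = [1.0, 3.0]      # hypothetical cost-to-go values at two historical outcomes
weights = [0.5, 0.5]    # hypothetical nominal weights
print(worst_case_expectation(costs, weights, 0.0))   # 2.0 (nominal expectation)
print(worst_case_expectation(costs, weights, 0.4))   # robust value, about 2.4
```

Since the nominal weights always lie inside the ball, the robust value is never below the nominal expectation, so a recursion built on it is pessimistic by construction; that is the intended counterweight to the optimistic bias described for sparse training data.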

artificial intelligence, dynamic programming, machine learning

Neural Information Processing Systems

Industry:

Energy (0.70)
Banking & Finance > Trading (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)