
Collaborating Authors

Giegrich, Michael


$K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control

arXiv.org Machine Learning

In reinforcement learning (RL), off-policy evaluation (OPE) deals with the problem of estimating the value of a target policy from observations generated by a different behavior policy. OPE methods are typically applied to sequential decision-making problems where observational data are available but direct experimentation with the environment is impossible or costly. More broadly, OPE is a widely researched subject in RL (see, e.g., [60, 23, 58] for recent overviews); however, relatively little attention has been paid to stochastic environments in which the stochasticity depends on the chosen actions and the state and action spaces are continuous. For example, common benchmark problems are either deterministic or have finite state and/or action spaces (see, e.g., [60, 23]). Yet stochastic control problems are concerned precisely with the setting in which the decision process affects random transitions. Stochastic control is a field closely related to reinforcement learning, and its methods have been applied to a wide range of high-stakes decision-making problems in diverse fields such as operations research [24, 41], economics [31, 29], electrical engineering [44, 17], autonomous driving [62] and finance [15, 55]. In the stochastic control literature, optimal policies are often represented as deterministic feedback policies (i.e., as deterministic functions of the current state) and, in the episodic case, are non-stationary due to the finite time horizon. Stochastic control environments pose a challenging setting for OPE methods. For example, classical methods such as importance sampling (IS) [50] struggle with deterministic target policies in continuous action spaces due to the severe policy mismatch between the target and the behavior policy (see, e.g.
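To make the policy-mismatch issue concrete, below is a minimal sketch of the classical trajectory-wise importance-sampling estimator referenced in the abstract. It illustrates only the IS baseline, not the paper's resampling method, and the function names and signatures are illustrative assumptions.

import numpy as np

def is_ope_estimate(trajectories, target_logpdf, behavior_logpdf, gamma=1.0):
    """Classical trajectory-wise importance-sampling (IS) estimator for OPE.

    trajectories: list of episodes, each a list of (state, action, reward)
        tuples collected under the behavior policy.
    target_logpdf(s, a), behavior_logpdf(s, a): log-densities of action a in
        state s under the target and behavior policies (illustrative API).
    """
    estimates = []
    for episode in trajectories:
        log_weight, ret, discount = 0.0, 0.0, 1.0
        for s, a, r in episode:
            # cumulative likelihood ratio pi_target(a|s) / pi_behavior(a|s)
            log_weight += target_logpdf(s, a) - behavior_logpdf(s, a)
            ret += discount * r
            discount *= gamma
        # With a deterministic target policy on a continuous action space,
        # target_logpdf(s, a) is -inf for (almost) every logged action, so the
        # weight collapses to zero and the estimator degenerates: this is the
        # policy mismatch discussed above.
        estimates.append(np.exp(log_weight) * ret)
    return float(np.mean(estimates))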


Convergence of policy gradient methods for finite-horizon stochastic linear-quadratic control problems

arXiv.org Artificial Intelligence

We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. In contrast to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a priori bound and to converge globally to the optimal policy at a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis and achieves robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
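To illustrate the policy class described above (a Gaussian policy whose mean is linear in the state and whose covariance is state-independent), here is a minimal discrete-time sketch of a plain score-function policy gradient step on a toy finite-horizon LQ problem. This is an assumption-laden illustration, not the paper's geometry-aware Fisher/Bures-Wasserstein scheme; the dynamics A, B, costs Q, R, horizon, noise scale, step size and sample sizes are all placeholders.

import numpy as np

rng = np.random.default_rng(0)

# Toy finite-horizon discrete-time LQ problem (stand-in for the
# continuous-time exploratory LQC setting of the paper).
T, n, m = 10, 2, 1                     # horizon, state dim, action dim
A, B = np.eye(n), np.ones((n, m))      # dynamics x' = A x + B a + noise
Q, R = np.eye(n), np.eye(m)            # quadratic state/action costs

# Gaussian policy: mean K x is linear in the state, covariance Sigma is
# state-independent; K and Sigma are the learnable parameters.
K = np.zeros((m, n))
Sigma = np.eye(m)

def rollout(K, Sigma, n_paths=256):
    """Monte Carlo estimate of the expected cost and a REINFORCE-style
    gradient with respect to K (plain, non-geometry-aware baseline)."""
    cost, grad_K = 0.0, np.zeros_like(K)
    for _ in range(n_paths):
        x = rng.normal(size=n)
        c, score = 0.0, np.zeros_like(K)
        for _ in range(T):
            a = K @ x + rng.multivariate_normal(np.zeros(m), Sigma)
            c += x @ Q @ x + a @ R @ a
            # score function of the Gaussian policy w.r.t. K:
            # Sigma^{-1} (a - K x) x^T
            score += np.linalg.solve(Sigma, a - K @ x)[:, None] * x[None, :]
            x = A @ x + B @ a + 0.1 * rng.normal(size=n)
        cost += c / n_paths
        grad_K += c * score / n_paths
    return cost, grad_K

eta = 1e-4                             # small illustrative step size
for it in range(50):
    cost, g = rollout(K, Sigma)
    K -= eta * g                       # plain gradient step on the mean parameter
    if it % 10 == 0:
        print(it, round(cost, 3))

A geometry-aware variant of the kind described in the abstract would, roughly speaking, precondition the mean update and move the covariance along the Bures-Wasserstein geometry, e.g. via a retraction of the form $\Sigma \leftarrow (I - \eta G)\,\Sigma\,(I - \eta G)$ for a Euclidean gradient $G$, which keeps $\Sigma$ positive semidefinite; the precise updates, a priori bounds and convergence rates are those of the paper, not of this sketch.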