finite time analysis
Review for NeurIPS paper: Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis
This paper is a theoretical work that provides a finite-time analysis of decentralized TD learning. The reviewers and I, though not unanimously, think this contribution may be significant and interesting to the community, given the recent interest in finite-time analysis of TD algorithms with (linear) function approximation. We ask the authors to address the required changes in the manuscript. The authors propose a decentralized TD tracking method with linear function approximation. The reviewers and I were not convinced that the paper proposes a novel method, mainly because of the lack of a proper comparison with previous work.
Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP
Sangadi, Tejaram, Prashanth, L. A., Jagannathan, Krishna
In the standard reinforcement learning (RL) setting, the objective is to learn a policy that maximizes the value function, which is the expectation of the cumulative reward obtained over a finite or infinite time horizon. However, in several practical scenarios, including finance, automated driving and drug testing, a risk-sensitive learning paradigm assumes importance, wherein the value function, which is an expectation, needs to be traded off suitably against an appropriate risk metric associated with the reward distribution. One way to achieve this is to solve a constrained optimization problem with this risk metric as a constraint and the value function as the objective. Variance is a popular risk measure, which is usually incorporated into a risk-sensitive optimization problem as a constraint, with the usual expected value as the objective. Such a mean-variance formulation was studied in the seminal work of Markowitz [10]. In the context of RL, mean-variance optimization has been considered in several previous works.
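As a rough illustration of the constrained formulation sketched above (the notation, including the variance budget $\alpha$, is an assumption made here for the example and is not taken from the paper), the mean-variance problem can be written as
$$\max_{\pi} \ \mathbb{E}\Big[\sum_{t \ge 0} \gamma^{t} r_t\Big] \quad \text{subject to} \quad \mathrm{Var}\Big(\sum_{t \ge 0} \gamma^{t} r_t\Big) \le \alpha,$$
where $\gamma \in (0,1)$ is the discount factor and the expectation and variance are taken over trajectories generated by the policy $\pi$.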
Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms
Panda, Prashansa, Bhatnagar, Shalabh
Actor Critic methods have found immense applications in a wide range of Reinforcement Learning tasks, especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d. (Markovian) setting. We consider the long-run average cost criterion where both the objective and the constraint functions are suitable policy-dependent long-run averages of certain prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that these algorithms are guaranteed to find a first-order stationary point (i.e., $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$) of the performance (Lagrange) function $L(\theta,\gamma)$, with a sample complexity of $\mathcal{\tilde{O}}(\epsilon^{-2.5})$ in the case of both the Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) algorithms. We also show the results of experiments on a few different grid world settings and observe good empirical performance using both of these algorithms. In particular, for large grid sizes, Constrained Natural Actor Critic shows slightly better results than Constrained Actor Critic, while the latter is slightly better for a small grid size.
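As a hedged sketch of the Lagrangian relaxation mentioned in the abstract (the specific objective $J(\theta)$, constraint function $J_c(\theta)$, and threshold $\alpha$ below are assumptions for illustration and may differ from the paper's exact formulation), the constrained problem is folded into a single performance function
$$L(\theta, \gamma) = J(\theta) + \gamma\,\big(J_c(\theta) - \alpha\big), \qquad \gamma \ge 0,$$
where $\theta$ parameterizes the policy and $\gamma$ is the Lagrange multiplier; the quoted guarantee is that both algorithms reach a point with $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \le \epsilon$ within $\mathcal{\tilde{O}}(\epsilon^{-2.5})$ samples.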
A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
Bhandari, Jalaj, Russo, Daniel, Singal, Raghav
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a simple and explicit finite time analysis of temporal difference learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. A final section of the paper shows that all of our main results extend to the study of Q-learning applied to high-dimensional optimal stopping problems.
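As a minimal illustrative sketch of the algorithm analysed here, the following shows TD(0) with linear value-function approximation in Python; the data layout, feature map, step size, and function name are assumptions made for the example rather than the paper's setup.

import numpy as np

def td0_linear(features, next_features, rewards, gamma=0.99, alpha=0.05):
    # features[t], next_features[t]: feature vectors phi(s_t) and phi(s_{t+1})
    # rewards[t]: observed one-step reward r_t
    # Returns theta such that V(s) is approximated by phi(s) . theta.
    theta = np.zeros(features.shape[1])
    for phi, phi_next, r in zip(features, next_features, rewards):
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = r + gamma * phi_next.dot(theta) - phi.dot(theta)
        # Semi-gradient TD(0) update along the current feature direction
        theta = theta + alpha * delta * phi
    return theta

The finite-time bounds in the paper characterize how quickly such iterates approach the TD fixed point, under conditions on the step size and on how transitions are sampled; the constant step size used above is only one of the regimes typically considered.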