
Collaborating Authors

Tsitsiklis



Reinforcement Learning in POMDP's via Direct Gradient Ascent

Baxter, Jonathan, Bartlett, Peter L.

arXiv.org Artificial Intelligence

This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a REINFORCE-like algorithm for estimating an approximation to the gradient of the average reward as a function of the parameters of a stochastic policy. The algorithm's chief advantages are that it requires only a single sample path of the underlying Markov chain, it uses only one free parameter $\beta \in [0,1)$, which has a natural interpretation in terms of bias-variance trade-off, and it requires no knowledge of the underlying state. We prove convergence of GPOMDP and show how the gradient estimates produced by GPOMDP can be used in a conjugate-gradient procedure to find local optima of the average reward.
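As an illustrative aside (not taken from the paper), the single-sample-path estimator described above can be sketched with a discounted eligibility trace: accumulate the score function $\nabla \log \pi$ with decay $\beta$, and average the product of the trace with the observed rewards along one trajectory. The two-armed environment and softmax policy below are toy assumptions for illustration only.

```python
import numpy as np

def gpomdp_estimate(theta, env_step, T=10_000, beta=0.9, seed=0):
    """GPOMDP-style gradient estimate from a single sample path (sketch).

    theta:    parameters of a softmax policy over the available actions
    env_step: function(action, rng) -> reward (toy stationary environment)
    beta:     the bias-variance trade-off parameter in [0, 1)
    """
    rng = np.random.default_rng(seed)
    z = np.zeros_like(theta)       # eligibility trace
    delta = np.zeros_like(theta)   # running gradient estimate
    for t in range(T):
        probs = np.exp(theta) / np.exp(theta).sum()   # softmax policy
        a = rng.choice(len(theta), p=probs)
        grad_log = -probs
        grad_log[a] += 1.0                            # grad of log pi(a; theta)
        z = beta * z + grad_log                       # beta-discounted trace
        r = env_step(a, rng)
        delta += (r * z - delta) / (t + 1)            # running average of r * z
    return delta

# Toy two-armed environment: action 1 pays reward 1, action 0 pays 0.
est = gpomdp_estimate(np.zeros(2), lambda a, rng: float(a))
```

For this toy problem the estimate should point toward increasing the probability of action 1; larger $\beta$ trades more variance for less bias in general.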



Finite-Sample Analysis of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

Chen, Suei-Wen, Ross, Keith, Youssef, Pierre

arXiv.org Artificial Intelligence

Monte Carlo Exploring Starts (MCES), which aims to learn the optimal policy using only sample returns, is a simple and natural reinforcement learning algorithm that has been shown to converge under various conditions. However, the convergence rate analysis for MCES-style algorithms in the form of sample complexity has received very little attention. In this paper we develop a finite sample bound for a modified MCES algorithm which solves the stochastic shortest path problem. To this end, we prove a novel result on the convergence rate of the policy iteration algorithm. This result implies that with probability at least $1-\delta$, the algorithm returns an optimal policy after $\tilde{O}(SAK^3\log^3\frac{1}{\delta})$ sampled episodes, where $S$ and $A$ denote the number of states and actions respectively, $K$ is a proxy for episode length, and $\tilde{O}$ hides logarithmic factors and constants depending on the rewards of the environment that are assumed to be known.
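As a rough sketch of the MCES idea (not the paper's modified algorithm), each episode starts from a uniformly random state-action pair, follows the current greedy policy, updates action values from the observed returns, and improves the policy greedily. The tiny deterministic shortest-path chain below is an assumption for illustration.

```python
import random

# Toy stochastic shortest path: states 0..2, terminal state 3;
# actions 0 = left, 1 = right; every step costs 1 (reward -1).
def step(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, -1.0

def mces(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(3) for a in (0, 1)}
    N = {k: 0 for k in Q}
    pi = {s: 0 for s in range(3)}                  # arbitrary initial policy
    for _ in range(episodes):
        s, a = rng.randrange(3), rng.randrange(2)  # exploring start
        traj = []
        for _ in range(50):                        # cap episode length
            s2, r = step(s, a)
            traj.append((s, a, r))
            if s2 == 3:
                break
            s, a = s2, pi[s2]
        G = 0.0
        for (s, a, r) in reversed(traj):           # returns from sample episodes
            G += r
            N[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]
        for st in range(3):                        # greedy policy improvement
            pi[st] = max((0, 1), key=lambda act: Q[(st, act)])
    return pi

pi = mces()
```

On this chain the learned policy should move right from every state, reaching the terminal state along the shortest path.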


On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation

Winnicki, Anna, Srikant, R.

arXiv.org Artificial Intelligence

A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value function. A well-known longstanding open problem in this context is to prove the convergence of such a scheme when the value function of a policy is estimated from data collected from a single sample path obtained from implementing the policy (see page 99 of [Sutton and Barto, 2018], page 8 of [Tsitsiklis, 2002]). We present a solution to the open problem by showing that a first-visit version of such a policy iteration scheme indeed converges to the optimal policy provided that the policy improvement step uses lookahead [Silver et al., 2016, Mnih et al., 2016, Silver et al., 2017b] rather than a simple greedy policy improvement. We provide results both for the original open problem in the tabular setting and also present extensions to the function approximation setting, where we show that the policy resulting from the algorithm performs close to the optimal policy within a function approximation error.
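The first-visit evaluation step discussed above can be sketched as follows: for each episode, compute the return from the first time each state is visited and average those returns. This toy deterministic chain and fixed policy are assumptions for illustration; the paper's scheme additionally uses lookahead in the improvement step, which is not reproduced here.

```python
import random

def first_visit_eval(policy, step, n_states, episodes=500, seed=0):
    """First-visit Monte Carlo evaluation of a fixed policy (sketch)."""
    rng = random.Random(seed)
    V = [0.0] * n_states
    N = [0] * n_states
    for _ in range(episodes):
        s, traj = 0, []
        while s is not None:                 # None marks the terminal state
            s2, r = step(s, policy[s], rng)
            traj.append((s, r))
            s = s2
        G, returns = 0.0, []
        for (s, r) in reversed(traj):        # backward pass: returns G_t
            G += r
            returns.append((s, G))
        seen = set()
        for (s, G) in reversed(returns):     # forward order -> first visits
            if s not in seen:
                seen.add(s)
                N[s] += 1
                V[s] += (G - V[s]) / N[s]    # incremental average
    return V

# Toy chain: states 0, 1, 2 move right deterministically, terminate after 2.
def step(s, a, rng):
    s2 = s + 1
    return (None if s2 == 3 else s2), -1.0

V = first_visit_eval([0, 0, 0], step, n_states=3)
```

Here the estimated values equal the exact negative distances to the terminal state.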


On the Convergence of Discounted Policy Gradient Methods

Nota, Chris

arXiv.org Artificial Intelligence

Policy gradient methods are a class of reinforcement learning (RL) algorithms that attempt to directly maximize the expected performance of an agent's policy by following the gradient of an objective function (Sutton et al., 2000), typically the expected sum of rewards, using a stochastic estimator generated by interacting with the environment. Unbiased estimators of this gradient can suffer from high variance due to high variance in the sum of future rewards. A common approach is to instead consider an exponentially discounted sum of future rewards. This approach reduces the variance of most estimators but introduces bias (Thomas, 2014). Frequently, the discounted sum of future rewards is estimated by a critic (Konda and Tsitsiklis, 2000). It has been argued that when a critic is used, discounting has the additional benefit of reducing approximation error (Zhang et al., 2020). The "discounted" policy gradient was originally introduced as the gradient of a discounted objective (Sutton et al., 2000). However, it has been shown that the gradient of the discounted objective does not produce the update direction followed by most discounted policy gradient algorithms (Thomas, 2014; Nota and Thomas, 2019).
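To make the distinction above concrete, here is a minimal sketch of the update direction most "discounted" policy gradient algorithms follow: each score-function term is weighted by the discounted sum of future rewards, but not by the additional $\gamma^t$ factor that the gradient of the discounted objective would require (Thomas, 2014). The function name and toy inputs are assumptions for illustration.

```python
import numpy as np

def discounted_pg_update(grad_logs, rewards, gamma=0.99):
    """Update direction used by typical discounted PG algorithms (sketch).

    grad_logs: list of grad log pi(a_t | s_t) arrays from one episode
    rewards:   list of rewards r_t
    Note: the true gradient of the discounted objective would weight
    each term by an extra gamma**t factor, which is omitted here.
    """
    T = len(rewards)
    returns = [0.0] * T
    G = 0.0
    for t in reversed(range(T)):
        G = rewards[t] + gamma * G           # discounted sum of future rewards
        returns[t] = G
    g = np.zeros_like(grad_logs[0])
    for t in range(T):
        g += grad_logs[t] * returns[t]       # no gamma**t state weighting
    return g

# Two-step episode with unit score terms and rewards 1, 1 at gamma = 0.5:
# returns are [1.5, 1.0], so the update direction sums to 2.5.
g = discounted_pg_update([np.array([1.0]), np.array([1.0])], [1.0, 1.0], gamma=0.5)
```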


A Concentration Bound for Distributed Stochastic Approximation

Dolhare, Harsh, Borkar, Vivek

arXiv.org Artificial Intelligence

We revisit the classical model of Tsitsiklis, Bertsekas and Athans for distributed stochastic approximation with consensus. The main result is an analysis of this scheme using the ODE approach to stochastic approximation, leading to a high probability bound for the tracking error between suitably interpolated iterates and the limiting differential equation. Several future directions will also be highlighted.
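The scheme revisited above can be sketched as a consensus-plus-drift iteration: each agent averages its iterate with its neighbors' via a doubly stochastic weight matrix, then takes a diminishing stochastic-approximation step. The ring topology, toy drift $h(x) = -x$ (whose limiting ODE $\dot{x} = -x$ has equilibrium 0), and noise scale below are all assumptions for illustration.

```python
import numpy as np

def distributed_sa(n_agents=3, steps=5000, seed=0):
    """Distributed stochastic approximation with consensus (sketch)."""
    rng = np.random.default_rng(seed)
    # Doubly stochastic consensus weights on a ring.
    W = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        W[i, i] = 0.5
        W[i, (i + 1) % n_agents] = 0.25
        W[i, (i - 1) % n_agents] = 0.25
    x = rng.normal(size=n_agents) * 5.0      # scattered initial iterates
    for n in range(1, steps + 1):
        a_n = 1.0 / n                        # diminishing step size
        noise = rng.normal(scale=0.1, size=n_agents)
        x = W @ x + a_n * (-x + noise)       # consensus step + noisy drift
    return x

x = distributed_sa()
```

After many steps the agents' iterates track the ODE and cluster near its equilibrium, consistent with the high-probability tracking bound described in the abstract.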


Average-Reward Reinforcement Learning with Trust Region Methods

Ma, Xiaoteng, Tang, Xiaohang, Xia, Li, Yang, Jun, Zhao, Qianchuan

arXiv.org Artificial Intelligence

Most reinforcement learning algorithms optimize the discounted criterion, which helps accelerate convergence and reduce the variance of estimates. Although the discounted criterion is appropriate for certain tasks such as finance-related problems, many engineering problems treat future rewards equally and prefer a long-run average criterion. In this paper, we study the reinforcement learning problem with the long-run average criterion. Firstly, we develop a unified trust region theory with discounted and average criteria. With the average criterion, a novel performance bound within the trust region is derived with the Perturbation Analysis (PA) theory. Secondly, we propose a practical algorithm named Average Policy Optimization (APO), which improves the value estimation with a novel technique named Average Value Constraint. To the best of our knowledge, our work is the first to study the trust region approach with the average criterion, and it complements the framework of reinforcement learning beyond the discounted criterion. Finally, experiments are conducted in the continuous control environment MuJoCo. In most tasks, APO performs better than the discounted PPO, which demonstrates the effectiveness of our approach.


Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER

Holzleitner, Markus, Gruber, Lukas, Arjona-Medina, José, Brandstetter, Johannes, Hochreiter, Sepp

arXiv.org Artificial Intelligence

We prove, under commonly used assumptions, the convergence of actor-critic reinforcement learning algorithms, which simultaneously learn a policy function, the actor, and a value function, the critic. Both functions can be deep neural networks of arbitrary complexity. Our framework allows showing convergence of the well-known Proximal Policy Optimization (PPO) and of the recently introduced RUDDER. For the convergence proof we employ recently introduced techniques from two time-scale stochastic approximation theory. Our results are valid for actor-critic methods that use episodic samples and that have a policy that becomes more greedy during learning. Previous convergence proofs assume linear function approximation, cannot treat episodic samples, or do not consider that policies become greedy. The latter is relevant since optimal policies are typically deterministic.


Average-reward model-free reinforcement learning: a systematic review and literature mapping

Dewanto, Vektor, Dunn, George, Eshragh, Ali, Gallagher, Marcus, Roosta, Fred

arXiv.org Artificial Intelligence

Model-free reinforcement learning (RL) has been an active area of research and provides a fundamental framework for agent-based learning and decision-making in artificial intelligence. In this paper, we review a specific subset of this literature, namely work that utilizes optimization criteria based on average rewards in the infinite horizon setting. Average reward RL has the advantage of being the most selective criterion in recurrent (ergodic) Markov decision processes. In comparison to the widely used discounted reward criterion, it also requires no discount factor, which is a critical hyperparameter, and properly aligns the optimization and performance metrics. Motivated by the sole prior survey by Mahadevan (1996a), we provide an updated review of work in this area and extend it to cover policy-iteration and function approximation methods (in addition to the value-iteration and tabular counterparts). We also identify and discuss opportunities for future work.