AITopics | Borkar, Vivek

Collaborating Authors

Borkar, Vivek

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

Borkar, Vivek, Chen, Shuhang, Devraj, Adithya, Kontoyiannis, Ioannis, Meyn, Sean

arXiv.org Artificial Intelligence, Jun-18-2023

The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space $\textsf{X}$ with stationary distribution $\pi$, and $f:\Re^d\times\textsf{X}\to\Re^d$. The main results are established under a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3), and a stability condition for the mean flow with vector field $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$, with $\Phi\sim\pi$.
(i) $\{ \theta_n\}$ is convergent a.s. and in $L_4$ to the unique root $\theta^*$ of $\bar{f}(\theta)$.
(ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error.
(iii) The CLT holds for the normalized version, $z_n := \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$, of the averaged parameters, $\theta^{\text{PR}}_n := n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the normalized covariance converges, $$ \lim_{n \to \infty} n \textsf{E} [ \widetilde{\theta}^{\text{PR}}_n (\widetilde{\theta}^{\text{PR}}_n)^T ] = \Sigma_\theta^*, \quad \text{with } \widetilde{\theta}^{\text{PR}}_n = \theta^{\text{PR}}_n -\theta^*, $$ where $\Sigma_\theta^*$ is the minimal covariance of Polyak and Ruppert.
(iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$.
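The recursion and the averaged estimate $\theta^{\text{PR}}_n$ in the abstract above can be illustrated with a minimal scalar sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-state chain, the linear $f(\theta,\phi) = -(\theta - \theta^*) + w(\phi)$, the value $\theta^* = 2$, and the step-size $\alpha_n = n^{-0.75}$.

```python
import random


def run_sa(n_steps=20000, seed=0):
    """Scalar stochastic approximation with Markovian noise and
    Polyak-Ruppert averaging (illustrative toy model, not the paper's)."""
    rng = random.Random(seed)

    # Hypothetical two-state Markov chain Phi on {0, 1}. It is symmetric,
    # so its stationary distribution pi is (1/2, 1/2).
    stay_prob = 0.7
    noise = {0: -1.0, 1: 1.0}   # w(phi), zero mean under pi

    theta_star = 2.0            # root of the mean field f_bar
    theta = 0.0                 # initial condition theta_0
    phi = 0
    running_sum = 0.0

    for n in range(1, n_steps + 1):
        # One transition of the Markov chain.
        if rng.random() > stay_prob:
            phi = 1 - phi

        # Step-size alpha_n = n^{-rho} with rho in (1/2, 1).
        alpha = 1.0 / n ** 0.75

        # f(theta, phi) = -(theta - theta_star) + w(phi), so that
        # f_bar(theta) = -(theta - theta_star) has the unique root theta_star.
        theta += alpha * (-(theta - theta_star) + noise[phi])

        running_sum += theta

    theta_pr = running_sum / n_steps   # averaged parameter theta^PR_n
    return theta, theta_pr


if __name__ == "__main__":
    last, averaged = run_sa()
    print("last iterate:", last)
    print("PR average:  ", averaged)
```

Both the last iterate and the average should end up near the assumed root; the average is typically the less noisy of the two, which is the behaviour the Polyak-Ruppert CLT in item (iii) quantifies.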

approximation, machine learning, reinforcement learning, (19 more...)

arXiv:2110.14427

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > Kansas (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)

Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

Pagare, Tejas, Borkar, Vivek, Avrachenkov, Konstantin

arXiv.org Artificial Intelligence, Apr-7-2023

We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems. In the neural function approximation setting, we experimentally compare the widely used RVI Q-Learning with the recently proposed Differential Q-Learning, using both Full Gradient DQN and DQN. We also extend this to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate of the proposed Full Gradient variant across different tasks.
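For readers unfamiliar with RVI Q-Learning, the following is a minimal tabular sketch of the standard update, in which the value of a fixed reference state-action pair is subtracted from the target and serves as the running estimate of the average reward. It is not the neural Full Gradient variant studied in the paper. The two-state MDP, the slip probability, the uniform exploration, and the step-size schedule are all hypothetical choices made for illustration.

```python
import random


def rvi_q_learning(n_steps=200_000, seed=0):
    """Tabular RVI Q-learning on a hypothetical 2-state, 2-action MDP."""
    rng = random.Random(seed)
    n_states, n_actions = 2, 2

    # Hypothetical dynamics: action a tries to move to state a and
    # "slips" to the other state with probability `slip`.
    slip = 0.1

    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]

    # Reference pair: q[ref_s][ref_a] plays the role of the average
    # reward estimate in the relative value iteration offset.
    ref_s, ref_a = 0, 0

    s = 0
    for _ in range(n_steps):
        a = rng.randrange(n_actions)                   # uniform exploration
        s_next = a if rng.random() > slip else 1 - a   # sample the transition
        r = 1.0 if s == 1 else 0.0                     # reward 1 in state 1

        # Diminishing step-size based on the local visit count.
        visits[s][a] += 1
        alpha = 10.0 / (100.0 + visits[s][a])

        # RVI target: reward plus best next value, minus the reference value.
        target = r + max(q[s_next]) - q[ref_s][ref_a]
        q[s][a] += alpha * (target - q[s][a])

        s = s_next
    return q


if __name__ == "__main__":
    q = rvi_q_learning()
    print("average reward estimate:", q[0][0])
    print("greedy actions:", [row.index(max(row)) for row in q])
```

In this toy model the best policy always picks action 1 and spends a fraction 0.9 of the time in state 1, so the reference entry `q[0][0]` should settle near 0.9 under the assumptions above.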

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv:2304.03729

Country:

Asia > India (0.46)
North America > United States > Massachusetts (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.46)
Transportation > Electric Vehicle (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Concentration bounds for SSP Q-learning for average cost MDPs

Haque, Shaan Ul, Borkar, Vivek

arXiv.org Machine Learning, Jun-12-2022

We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.

artificial intelligence, average cost mdp, machine learning, (2 more...)

arXiv:2206.03328

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Structure-aware Online Learning Algorithm for Markov Decision Processes

Roy, Arghyadip, Borkar, Vivek, Karandikar, Abhay, Chaporkar, Prasanna

arXiv.org Machine Learning, Nov-28-2018

Reinforcement Learning (RL) algorithms are popular for overcoming the curse of dimensionality and the curse of modeling in Dynamic Programming (DP) methods for solving classical Markov Decision Process (MDP) problems. In this paper, we consider an infinite-horizon average reward MDP problem and prove the optimality of the threshold policy under certain conditions. Traditional RL techniques do not exploit the threshold nature of the optimal policy while learning. We propose a new RL algorithm that uses the known threshold structure of the optimal policy to reduce the feasible policy space while learning. We establish that the proposed algorithm converges to the optimal policy. It provides a significant improvement in convergence speed and in computational and storage complexity over traditional RL algorithms. The proposed technique can be applied to a wide variety of optimization problems, including energy-efficient data transmission and management of queues. We demonstrate the improvement in convergence speed of the proposed algorithm over other RL algorithms through simulations.

computer based training, educational technology, optimal policy, (22 more...)

arXiv:1811.11646

Country:

Europe (0.15)
North America > United States (0.14)

Genre: Research Report (0.64)

Industry:

Energy (0.46)
Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
