Devraj, Adithya M.
Gaussian Imagination in Bandit Learning
Liu, Yueyang, Devraj, Adithya M., Van Roy, Benjamin, Xu, Kuang
Assuming distributions are Gaussian often facilitates computations that are otherwise intractable. We consider an agent that is designed to attain a low information ratio with respect to a bandit environment with a Gaussian prior distribution and a Gaussian likelihood function, but study the agent's performance when it is applied instead to a Bernoulli bandit. We establish a bound on the increase in Bayesian regret when the agent interacts with the Bernoulli bandit, relative to the information-theoretic bound it satisfies in the Gaussian bandit. If the Gaussian prior distribution and likelihood function are sufficiently diffuse, this increase grows with the square root of the time horizon, and thus the per-timestep increase vanishes. Our results formalize the folklore that so-called Bayesian agents remain effective when instantiated with diffuse misspecified distributions.
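The setup lends itself to a concrete illustration. Below is a minimal sketch of one such agent: Thompson sampling that maintains a conjugate Gaussian posterior (diffuse prior, fixed-variance Gaussian likelihood) while the rewards are in fact Bernoulli. The function name, the prior mean of 0.5, and the variance parameters are assumptions made for the example; the paper's analysis covers any agent attaining a low information ratio, not Thompson sampling specifically.

    import numpy as np

    def gaussian_ts_on_bernoulli(true_means, horizon, prior_var=100.0,
                                 noise_var=1.0, seed=0):
        # Thompson sampling with a misspecified Gaussian prior/likelihood,
        # applied to a Bernoulli bandit with success probabilities `true_means`.
        rng = np.random.default_rng(seed)
        k = len(true_means)
        post_mean = np.full(k, 0.5)         # prior mean (assumption)
        post_var = np.full(k, prior_var)    # diffuse prior variance
        best = max(true_means)
        regret = 0.0
        for _ in range(horizon):
            # Sample a model from the Gaussian posterior and act greedily.
            a = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
            r = float(rng.random() < true_means[a])   # Bernoulli reward
            # Conjugate Gaussian update, as if r ~ N(mu_a, noise_var).
            prec = 1.0 / post_var[a] + 1.0 / noise_var
            post_mean[a] = (post_mean[a] / post_var[a] + r / noise_var) / prec
            post_var[a] = 1.0 / prec
            regret += best - true_means[a]
        return regret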
Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation
Chen, Shuhang, Devraj, Adithya M., Bušić, Ana, Meyn, Sean
This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and reinforcement learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that a Hoeffding bound on the error sequence is not possible in general, even when the underlying Markov chain is reversible and geometrically ergodic, as in the M/M/1 queue. This motivates the focus on mean-square error bounds for parameter estimates. It is shown that the mean-square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design.
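As a hedged illustration of the setting, the sketch below runs the scalar Monte-Carlo recursion $\theta_{n+1} = \theta_n + a_n(X_{n+1} - \theta_n)$ with step size $a_n = g/(n+1)$, driven by a two-state Markov chain, and reports $n$ times the empirical mean-square error; this quantity should approach a constant, consistent with the $O(1/n)$ rate. The specific chain, the parameter names, and the run lengths are assumptions made for the example, not the paper's setup.

    import numpy as np

    def n_times_mse(n_steps=20_000, n_runs=200, gain=1.0, p=0.8, q=0.6, seed=0):
        # Two-state chain on {0, 1}: P(0 -> 1) = p, P(1 -> 0) = q.
        # The stationary mean of the state is p / (p + q).
        rng = np.random.default_rng(seed)
        target = p / (p + q)
        sq_err = np.zeros(n_runs)
        for r in range(n_runs):
            x, theta = 0, 0.0
            for n in range(n_steps):
                u = rng.random()
                x = (1 if u < p else 0) if x == 0 else (0 if u < q else 1)
                theta += gain / (n + 1) * (x - theta)   # SA / running average
            sq_err[r] = (theta - target) ** 2
        return n_steps * sq_err.mean()   # ~ constant if MSE = O(1/n)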
Zap Q-Learning With Nonlinear Function Approximation
Chen, Shuhang, Devraj, Adithya M., Bušić, Ana, Meyn, Sean
The Zap stochastic approximation (SA) algorithm was introduced recently as a means to accelerate convergence in reinforcement learning algorithms. While numerical results were impressive, stability (in the sense of boundedness of parameter estimates) was established in only a few special cases. This class of algorithms is generalized in this paper, and stability is established under very general conditions. The general result applies to a wide range of algorithms found in reinforcement learning; two classes are considered here: (i) The natural generalization of Watkins' algorithm is not always stable in function approximation settings; parameter estimates may diverge to infinity even in the linear function approximation setting with a simple finite state-action MDP. Under mild conditions, the Zap SA algorithm yields a stable algorithm, even in the case of nonlinear function approximation. (ii) The GQ algorithm of Maei et al. (2010) is designed to address the stability challenge. Analysis is provided to explain why the algorithm may be very slow to converge in practice. The new Zap GQ algorithm is stable even for nonlinear function approximation.
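A minimal sketch of the two-time-scale Zap SA recursion, specialized to the linear root-finding problem $f(\theta) = A\theta + b$ observed in noise: a matrix gain tracks the Jacobian $A$ on the faster time scale $\gamma_n = \alpha_n^\rho$, $\rho \in (0.5, 1)$, and the parameter update is Newton-Raphson-like. The linear specialization, the noise level, and all names are assumptions for the example; the point of the paper is that recursions of this form remain stable with nonlinear function approximation.

    import numpy as np

    def zap_sa(A, b, n_steps=50_000, rho=0.85, seed=0):
        # Solve A @ theta + b = 0 from noisy observations of f(theta),
        # assuming A is Hurwitz so the target Jacobian is invertible.
        rng = np.random.default_rng(seed)
        d = len(b)
        theta = np.zeros(d)
        A_hat = -np.eye(d)                 # initial gain; must stay invertible
        for n in range(1, n_steps + 1):
            alpha = 1.0 / n                # slow time scale (parameter)
            gamma = alpha ** rho           # fast time scale (matrix gain)
            f = A @ theta + b + 0.1 * rng.standard_normal(d)
            A_hat += gamma * (A - A_hat)   # track the Jacobian of f
            theta -= alpha * np.linalg.solve(A_hat, f)   # Newton-like step
        return theta                       # approx. -inv(A) @ b

For instance, zap_sa(np.array([[-2.0, 0.5], [0.0, -1.0]]), np.array([1.0, 1.0])) should approach -inv(A) @ b; the faster gain recursion is what distinguishes Zap from fixed-matrix-gain SA.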
Differential Temporal Difference Learning
Devraj, Adithya M., Kontoyiannis, Ioannis, Meyn, Sean P.
Value functions derived from Markov decision processes arise as a central component of algorithms, as well as performance metrics, in many statistics and engineering applications of machine learning. Computation of the solution to the associated Bellman equations is challenging in most practical cases of interest. A popular class of approximation techniques, known as Temporal Difference (TD) learning algorithms, is an important sub-class of general reinforcement learning methods. The algorithms introduced in this paper are intended to resolve two well-known difficulties of TD-learning approaches: their slow convergence due to very high variance, and the fact that, for the problem of computing the relative value function, consistent algorithms exist only in special cases. First, we show that the gradients of these value functions admit a representation that lends itself to algorithm design. Based on this result, a new class of differential TD-learning algorithms is introduced. For Markovian models on Euclidean space with smooth dynamics, the algorithms are shown to be consistent under general conditions. Numerical results show dramatic variance reduction when compared to standard methods.
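The gradient representation underlying the approach can be checked by simulation. The sketch below uses the scalar linear model $X_{n+1} = a X_n + W_{n+1}$ with cost $c(x) = x^2$, for which the sensitivity process satisfies $S_{n+1} = a S_n$, $S_0 = 1$, and the discounted value function has the closed-form gradient $\nabla V(x) = 2x/(1 - \beta a^2)$. All model choices here are assumptions made for the example, and the estimate is a plain Monte-Carlo check of the representation; the paper's differential TD-learning algorithms estimate such gradients recursively instead.

    import numpy as np

    def grad_value_mc(x0, a=0.7, beta=0.9, n_paths=2000, horizon=200, seed=0):
        # Monte-Carlo estimate of grad V(x0) via the representation
        #   grad V(x0) = E[ sum_n beta^n c'(X_n) S_n ],  S_{n+1} = a S_n,
        # for X_{n+1} = a X_n + W_{n+1} and c(x) = x^2, so c'(x) = 2x.
        rng = np.random.default_rng(seed)
        est = np.zeros(n_paths)
        for p in range(n_paths):
            x, s, g = float(x0), 1.0, 0.0
            for n in range(horizon):
                g += beta ** n * 2.0 * x * s
                x = a * x + rng.standard_normal()
                s *= a                      # sensitivity recursion
            est[p] = g
        return est.mean()   # compare with 2 * x0 / (1 - beta * a**2)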