Robustness and sample complexity of model-based MARL for general-sum Markov games
Subramanian, Jayakumar, Sinha, Amit, Mahajan, Aditya
Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most of the existing literature on MARL concentrates on zero-sum Markov games and is therefore not applicable to general-sum Markov games. It is known that the best-response dynamics in general-sum Markov games are not a contraction. Therefore, different equilibria in general-sum Markov games can have different values. Moreover, the Q-function is not sufficient to completely characterize the equilibrium. Given these challenges, model-based learning is an attractive approach for MARL in general-sum Markov games. In this paper, we investigate the fundamental question of \emph{sample complexity} for model-based MARL algorithms in general-sum Markov games. We show two results. We first use Hoeffding inequality based bounds to show that $\tilde{\mathcal{O}}( (1-\gamma)^{-4} \alpha^{-2})$ samples per state-action pair are sufficient to obtain an $\alpha$-approximate Markov perfect equilibrium with high probability, where $\gamma$ is the discount factor and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms. We then use Bernstein inequality based bounds to show that $\tilde{\mathcal{O}}( (1-\gamma)^{-1} \alpha^{-2} )$ samples are sufficient. To obtain these results, we study the robustness of Markov perfect equilibria to model approximations. We show that a Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game, and we provide explicit bounds on the approximation error. We illustrate the results via a numerical example.
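For context, the generic model-based recipe referenced above can be sketched in a few lines: draw $N$ samples per state-(joint-)action pair from a generative model, form the empirical transition kernel, and compute a Markov perfect equilibrium of the resulting empirical game. The sketch below is only illustrative; the sampler interface, the function names, and the black-box equilibrium solver are assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed interface, not the paper's code): build the empirical
# game from N generative-model samples per state-(joint-)action pair and hand
# it to a black-box Markov-perfect-equilibrium solver.
import numpy as np

def empirical_transition_kernel(sampler, num_states, num_joint_actions, N):
    """Estimate P_hat(s' | s, a), where `sampler(s, a)` returns one next state
    drawn from the true (unknown) kernel."""
    P_hat = np.zeros((num_states, num_joint_actions, num_states))
    for s in range(num_states):
        for a in range(num_joint_actions):
            for _ in range(N):
                P_hat[s, a, sampler(s, a)] += 1.0 / N
    return P_hat

def model_based_marl(sampler, rewards, num_states, num_joint_actions, N, gamma, solve_mpe):
    """Solve the empirical game; `solve_mpe` is an assumed black-box equilibrium
    solver. By the robustness result described in the abstract, the returned
    strategies form an approximate Markov perfect equilibrium of the true game."""
    P_hat = empirical_transition_kernel(sampler, num_states, num_joint_actions, N)
    return solve_mpe(P_hat, rewards, gamma)
```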
Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Switched Linear Systems
Sayedana, Borna, Afshari, Mohammad, Caines, Peter E., Mahajan, Aditya
In this paper, we investigate the problem of system identification for autonomous switched linear systems with complete state observations. We propose the switched least squares method for the identification of switched linear systems, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is $\mathcal{O}\big(\sqrt{\log(T)/T} \big)$, where $T$ is the time horizon. These results show that our method for switched linear systems has the same rate of convergence as the least squares method for non-switched linear systems. We compare our results with those in the literature and present numerical examples to illustrate the performance of the proposed system identification method.
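The estimator takes a compact form when the mode sequence is observed along with the state (the complete-observation setting of the abstract): group the transitions by active mode and solve a separate least squares problem for each mode. The following is a minimal sketch under that assumption; the interface and variable names are illustrative.

```python
# Minimal sketch of switched least squares for an autonomous switched linear
# system x_{t+1} = A_{sigma_t} x_t + w_t with observed modes (complete state
# observation). Variable names and the interface are illustrative.
import numpy as np

def switched_least_squares(states, modes, num_modes):
    """Estimate one A matrix per mode from a single trajectory.

    states: array of shape (T+1, n), the observed state trajectory.
    modes:  array of shape (T,), the active mode sigma_t at each step t.
    """
    A_hat = []
    for k in range(num_modes):
        idx = np.where(modes == k)[0]
        X = states[idx]        # regressors x_t for the steps where mode k is active
        Y = states[idx + 1]    # targets x_{t+1}
        # Least squares per mode: minimize ||X A_k^T - Y||_F^2 over A_k
        A_k_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
        A_hat.append(A_k_T.T)
    return A_hat
```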
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems
Gagrani, Mukul, Sudhakara, Sagar, Mahajan, Aditya, Nayyar, Ashutosh, Ouyang, Yi
We revisit the Thompson sampling algorithm for controlling an unknown linear quadratic (LQ) system that was recently proposed by Ouyang et al. (arXiv:1709.04047). The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed-loop system. In this technical note, we show that by making a minor modification to the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed-loop system. The modified algorithm has the same Bayesian regret of $\tilde{\mathcal{O}}(\sqrt{T})$, where $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in~$T$.
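The episodic structure being modified can be sketched roughly as follows. The stopping tests below, and in particular the minimum-episode-length guard that keeps an episode from ending too soon, are illustrative stand-ins for the modification described in the abstract rather than the note's exact rule; all model-specific operations are supplied as callables.

```python
# A rough sketch of one episode of Thompson-sampling-based LQ control in the
# spirit of the algorithm of Ouyang et al. The stopping tests, especially the
# minimum-episode-length guard, are illustrative assumptions, not the note's
# exact rule.

def run_episode(sample_params, compute_gain, step_and_update, cov_log_det,
                state, prev_len, min_len):
    """Run one episode and return (final_state, episode_length).

    sample_params():        draw model parameters from the current posterior
    compute_gain(theta):    LQ feedback gain for the sampled parameters
    step_and_update(x, u):  apply the control, update the posterior, return the next state
    cov_log_det():          log-determinant of the posterior covariance
    """
    theta = sample_params()
    K = compute_gain(theta)
    t, logdet_start = 0, cov_log_det()
    while True:
        u = K @ state
        state = step_and_update(state, u)
        t += 1
        grew_too_long = t >= prev_len + 1                      # episode-length test
        learned_enough = cov_log_det() <= logdet_start - 0.69  # covariance determinant roughly halved
        # Guard (in the spirit of the modification): never stop before min_len steps.
        if t >= min_len and (grew_too_long or learned_enough):
            return state, t
```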
Scalable regret for learning to control network-coupled subsystems with unknown dynamics
Sudhakara, Sagar, Mahajan, Aditya, Nayyar, Ashutosh, Ouyang, Yi
We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e., the loss in performance) of our strategy with respect to an oracle who knows the system model. Viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling-based learning algorithm that exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by $\tilde{\mathcal{O}} \big( n \sqrt{T} \big)$, where $n$ is the number of subsystems, $T$ is the time horizon, and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$. Thus, the regret scales linearly with the number of subsystems. We present numerical experiments to illustrate the salient features of the proposed algorithm.
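A heavily hedged sketch of the underlying idea: instead of one posterior over the full global model, maintain lower-dimensional posteriors whose size does not grow with the number of subsystems, sample from each, and assemble per-subsystem gains from the sampled parameters. The decomposition and interface below are schematic assumptions and do not reproduce the paper's actual construction.

```python
# Schematic sketch only: per-subsystem posteriors of fixed (n-independent)
# dimension, Thompson sampling from each, and gains assembled from the sampled
# parameters. The interface is an assumption, not the paper's construction.

def networked_thompson_step(posteriors, assemble_gains, states, step_and_update):
    """One control step across all n subsystems.

    posteriors:       list of posterior objects with .sample() and .update(x, u, x_next)
    assemble_gains:   maps the sampled low-dimensional parameters to per-subsystem gains
    states:           list of subsystem states [x^1_t, ..., x^n_t]
    step_and_update:  applies the joint control and returns the next subsystem states
    """
    thetas = [p.sample() for p in posteriors]
    gains = assemble_gains(thetas)
    controls = [K @ x for K, x in zip(gains, states)]
    next_states = step_and_update(controls)
    for p, x, u, x_next in zip(posteriors, states, controls, next_states):
        p.update(x, u, x_next)
    return next_states
```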
Renewal Monte Carlo: Renewal theory based reinforcement learning
Subramanian, Jayakumar, Mahajan, Aditya
In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite-horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm that retains the advantages of Monte Carlo methods, including low bias, simplicity, and ease of implementation, while at the same time circumventing their key drawbacks of high variance and delayed (end-of-episode) updates. The key ideas behind RMC are as follows. First, under any reasonable policy, the reward process is ergodic, so, by renewal theory, the performance of a policy is equal to the ratio of the expected discounted reward to the expected discounted time over a regenerative cycle. Second, by carefully examining the expression for the performance gradient, we propose a stochastic approximation algorithm that only requires estimates of the expected discounted reward and discounted time over a regenerative cycle, along with their gradients. We propose two unbiased estimators for evaluating performance gradients (a likelihood-ratio-based estimator and a simultaneous-perturbation-based estimator) and show that, for both estimators, RMC converges to a locally optimal policy. We generalize the RMC algorithm to post-decision state models and also present a variant that converges faster to an approximately optimal policy. We conclude by presenting numerical experiments on a randomly generated MDP, event-triggered communication, and inventory management.
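The policy-evaluation step implied by the renewal argument can be sketched directly: simulate regenerative cycles that start and end at the designated start state, record the discounted reward and the accumulated discount per cycle, and combine their averages. The environment/policy interface below is assumed for illustration; this shows only the evaluation identity, not the gradient-based RMC updates.

```python
# Minimal sketch of the renewal-theory evaluation behind RMC: average the
# discounted reward and the accumulated discount over regenerative cycles that
# return to the designated start state. The interface is an assumption; the
# gradient estimators and the RMC updates are not shown.
import numpy as np

def rmc_policy_evaluation(reset, step, policy, start_state, gamma, num_cycles):
    """Monte Carlo estimate of V(start_state) from regenerative cycles.

    reset():     puts the environment in start_state and returns it
    step(s, a):  returns (next_state, reward)
    policy(s):   returns an action
    """
    cycle_rewards, cycle_discounts = [], []
    for _ in range(num_cycles):
        s, disc, R = reset(), 1.0, 0.0
        while True:
            s, r = step(s, policy(s))
            R += disc * r
            disc *= gamma
            if s == start_state:       # regeneration: the cycle ends
                break
        cycle_rewards.append(R)        # discounted reward over the cycle
        cycle_discounts.append(disc)   # gamma**tau for this cycle
    R_bar, g_bar = np.mean(cycle_rewards), np.mean(cycle_discounts)
    # Renewal identity: V(s0) = E[R] / (1 - E[gamma^tau]); since the expected
    # discounted time over a cycle is (1 - E[gamma^tau]) / (1 - gamma), this is
    # the ratio of expected discounted reward to expected discounted time, up to
    # the factor (1 - gamma).
    return R_bar / (1.0 - g_bar)
```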