Ramaswamy, Arunselvan

In this paper, we present an asynchronous approximate gradient method that is easy to implement called DSPG (Decentralized Simultaneous Perturbation Stochastic Approximations, with Constant Sensitivity Parameters). It is obtained by modifying SPSA (Simultaneous Perturbation Stochastic Approximations) to allow for decentralized optimization in multi-agent learning and distributed control scenarios. SPSA is a popular approximate gradient method developed by Spall, that is used in Robotics and Learning. In the multi-agent learning setup considered herein, the agents are assumed to be asynchronous (agents abide by their local clocks) and communicate via a wireless medium, that is prone to losses and delays. We analyze the gradient estimation bias that arises from setting the sensitivity parameters to a single value, and the bias that arises from communication losses and delays. Specifically, we show that these biases can be countered through better and frequent communication and/or by choosing a small fixed value for the sensitivity parameters. We also discuss the variance of the gradient estimator and its effect on the rate of convergence. Finally, we present numerical results supporting DSPG and the aforementioned theories and discussions.

Koenig, Jonas, Malberg, Simon, Martens, Martin, Niehaus, Sebastian, Krohn-Grimberghe, Artus, Ramaswamy, Arunselvan

We present a reinforcement learning approach for detecting objects within an image. Our approach performs a step-wise deformation of a bounding box with the goal of tightly framing the object. It uses a hierarchical tree-like representation of predefined region candidates, which the agent can zoom in on. This reduces the number of region candidates that must be evaluated so that the agent can afford to compute new feature maps before each step to enhance detection quality. We compare an approach that is based purely on zoom actions with one that is extended by a second refinement stage to fine-tune the bounding box after each zoom step. We also improve the fitting ability by allowing for different aspect ratios of the bounding box. Finally, we propose different reward functions to lead to a better guidance of the agent while following its search trajectories. Experiments indicate that each of these extensions leads to more correct detections. The best performing approach comprises a zoom stage and a refinement stage, uses aspect-ratio modifying actions and is trained using a combination of three different reward metrics.

Demirel, Burak, Ramaswamy, Arunselvan, Quevedo, Daniel E., Karl, Holger

We consider networked control systems consisting of multiple independent controlled subsystems, operating over a shared communication network. Such systems are ubiquitous in cyber-physical systems, Internet of Things, and large-scale industrial systems. In many large-scale settings, the size of the communication network is smaller than the size of the system. In consequence, scheduling issues arise. The main contribution of this paper is to develop a deep reinforcement learning-based \emph{control-aware} scheduling (\textsc{DeepCAS}) algorithm to tackle these issues. We use the following (optimal) design strategy: First, we synthesize an optimal controller for each subsystem; next, we design a learning algorithm that adapts to the chosen subsystems (plants) and controllers. As a consequence of this adaptation, our algorithm finds a schedule that minimizes the \emph{control loss}. We present empirical results to show that \textsc{DeepCAS} finds schedules with better performance than periodic ones.

Ramaswamy, Arunselvan, Bhatnagar, Shalabh, Quevedo, Daniel E.

Asynchronous stochastic approximations are an important class of model-free algorithms that are readily applicable to multi-agent reinforcement learning (RL) and distributed control applications. When the system size is large, the aforementioned algorithms are used in conjunction with function approximations. In this paper, we present a complete analysis, including stability (almost sure boundedness) and convergence, of asynchronous stochastic approximations with asymptotically bounded biased errors, under easily verifiable sufficient conditions. As an application, we analyze the Policy Gradient algorithms and the more general Value Iteration based algorithms with noise. These are popular reinforcement learning algorithms due to their simplicity and effectiveness. Specifically, we analyze the asynchronous approximate counterpart of policy gradient (A2PG) and value iteration (A2VI) schemes. It is shown that the stability of these algorithms remains unaffected when the approximation errors are guaranteed to be asymptotically bounded, although possibly biased. Regarding convergence of A2VI, it is shown to converge to a fixed point of the perturbed Bellman operator when balanced step-sizes are used. Further, a relationship between these fixed points and the approximation errors is established. A similar analysis for A2PG is also presented.

Ramaswamy, Arunselvan, Bhatnagar, Shalabh

The main aim of this paper is the development of easily verifiable sufficient conditions for stability (almost sure boundedness) and convergence of stochastic approximation algorithms (SAAs) with set-valued mean-fields, a class of model-free algorithms that have become important in recent times. In this paper we provide a complete analysis of such algorithms under three different, yet related sets of sufficient conditions, based on the existence of an associated global/local Lyapunov function. Unlike previous Lyapunov function based approaches, we provide a simple recipe for explicitly constructing the Lyapunov function, needed for analysis. Our work builds on the works of Abounadi, Bertsekas and Borkar (2002), Munos (2005), and Ramaswamy and Bhatnagar (2016). An important motivation for the flavor of our assumptions comes from the need to understand dynamic programming and reinforcement learning algorithms, that use deep neural networks (DNNs) for function approximations and parameterizations. These algorithms are popularly known as deep learning algorithms. As an important application of our theory, we provide a complete analysis of the stochastic approximation counterpart of approximate value iteration (AVI), an important dynamic programming method designed to tackle Bellman's curse of dimensionality. Further, the assumptions involved are significantly weaker, easily verifiable and truly model-free. The theory presented in this paper is also used to develop and analyze the first SAA for finding fixed points of contractive set-valued maps.

Ramaswamy, Arunselvan, Bhatnagar, Shalabh

The main aim of this paper is to provide an analysis of gradient descent (GD) algorithms with gradient errors that do not necessarily vanish, asymptotically. In particular, sufficient conditions are presented for both stability (almost sure boundedness of the iterates) and convergence of GD with bounded, (possibly) non-diminishing gradient errors. In addition to ensuring stability, such an algorithm is shown to converge to a small neighborhood of the minimum set, which depends on the gradient errors. It is worth noting that the main result of this paper can be used to show that GD with asymptotically vanishing errors indeed converges to the minimum set. The results presented herein are not only more general when compared to previous results, but our analysis of GD with errors is new to the literature to the best of our knowledge. Our work extends the contributions of Mangasarian & Solodov, Bertsekas & Tsitsiklis and Tadic & Doucet. Using our framework, a simple yet effective implementation of GD using simultaneous perturbation stochastic approximations (SP SA), with constant sensitivity parameters, is presented. Another important improvement over many previous results is that there are no `additional' restrictions imposed on the step-sizes. In machine learning applications where step-sizes are related to learning rates, our assumptions, unlike those of other papers, do not affect these learning rates. Finally, we present experimental results to validate our theory.

Ramaswamy, Arunselvan, Bhatnagar, Shalabh

In this paper the stability theorem of Borkar and Meyn is extended to include the case when the mean field is a differential inclusion. Two different sets of sufficient conditions are presented that guarantee the stability and convergence of stochastic recursive inclusions. Our work builds on the works of Benaim, Hofbauer and Sorin as well as Borkar and Meyn. As a corollary to one of the main theorems, a natural generalization of the Borkar and Meyn Theorem follows. In addition, the original theorem of Borkar and Meyn is shown to hold under slightly relaxed assumptions. Finally, as an application to one of the main theorems we discuss a solution to the approximate drift problem.

Ramaswamy, Arunselvan, Bhatnagar, Shalabh

In this paper we present a framework to analyze the asymptotic behavior of two timescale stochastic approximation algorithms including those with set-valued mean fields. This paper builds on the works of Borkar and Perkins & Leslie. The framework presented herein is more general as compared to the synchronous two timescale framework of Perkins \& Leslie, however the assumptions involved are easily verifiable. As an application, we use this framework to analyze the two timescale stochastic approximation algorithm corresponding to the Lagrangian dual problem in optimization theory.

Ramaswamy, Arunselvan, Bhatnagar, Shalabh

In this paper we present a `stability theorem' for stochastic approximation (SA) algorithms with `controlled Markov' noise. Such algorithms were first studied by Borkar in 2006. Specifically, sufficient conditions are presented which guarantee the stability of the iterates. Further, under these conditions the iterates are shown to track a solution to the differential inclusion defined in terms of the ergodic occupation measures associated with the `controlled Markov' process. As an application to our main result we present an improvement to a general form of temporal difference learning algorithms. Specifically, we present sufficient conditions for their stability and convergence using our framework. This paper builds on the works of Borkar as well as Benveniste, Metivier and Priouret.