Stability properties of gradient flow dynamics for the symmetric low-rank matrix factorization problem
Mohammadi, Hesameddin, Tinati, Mohammad, Tu, Stephen, Soltanolkotabi, Mahdi, Jovanović, Mihailo R.
The symmetric low-rank matrix factorization serves as a building block in many learning tasks, including matrix recovery and training of neural networks. However, despite a flurry of recent research, the dynamics of its training via non-convex factorized gradient-descent-type methods is not fully understood, especially in the over-parameterized regime where the fitted rank is higher than the true rank of the target matrix. To overcome this challenge, we characterize equilibrium points of the gradient flow dynamics and examine their local and global stability properties. To facilitate a precise global analysis, we introduce a nonlinear change of variables that brings the dynamics into a cascade connection of three subsystems whose structure is simpler than that of the original system. We demonstrate that the Schur complement to a principal eigenspace of the target matrix is governed by an autonomous system that is decoupled from the rest of the dynamics. In the over-parameterized regime, we show that this Schur complement vanishes at an $O(1/t)$ rate, thereby capturing the slow dynamics that arises from excess parameters. We utilize a Lyapunov-based approach to establish exponential convergence of the other two subsystems. By decoupling the fast and slow parts of the dynamics, we offer new insight into the shape of the trajectories associated with local search algorithms and provide a complete characterization of the equilibrium points and their global stability properties. Such an analysis via nonlinear control techniques may prove useful in several related over-parameterized problems.
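As an illustrative sketch of the setup (not the paper's analysis), the gradient flow $\dot{X} = -(XX^T - M)X$ associated with the objective $\tfrac{1}{4}\|XX^T - M\|_F^2$ can be simulated with forward Euler steps; the target matrix, fitted rank, and step size below are invented, with the fitted rank chosen larger than the true rank to mimic over-parameterization.

```python
import numpy as np

# Forward-Euler simulation of the gradient flow dX/dt = -(X X^T - M) X for
# f(X) = (1/4) ||X X^T - M||_F^2. Sizes, target, and step are invented;
# the fitted rank r exceeds the true rank to mimic over-parameterization.
rng = np.random.default_rng(0)
n, true_rank, r = 20, 3, 6
Z = rng.standard_normal((n, true_rank))
M = Z @ Z.T                          # symmetric PSD target of rank 3
X = 0.1 * rng.standard_normal((n, r))

dt = 1e-3                            # Euler step for the ODE
for k in range(200001):
    if k % 50000 == 0:
        res = np.linalg.norm(X @ X.T - M)
        print(f"t = {k * dt:7.1f}   ||X X^T - M||_F = {res:.3e}")
    X -= dt * (X @ X.T - M) @ X
```

With $r$ larger than the true rank, the residual exhibits the slow sublinear tail that the abstract attributes to the excess parameters.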
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs
Ding, Dongsheng, Zhang, Kaiqing, Duan, Jiali, Başar, Tamer, Jovanović, Mihailo R.
We study sequential decision making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization we prove that our method achieves global convergence with sublinear rates for both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. Finally, we use computational experiments to showcase the merits and the effectiveness of our approach.
Keywords: Constrained Markov decision processes; Natural policy gradient; Constrained nonconvex optimization; Method of Lagrange multipliers; Primal-dual algorithms.
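As a minimal tabular sketch of the primal-dual structure (not the paper's algorithm or guarantees), the following assumes exact policy evaluation on an invented CMDP. Under the softmax parametrization, the natural policy gradient step amounts to an exponentiated (multiplicative-weights) update on the Lagrangian Q-values, with the discount scaling absorbed into the step size, and the dual variable takes a projected sub-gradient step. The transition kernel, rewards, utilities, threshold b, and step sizes are all arbitrary.

```python
import numpy as np

# Toy NPG primal-dual loop on a random tabular CMDP with exact evaluation.
rng = np.random.default_rng(1)
nS, nA, gamma, b = 4, 3, 0.9, 5.5
P = rng.dirichlet(np.ones(nS), size=(nS, nA))      # transition tensor P[s, a, s']
reward = rng.random((nS, nA))
utility = rng.random((nS, nA))
rho = np.full(nS, 1 / nS)                          # initial state distribution

def evaluate(pi, r):
    """Exact Q^pi and V^pi(rho) for reward table r via the Bellman linear system."""
    r_pi = (pi * r).sum(axis=1)
    P_pi = np.einsum('sa,sap->sp', pi, P)
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    return r + gamma * P @ V, rho @ V

pi = np.full((nS, nA), 1 / nA)                     # uniform softmax initialization
lam, eta, eta_dual = 0.0, 1.0, 0.1
for _ in range(500):
    Q_r, v_r = evaluate(pi, reward)
    Q_g, v_g = evaluate(pi, utility)
    Q_L = Q_r + lam * Q_g                          # Lagrangian Q-values
    pi = pi * np.exp(eta * (Q_L - Q_L.max(axis=1, keepdims=True)))
    pi /= pi.sum(axis=1, keepdims=True)            # NPG step under softmax
    lam = max(0.0, lam - eta_dual * (v_g - b))     # projected sub-gradient step
print(f"reward {v_r:.3f}, utility {v_g:.3f} (threshold {b}), lambda {lam:.3f}")
```

Subtracting the row-wise maximum before exponentiating changes nothing after normalization; it only guards against overflow.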
Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning
Ding, Dongsheng, Wei, Xiaohan, Yang, Zhuoran, Wang, Zhaoran, Jovanović, Mihailo R.
We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities. Our focus is confined to an episodic two-player zero-sum constrained Markov game with independent transition functions that are unknown to agents, adversarial reward functions, and stochastic utility functions. For such a Markov game, we employ an approach based on the occupancy measure to formulate it as an online constrained saddle-point problem with an explicit constraint. We extend the Lagrange multiplier method in constrained optimization to handle the constraint by creating a generalized Lagrangian with minimax decision primal variables and a dual variable. Next, we develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem while balancing exploration and exploitation. Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via a projected gradient step, and we prove that it enjoys a sublinear rate $O((|X|+|Y|) L \sqrt{T(|A|+|B|)})$ for both regret and constraint violation after playing $T$ episodes of the game. Here, $L$ is the horizon of each episode, and $(|X|,|A|)$ and $(|Y|,|B|)$ are the state/action space sizes of the min-player and the max-player, respectively. To the best of our knowledge, we provide the first provably efficient online safe reinforcement learning algorithm in constrained Markov games.
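To illustrate the generalized-Lagrangian saddle-point idea at toy scale (stripping away the episodic Markov structure, unknown transitions, and the exploration bonus the paper actually handles), one can run mirror descent-ascent with a projected dual step on a one-shot constrained matrix game; the payoff matrix, utility matrix, threshold, and step size below are invented.

```python
import numpy as np

# One-shot constrained matrix-game analogue: min-max over the simplices with
# the utility constraint x^T G y >= c handled via a generalized Lagrangian
#   L(x, y, lam) = x^T R y - lam * (x^T G y - c),  lam >= 0.
rng = np.random.default_rng(2)
nA, nB, c, eta = 3, 4, 0.5, 0.2
R = rng.random((nA, nB))             # zero-sum payoff: x-player pays R[a, b]
G = rng.random((nA, nB))             # utility earned under joint play
x = np.full(nA, 1 / nA)              # min-player mixed strategy
y = np.full(nB, 1 / nB)              # max-player mixed strategy
lam = 0.0
for _ in range(2000):
    x = x * np.exp(-eta * (R - lam * G) @ y); x /= x.sum()    # mirror descent
    y = y * np.exp(+eta * (R - lam * G).T @ x); y /= y.sum()  # mirror ascent
    lam = max(0.0, lam + eta * (c - x @ G @ y))               # projected dual step
print(f"value {x @ R @ y:.3f}, utility {x @ G @ y:.3f} (threshold {c}), lam {lam:.3f}")
```

The dual variable grows only while the utility constraint is violated, which is the mechanism the generalized Lagrangian uses to trade payoff against safety.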
Can Decentralized Control Outperform Centralized? The Role of Communication Latency
Ballotta, Luca, Jovanović, Mihailo R., Schenato, Luca
In this paper, we examine the influence of communication latency on the performance of networked control systems. Even though distributed control architectures offer advantages in terms of communication, maintenance costs, and scalability, it is an open question how communication latency that varies with network topology influences closed-loop performance. For networks in which delays increase with the number of links, we establish the existence of a fundamental performance trade-off that arises from the control architecture. In particular, we utilize consensus dynamics with single- and double-integrator agents to show that, if delays increase fast enough, a sparse controller with nearest-neighbor interactions can outperform a centralized one with an all-to-all communication topology.
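A toy discrete-time sketch of this trade-off (not the paper's model, which is formulated in continuous time with a quadratic performance measure): run consensus with a feedback delay $d$ that grows with the number of links. The delay model, step size, and graph sizes below are ad hoc; with these numbers the all-to-all topology is destabilized by its larger delay, while the sparse ring keeps contracting.

```python
import numpy as np

def residual(Lap, d, eps=0.02, steps=500, seed=3):
    """Disagreement after running x(k+1) = x(k) - eps * Lap @ x(k - d)."""
    n = Lap.shape[0]
    hist = [np.random.default_rng(seed).standard_normal(n)] * (d + 1)
    for _ in range(steps):
        hist.append(hist[-1] - eps * Lap @ hist[-1 - d])
    return float(np.std(hist[-1]))

n = 20
ring = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=0) - np.roll(np.eye(n), -1, axis=0)
complete = n * np.eye(n) - np.ones((n, n))
print("ring,     delay  2:", residual(ring, d=2))        # keeps contracting
print("complete, delay 19:", residual(complete, d=n - 1))  # delay destabilizes it
```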
Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence
Ding, Dongsheng, Wei, Chen-Yu, Zhang, Kaiqing, Jovanović, Mihailo R.
We examine global non-asymptotic convergence properties of policy gradient methods for multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To learn a Nash equilibrium of an MPG in which the size of the state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $\epsilon$-Nash equilibrium with $O(1/\epsilon^2)$ iteration complexity that does not explicitly depend on the state space size. When the exact gradient is not available, we establish an $O(1/\epsilon^5)$ sample complexity bound in a potentially infinitely large state space for a sample-based algorithm that utilizes function approximation. Moreover, we identify a class of independent policy gradient algorithms that enjoys convergence for both zero-sum Markov games and Markov cooperative games, with players that are oblivious to the type of game being played. Finally, we provide computational experiments to corroborate the merits and the effectiveness of our theoretical developments.
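As a degenerate one-state example of independent learning in a potential game (the payoff table and step size are invented; the paper treats full Markov potential games with sampling and function approximation), each player can run softmax policy gradient using only its own policy and its own expected payoffs:

```python
import numpy as np

# One-state, identical-interest game: both players receive U[a1, a2]. Each runs
# exact softmax policy gradient independently of the other's parameters.
rng = np.random.default_rng(4)
U = rng.random((3, 3))                   # shared payoff table
th1, th2 = np.zeros(3), np.zeros(3)      # softmax logits of the two players
eta = 1.0

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

for _ in range(300):
    p1, p2 = softmax(th1), softmax(th2)
    q1, q2 = U @ p2, U.T @ p1            # per-action expected payoffs
    # exact policy gradient of E[U] w.r.t. each player's own logits
    th1 += eta * p1 * (q1 - p1 @ q1)
    th2 += eta * p2 * (q2 - p2 @ q2)
p1, p2 = softmax(th1), softmax(th2)
print(p1.round(3), p2.round(3), f"joint payoff: {p1 @ U @ p2:.3f}")
```

Because the game is identical-interest, the shared payoff serves as the potential function, and the independent updates jointly ascend it toward a Nash equilibrium.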
A second order primal-dual method for nonsmooth convex composite optimization
Dhingra, Neil K., Khong, Sei Zhen, Jovanović, Mihailo R.
We develop a second order primal-dual method for optimization problems in which the objective function is given by the sum of a strongly convex twice differentiable term and a possibly nondifferentiable convex regularizer. After introducing an auxiliary variable, we utilize the proximal operator of the nonsmooth regularizer to transform the associated augmented Lagrangian into a function that is once, but not twice, continuously differentiable. The saddle point of this function corresponds to the solution of the original optimization problem. We employ a generalization of the Hessian to define second order updates on this function and prove global exponential stability of the corresponding differential inclusion. Furthermore, we develop a globally convergent customized algorithm that utilizes the primal-dual augmented Lagrangian as a merit function. We show that the search direction can be computed efficiently and prove quadratic/superlinear asymptotic convergence. We use the $\ell_1$-regularized model predictive control problem and the problem of designing a distributed controller for a spatially-invariant system to demonstrate the merits and the effectiveness of our method.
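A first-order sketch of the object this method is built on, under invented problem data: for $f(x) = \tfrac{1}{2}x^T Q x - q^T x$ and $g = \gamma \|\cdot\|_1$, the proximal operator of $g$ yields the once-differentiable proximal augmented Lagrangian, and primal-descent/dual-ascent gradient iterations approach its saddle point. The paper's contribution is the second-order, generalized-Hessian update, which this sketch omits; $\mu$, $\alpha$, $\gamma$, and the data are arbitrary.

```python
import numpy as np

# Primal-descent / dual-ascent on the proximal augmented Lagrangian for
# min f(x) + g(x), f(x) = 0.5 x^T Q x - q^T x, g = gamma * ||x||_1.
rng = np.random.default_rng(5)
n, gamma, mu, alpha = 10, 0.5, 1.0, 0.02
A = rng.standard_normal((n, n))
Q = A.T @ A + np.eye(n)                       # strongly convex quadratic
q = rng.standard_normal(n)

def prox_l1(v, t):
    """Soft-thresholding: proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x, y = np.zeros(n), np.zeros(n)
for _ in range(20000):
    v = x + mu * y
    x -= alpha * (Q @ x - q + (v - prox_l1(v, mu * gamma)) / mu)   # primal step
    y += alpha * (x - prox_l1(x + mu * y, mu * gamma))             # dual step
print("nonzeros in x:", int(np.count_nonzero(np.abs(x) > 1e-6)), "of", n)
```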
Robustness of accelerated first-order algorithms for strongly convex optimization problems
Mohammadi, Hesameddin, Razaviyayn, Meisam, Jovanović, Mihailo R.
We study the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation. Specifically, for unconstrained, smooth, strongly convex optimization problems, we examine the mean-square error in the optimization variable when the iterates are perturbed by additive white noise. This type of uncertainty may arise in situations where an approximation of the gradient is sought through measurements of a real system or in a distributed computation over a network. Even though the underlying dynamics of first-order algorithms for this class of problems are nonlinear, we establish upper bounds on the mean-square deviation from the optimal value that are tight up to constant factors. Our analysis quantifies fundamental trade-offs between noise amplification and convergence rates obtained via any acceleration scheme similar to Nesterov's or heavy-ball methods. To gain additional analytical insight, for strongly convex quadratic problems we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that the entire spectrum of the Hessian, rather than just the extreme eigenvalues, influences the robustness of noisy algorithms. We specialize this result to the problem of distributed averaging over undirected networks and examine the role of network size and topology in the robustness of noisy accelerated algorithms.
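A Monte-Carlo sketch of the noise-amplification phenomenon on a strongly convex quadratic (the eigenvalues, noise level, and horizon are invented; both methods use their standard textbook tunings): perturb the gradient with additive white noise and compare the steady-state variance of gradient descent against the heavy-ball method.

```python
import numpy as np

# Steady-state variance under additive white gradient noise on the quadratic
# f(x) = 0.5 x^T diag(lam) x, estimated by time-averaging after a burn-in.
rng = np.random.default_rng(6)
lam = np.array([1.0, 10.0, 100.0])            # Hessian spectrum: m = 1, L = 100
m, Lmax, sigma, T = lam.min(), lam.max(), 0.1, 100000

def steady_state_variance(alpha, beta):
    x = x_prev = np.zeros(3)
    acc, count = 0.0, 0
    for t in range(T):
        grad = lam * x + sigma * rng.standard_normal(3)
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
        if t > T // 2:                        # discard the transient
            acc += (x ** 2).sum(); count += 1
    return acc / count

kappa = Lmax / m
beta_hb = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2
print("gradient descent:", steady_state_variance(1 / Lmax, 0.0))
print("heavy-ball:      ", steady_state_variance(4 / (np.sqrt(Lmax) + np.sqrt(m)) ** 2, beta_hb))
```

With these tunings, the accelerated method converges faster but accumulates noticeably more noise, which is the trade-off the abstract quantifies.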
Proximal algorithms for large-scale statistical modeling and optimal sensor/actuator selection
Zare, Armin, Mohammadi, Hesameddin, Dhingra, Neil K., Jovanović, Mihailo R., Georgiou, Tryphon T.
Several problems in modeling and control of stochastically-driven dynamical systems can be cast as regularized semi-definite programs. We examine two such representative problems and show that they can be formulated in a similar manner. The first, in statistical modeling, seeks to reconcile observed statistics by suitably and minimally perturbing prior dynamics. The second seeks to optimally select a subset of available sensors and actuators for control purposes. To address modeling and control of large-scale systems, we develop a unified algorithmic framework using proximal methods. Our customized algorithms exploit problem structure and allow handling statistical modeling, as well as sensor and actuator selection, at substantially larger scales than what is amenable to current general-purpose solvers. We establish linear convergence of the proximal gradient algorithm, draw contrast between the proposed proximal algorithms and the alternating direction method of multipliers, and provide examples that illustrate the merits and effectiveness of our framework.
Index Terms: Actuator selection, sensor selection, sparsity-promoting estimation and control, method of multipliers, nonsmooth convex optimization, proximal algorithms, regularization for design, semi-definite programming, structured covariances.
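As a simplified sketch of the proximal-gradient mechanism behind sensor/actuator selection (not the paper's semidefinite formulation; the quadratic data-fit term and all data below are invented), a row-sparsity penalty with its block soft-thresholding proximal operator drives entire rows of the unknown to zero, deselecting the corresponding actuators:

```python
import numpy as np

# Proximal gradient for min 0.5 ||A X - B||_F^2 + gamma * sum_i ||row_i(X)||_2.
# The group penalty zeroes whole rows, "deselecting" actuators.
rng = np.random.default_rng(7)
m, n, p, gamma = 30, 12, 5, 10.0
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L for the smooth part

def prox_rows(X, t):
    """Block soft-thresholding: shrink the Euclidean norm of each row by t."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return np.where(norms > t, (1.0 - t / np.maximum(norms, 1e-12)) * X, 0.0)

X = np.zeros((n, p))
for _ in range(2000):
    X = prox_rows(X - step * A.T @ (A @ X - B), step * gamma)
print("selected rows:", np.flatnonzero(np.linalg.norm(X, axis=1) > 1e-8))
```

Raising the regularization weight gamma shrinks the selected set, which is how the sparsity-promoting framework trades performance against the number of active sensors or actuators.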