 Gradient Descent


Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principle Component Analysis

arXiv.org Artificial Intelligence

A wireless federated learning system is investigated in which a server and workers exchange uncoded information via orthogonal wireless channels. Since the workers frequently upload local gradients to the server via bandwidth-limited channels, the uplink transmission from the workers to the server becomes a communication bottleneck. Therefore, a one-shot distributed principal component analysis (PCA) is leveraged to reduce the dimension of the uploaded gradients and relieve this bottleneck. A PCA-based wireless federated learning (PCA-WFL) algorithm and its accelerated version (i.e., PCA-AWFL) are proposed based on the low-dimensional gradients and Nesterov's momentum. For non-convex loss functions, a finite-time analysis is performed to quantify the impact of system hyper-parameters on the convergence of the PCA-WFL and PCA-AWFL algorithms. The PCA-AWFL algorithm is theoretically certified to converge faster than the PCA-WFL algorithm. In addition, the convergence rates of the PCA-WFL and PCA-AWFL algorithms quantitatively reveal a linear speedup with respect to the number of workers over the vanilla gradient descent algorithm. Numerical results demonstrate the improved convergence rates of the proposed PCA-WFL and PCA-AWFL algorithms over the benchmarks.
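The idea of uploading low-dimensional gradients and applying Nesterov's momentum at the server can be sketched as follows. This is a toy illustration, not the paper's PCA-WFL or PCA-AWFL algorithm: the quadratic local losses, the single principal direction found by power iteration, and all function names are assumptions made for the example. The targets are chosen so that every local gradient lies along one direction, which makes the one-dimensional compression lossless here; in general it would be lossy.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_principal_direction(samples, iters=50):
    """Leading eigenvector of the uncentered second-moment matrix (power iteration)."""
    dim = len(samples[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # w = (sum_s s s^T) v, computed without forming the matrix
        w = [0.0] * dim
        for s in samples:
            c = dot(s, v)
            w = [wi + c * si for wi, si in zip(w, s)]
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    return v

def local_gradient(theta, target):
    """Gradient of the hypothetical local loss 0.5 * ||theta - target||^2."""
    return [p - t for p, t in zip(theta, target)]

def train(targets, lr=0.1, beta=0.9, steps=300):
    dim = len(targets[0])
    theta = [0.0] * dim
    velocity = [0.0] * dim
    # One-shot PCA: the basis is computed once, from the initial local gradients.
    basis = top_principal_direction([local_gradient(theta, t) for t in targets])
    for _ in range(steps):
        # Nesterov's momentum evaluates gradients at a look-ahead point.
        lookahead = [p + beta * v for p, v in zip(theta, velocity)]
        # Each worker uploads one scalar instead of a dim-dimensional gradient.
        coeffs = [dot(local_gradient(lookahead, t), basis) for t in targets]
        mean_coeff = sum(coeffs) / len(coeffs)
        # The server reconstructs the averaged gradient from the coefficients.
        grad = [mean_coeff * b for b in basis]
        velocity = [beta * v - lr * g for v, g in zip(velocity, grad)]
        theta = [p + v for p, v in zip(theta, velocity)]
    return theta

# Three workers whose targets all lie along the direction (0.6, 0.8, 0.0).
targets = [[0.6, 0.8, 0.0], [1.2, 1.6, 0.0], [1.8, 2.4, 0.0]]
theta = train(targets)
```

In this setup `theta` approaches the mean of the workers' targets, the minimizer of the averaged loss, while each worker sends a single number per round.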


Surrogate Neural Networks for Efficient Simulation-based Trajectory Planning Optimization

arXiv.org Artificial Intelligence

This paper presents a novel methodology that uses surrogate models in the form of neural networks to reduce the computation time of simulation-based optimization of a reference trajectory. Simulation-based optimization is necessary when there is no analytical form of the system accessible, only input-output data that can be used to create a surrogate model of the simulation. Like many high-fidelity simulations, this trajectory planning simulation is very nonlinear and computationally expensive, making it challenging to optimize iteratively. Through gradient descent optimization, our approach finds the optimal reference trajectory for landing a hypersonic vehicle. In contrast to the large datasets used to create the surrogate models in prior literature, our methodology is specifically designed to minimize the number of simulation executions required by the gradient descent optimizer. We demonstrated this methodology to be more efficient than the standard practice of hand-tuning the inputs through trial-and-error or randomly sampling the input parameter space. Due to the intelligently selected input values to the simulation, our approach yields better simulation outcomes that are achieved more rapidly and to a higher degree of accuracy. Optimizing the hypersonic vehicle's reference trajectory is very challenging due to the simulation's extreme nonlinearity, but even so, this novel approach found a 74% better-performing reference trajectory compared to nominal, and the numerical results clearly show a substantial reduction in computation time for designing future trajectories.


Sublinear Convergence Rates of Extragradient-Type Methods: A Survey on Classical and Recent Developments

arXiv.org Machine Learning

The generalized equation (also called the [non]linear inclusion) provides a unified template to model various problems in computational mathematics and related fields such as the optimality condition of optimization problems (in both unconstrained and constrained settings), minimax optimization, variational inequality, complementarity, two-person game, and fixed-point problems, see, e.g., [11, 24, 50, 112, 116, 118, 120]. Theory and numerical methods for this equation and its special cases have been extensively studied for many decades, see, e.g., the following monographs and the references quoted therein [11, 50, 94, 119]. At the same time, several applications of this mathematical tool in operations research, economics, uncertainty quantification, and transportations have been investigated [14, 52, 61, 50, 72]. In the last few years, there has been a surge of research in minimax problems due to new applications in machine learning and robust optimization, especially in generative adversarial networks (GANs), adversarial training, and distributionally robust optimization, see, e.g., [4, 14, 55, 76, 84, 114] as a few examples. Minimax problems have also found new applications in online learning and reinforcement learning, among many others, see, e.g., [4, 9, 15, 55, 67, 76, 78, 84, 114, 139]. Such prominent applications have motivated the research in minimax optimization and variational inequality problems (VIPs). On the one hand, classical algorithms such as gradient descent-ascent, extragradient, and primal-dual methods have been revisited, improved, and extended. On the other hand, new variants such as accelerated extragradient and accelerated operator splitting schemes have also been developed and equipped with rigorous convergence guarantees and practical performance evaluation. This new development motivates us to write this survey paper, with the focus on sublinear convergence rate analysis.
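To make the contrast between two of the classical methods named above concrete, here is a minimal sketch of gradient descent-ascent and the extragradient method. The bilinear test problem min_x max_y x*y, the step size, and the function names are assumptions chosen for the example and are not taken from the survey.

```python
import math

def gda_step(x, y, eta):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    return x - eta * y, y + eta * x

def extragradient_step(x, y, eta):
    """Extragradient: take a trial step, then update using gradients at the trial point."""
    x_half, y_half = x - eta * y, y + eta * x
    return x - eta * y_half, y + eta * x_half

def run(step, x=1.0, y=1.0, eta=0.1, iters=2000):
    for _ in range(iters):
        x, y = step(x, y, eta)
    return math.hypot(x, y)  # distance to the saddle point (0, 0)

initial_distance = math.hypot(1.0, 1.0)
gda_distance = run(gda_step)
eg_distance = run(extragradient_step)
```

On this problem each descent-ascent step multiplies the squared distance to the saddle point by 1 + eta^2, so the iterates spiral outward, while each extragradient step multiplies it by 1 - eta^2 + eta^4, which is below 1 for 0 < eta < 1.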


Meta-Learning Parameterized First-Order Optimizers using Differentiable Convex Optimization

arXiv.org Artificial Intelligence

Conventional optimization methods in machine learning and controls rely heavily on first-order update rules. Selecting the right method and hyperparameters for a particular task often involves trial-and-error or practitioner intuition, motivating the field of meta-learning. We generalize a broad family of preexisting update rules by proposing a meta-learning framework in which the inner loop optimization step involves solving a differentiable convex optimization (DCO). We illustrate the theoretical appeal of this approach by showing that it enables one-step optimization of a family of linear least squares problems, given that the meta-learner has sufficient exposure to similar tasks. Various instantiations of the DCO update rule are compared to conventional optimizers on a range of illustrative experimental settings.


Importance Sampling for Stochastic Gradient Descent in Deep Neural Networks

arXiv.org Artificial Intelligence

Stochastic gradient descent samples the training set uniformly to build an unbiased gradient estimate from a limited number of samples. However, at a given step of the training process, some data are more helpful than others for continued learning. Importance sampling for training deep neural networks has been widely studied, with the aim of proposing sampling schemes that yield better performance than uniform sampling. After recalling the theory of importance sampling for deep learning, this paper reviews the challenges inherent to this research area. In particular, we propose a metric for assessing the quality of a given sampling scheme, and we study the interplay between the sampling scheme and the optimizer used.
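The basic mechanism can be sketched as follows: draw example i with probability p_i and reweight its gradient by 1 / (N * p_i), which keeps the estimate unbiased for any positive p. The scalar per-example gradients, the choice of sampling proportionally to gradient magnitude, and the function names are assumptions for illustration; this is not the metric or the scheme studied in the paper.

```python
import random

def full_gradient(grads):
    """Average of the per-example gradients (scalars, for simplicity)."""
    return sum(grads) / len(grads)

def importance_sampled_gradient(grads, probs, rng):
    """Draw one example with probability probs[i] and reweight its gradient."""
    i = rng.choices(range(len(grads)), weights=probs, k=1)[0]
    return grads[i] / (len(grads) * probs[i])

def expected_estimate(grads, probs):
    """Exact expectation of the reweighted single-sample estimate."""
    n = len(grads)
    return sum(p * g / (n * p) for g, p in zip(grads, probs))

def estimator_variance(grads, probs):
    """Exact variance of the reweighted single-sample estimate."""
    n = len(grads)
    second_moment = sum(p * (g / (n * p)) ** 2 for g, p in zip(grads, probs))
    return second_moment - full_gradient(grads) ** 2

grads = [0.1, 0.2, 0.3, 4.0]
uniform = [0.25] * 4
proportional = [abs(g) / sum(abs(g) for g in grads) for g in grads]

var_uniform = estimator_variance(grads, uniform)
var_proportional = estimator_variance(grads, proportional)
one_draw = importance_sampled_gradient(grads, proportional, random.Random(0))
```

Both sampling schemes have the same expectation, the full gradient, but on this toy data the magnitude-proportional scheme has far lower variance than uniform sampling.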


Have it your way: Individualized Privacy Assignment for DP-SGD

arXiv.org Artificial Intelligence

This budget represents a maximal privacy violation that any user is willing to face by contributing their data to the training set. We argue that this approach is limited because different users may have different privacy expectations. Thus, setting a uniform privacy budget across all points may be overly conservative for some users or, conversely, not sufficiently protective for others. In this paper, we capture these preferences through individualized privacy budgets. To demonstrate their practicality, we introduce a variant of Differentially Private Stochastic Gradient Descent (DP-SGD) which supports such individualized budgets. DP-SGD is the canonical approach to training models with differential privacy. We modify its data sampling and gradient noising mechanisms to arrive at our approach, which we call Individualized DP-SGD (IDP-SGD). Because IDP-SGD provides privacy guarantees tailored to the preferences of individual users and their data points, we find it empirically improves privacy-utility trade-offs.
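For orientation, a single DP-SGD step clips each per-example gradient, sums the clipped gradients, adds Gaussian noise, and then takes a gradient step. The sketch below follows that standard recipe and adds per-example sampling probabilities as one hypothetical way to treat data points differently. It is not the authors' IDP-SGD mechanism, it performs no privacy accounting, and all names and parameters are assumptions.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_example_grads, sample_probs, lr,
                max_norm, noise_multiplier, rng):
    """One noisy step; sample_probs[i] is a hypothetical per-example sampling rate."""
    total = [0.0] * len(params)
    for grad, q in zip(per_example_grads, sample_probs):
        if rng.random() < q:  # Poisson sampling: include example with probability q
            clipped = clip(grad, max_norm)
            total = [t + c for t, c in zip(total, clipped)]
    # Gaussian noise with standard deviation noise_multiplier * max_norm per coordinate
    noisy = [t + rng.gauss(0.0, noise_multiplier * max_norm) for t in total]
    expected_batch = sum(sample_probs)  # normalize by the expected batch size
    return [p - lr * n / expected_batch for p, n in zip(params, noisy)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
# With every example sampled and no noise, the step reduces to clipped-gradient SGD.
new_params = dp_sgd_step([0.0, 0.0], grads, [1.0, 1.0], lr=1.0,
                         max_norm=1.0, noise_multiplier=0.0, rng=rng)
```

Lowering `sample_probs[i]` makes example i participate in fewer steps, and raising `noise_multiplier` adds more noise; how such knobs map to individual privacy budgets is what the paper itself addresses.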


Back To Basics, Part Uno: Linear Regression and Cost Function

#artificialintelligence

These concepts form the foundation of many machine learning algorithms. Initially, I decided against writing an article on these topics because they are so widely covered. However, I have changed my mind because understanding these concepts is essential for understanding more advanced topics like Neural Networks (that I plan on tackling in the near future). In addition, this series will be divided into two parts to make it more manageable and organized for better understanding. So make yourself comfortable, grab a cup of coffee, and get ready to embark on a magical journey of machine learning. As with any machine learning problem, we begin with a specific question we want to answer.


Deep Learning: Artificial Neural Network

#artificialintelligence

The error is calculated on a single data point only. The division by N is dropped because there is only one data point. This gradient descent will be more accurate than stochastic gradient descent, because stochastic gradient descent uses only one data point for the error calculation.
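A minimal sketch of the distinction, assuming a one-parameter linear model y = w * x with squared error (the model, the data, and the function names are illustrative and not from the article): the batch gradient averages over all N points, while the single-point gradient omits the division by N.

```python
import random

def batch_gradient(w, xs, ys):
    """Gradient of the mean squared error (1/N) * sum((w*x - y)^2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def single_point_gradient(w, x, y):
    """Gradient of the squared error on one data point; no division by N."""
    return 2.0 * (w * x - y) * x

def train(xs, ys, lr=0.01, steps=500, stochastic=False, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        if stochastic:
            i = rng.randrange(len(xs))  # pick one data point at random
            g = single_point_gradient(w, xs[i], ys[i])
        else:
            g = batch_gradient(w, xs, ys)  # use every data point
        w -= lr * g
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # noiseless data with true slope 2
w_batch = train(xs, ys)
w_sgd = train(xs, ys, stochastic=True)
```

On this noiseless data both variants recover the slope. With noisy data the single-point updates would fluctuate around the solution, whereas the batch gradient gives the same direction at every evaluation.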


Global Convergence of Over-parameterized Deep Equilibrium Models

arXiv.org Artificial Intelligence

A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection. Instead of infinite computations, it solves an equilibrium point directly with root-finding and computes gradients with implicit differentiation. The training dynamics of over-parameterized DEQs are investigated in this study. By supposing a condition on the initial equilibrium point, we show that the unique equilibrium point always exists during the training process, and the gradient descent is proved to converge to a globally optimal solution at a linear convergence rate for the quadratic loss function. In order to show that the required initial condition is satisfied via mild over-parameterization, we perform a fine-grained analysis on random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.


DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets

arXiv.org Artificial Intelligence

Robotic grasping of 3D deformable objects is critical for real-world applications such as food handling and robotic surgery. Unlike rigid and articulated objects, 3D deformable objects have infinite degrees of freedom. Fully defining their state requires 3D deformation and stress fields, which are exceptionally difficult to analytically compute or experimentally measure. Thus, evaluating grasp candidates for grasp planning typically requires accurate, but slow 3D finite element method (FEM) simulation. Sampling-based grasp planning is often impractical, as it requires evaluation of a large number of grasp candidates. Gradient-based grasp planning can be more efficient, but requires a differentiable model to synthesize optimal grasps from initial candidates. Differentiable FEM simulators may fill this role, but are typically no faster than standard FEM. In this work, we propose learning a predictive graph neural network (GNN), DefGraspNets, to act as our differentiable model. We train DefGraspNets to predict 3D stress and deformation fields based on FEM-based grasp simulations. DefGraspNets not only runs up to 1500 times faster than the FEM simulator, but also enables fast gradient-based grasp optimization over 3D stress and deformation metrics. We design DefGraspNets to align with real-world grasp planning practices and demonstrate generalization across multiple test sets, including real-world experiments.