 total error


Rethinking gradient sparsification as total error minimization

Neural Information Processing Systems

Gradient compression is a widely established remedy for the communication bottleneck in distributed training of large deep neural networks (DNNs). Under the error-feedback framework, Top-$k$ sparsification, sometimes with $k$ as little as 0.1% of the gradient size, enables training to the same model quality as the uncompressed case for a similar iteration count. From the optimization perspective, we find that Top-$k$ is the communication-optimal sparsifier given a per-iteration $k$ element budget. We argue that to further the benefits of gradient sparsification, especially for DNNs, a different perspective is necessary -- one that moves from per-iteration optimality to consider optimality for the entire training. We identify that the total error -- the sum of the compression errors for all iterations -- encapsulates sparsification throughout training. Then, we propose a communication complexity model that minimizes the total error under a communication budget for the entire training. We find that the hard-threshold sparsifier, a variant of the Top-$k$ sparsifier with $k$ determined by a constant hard-threshold, is the optimal sparsifier for this model. Motivated by this, we provide convex and non-convex convergence analyses for the hard-threshold sparsifier with error-feedback. We show that hard-threshold has the same asymptotic convergence and linear speedup property as SGD in both cases and, unlike the Top-$k$ sparsifier, is not affected by data heterogeneity. Our diverse experiments on various DNNs and a logistic regression model demonstrate that the hard-threshold sparsifier is more communication-efficient than Top-$k$.
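The two sparsifiers contrasted in this abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the `>=` comparison against the threshold, and the choice of squared Euclidean norm for the per-step compression error are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def top_k(vec: Vector, k: int) -> Vector:
    """Keep the k largest-magnitude entries and zero the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def hard_threshold(vec: Vector, lam: float) -> Vector:
    """Keep entries whose magnitude is at least lam (assumed convention)."""
    return [v if abs(v) >= lam else 0.0 for v in vec]


def error_feedback_step(
    grad: Vector, memory: Vector, sparsify: Callable[[Vector], Vector]
) -> Tuple[Vector, Vector]:
    """One error-feedback step: add back what was dropped earlier,
    sparsify, and store the newly dropped part as the next memory."""
    corrected = [g + m for g, m in zip(grad, memory)]
    sent = sparsify(corrected)
    new_memory = [c - s for c, s in zip(corrected, sent)]
    return sent, new_memory


def squared_norm(vec: Vector) -> float:
    """Squared Euclidean norm, used here as the per-step compression error."""
    return sum(v * v for v in vec)


if __name__ == "__main__":
    grads = [[0.5, -0.125, 0.0625, 2.0], [0.25, 0.125, -0.5, 0.0625]]
    memory = [0.0] * 4
    total_error = 0.0
    for g in grads:
        sent, memory = error_feedback_step(
            g, memory, lambda v: hard_threshold(v, 0.25)
        )
        total_error += squared_norm(memory)
        print(sent, memory)
    print("accumulated compression error:", total_error)
```

With Top-$k$ the number of transmitted entries is fixed per iteration, whereas with a constant threshold it varies with the gradient magnitudes; swapping `lambda v: top_k(v, 2)` into the loop above shows the difference on the same inputs.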


Optimality and Stability in Federated Learning: A Game-theoretic Approach

Neural Information Processing Systems

Federated learning is a distributed learning paradigm where multiple agents, each with access only to local data, jointly learn a global model. There has recently been an explosion of research aiming not only to improve the accuracy rates of federated learning, but also to provide certain guarantees around social good properties such as total error. One branch of this research has taken a game-theoretic approach, and in particular, prior work has viewed federated learning as a hedonic game, where error-minimizing players arrange themselves into federating coalitions. This past work proves the existence of stable coalition partitions, but leaves open a wide range of questions, including how far from optimal these stable solutions are. In this work, we motivate and define a notion of optimality given by the average error rates among federating agents (players).



Exact and approximate error bounds for physics-informed neural networks

arXiv.org Artificial Intelligence

The use of neural networks to solve differential equations, as an alternative to traditional numerical solvers, has increased recently. However, error bounds for the obtained solutions have only been developed for certain equations. In this work, we report important progress in calculating error bounds for physics-informed neural network (PINN) solutions of nonlinear first-order ODEs. We give a general expression that describes the error of the solution that the PINN-based method provides for a nonlinear first-order ODE. In addition, we propose a technique to calculate an approximate bound for the general case and an exact bound for a particular case. The error bounds are computed using only the residual information and the equation structure. We apply the proposed methods to particular cases and show that they can successfully provide error bounds without relying on the numerical solution.


Error Bounds for Deep Learning-based Uncertainty Propagation in SDEs

arXiv.org Artificial Intelligence

Stochastic differential equations are commonly used to describe the evolution of stochastic processes. The uncertainty of such processes is best represented by the probability density function (PDF), whose evolution is governed by the Fokker-Planck partial differential equation (FP-PDE). However, it is generally infeasible to solve the FP-PDE in closed form. In this work, we show that physics-informed neural networks (PINNs) can be trained to approximate the solution PDF using existing methods. The main contribution is the analysis of the approximation error: we develop a theory to construct an arbitrarily tight error bound with PINNs. In addition, we derive a practical error bound that can be efficiently constructed with existing training methods. Finally, we explain that this error-bound theory generalizes to approximate solutions of other linear PDEs. Several numerical experiments are conducted to demonstrate and validate the proposed methods.


Error Estimation for Physics-informed Neural Networks Approximating Semilinear Wave Equations

arXiv.org Artificial Intelligence

Solving these equations analytically is often challenging or even impossible, necessitating the utilization of other methods to obtain approximate solutions. One way to find approximate solutions to partial differential equations is through classical numerical methods. These methods have been studied for years and already have strong theoretical foundations when it comes to error estimation [1]. However, in recent years, with the rise of machine learning as a whole, there has also been an increased interest in applying machine learning methods to the problem of finding approximate solutions to PDEs. As universal function approximators [2], deep neural networks provide a promising avenue for a multitude of approaches to the approximation of solutions to partial differential equations. Among these methods are neural operators, methods based on the Feynman-Kac formula, and methods for parametric PDEs [3] [4] [5]. This paper focuses on physics-informed neural networks (PINNs), which were conceived as feed-forward neural networks that incorporate the dynamics of the PDE into their loss function [6].


Meta-learning to Calibrate Gaussian Processes with Deep Kernels for Regression Uncertainty Estimation

arXiv.org Machine Learning

Although Gaussian processes (GPs) with deep kernels have been successfully used for meta-learning in regression tasks, their uncertainty estimation performance can be poor. We propose a meta-learning method for calibrating deep kernel GPs to improve regression uncertainty estimation performance with a limited amount of training data. The proposed method meta-learns how to calibrate uncertainty using data from various tasks by minimizing the test expected calibration error, and uses this knowledge for unseen tasks. We design our model such that the adaptation and calibration for each task can be performed without iterative procedures, which enables effective meta-learning. In particular, a task-specific uncalibrated output distribution is modeled by a GP with a task-shared encoder network, and it is transformed to a calibrated one using a cumulative density function of a task-specific Gaussian mixture model (GMM). By integrating the GP and GMM into our neural network-based model, we can meta-learn model parameters in an end-to-end fashion. Our experiments on real-world datasets in few-shot settings demonstrate that the proposed method improves uncertainty estimation performance while maintaining high regression performance compared with existing methods.


PINNs error estimates for nonlinear equations in $\mathbb{R}$-smooth Banach spaces

arXiv.org Artificial Intelligence

In 2017, M. Raissi et al. introduced the physics-informed neural network (PINN) for approximating solutions to partial differential equations (PDEs) [29, 30]. It reduces losses related to the PDE and the boundary/initial conditions. In recent years, the number of papers dedicated to deep learning methods for solving PDEs, including PINNs, has been constantly increasing (see, for instance, [9, 20, 24, 21] for deep learning methods and [14, 16, 17, 18, 31, 32] for PINNs). Consequently, a thorough exploration of the theoretical aspects associated with PINNs is of great significance. For instance, the question arises as to why the PINN training algorithm leads to an accurate approximation. In other words, is it possible to control the total error for sufficiently small residuals/training error? In [25], S. Mishra and R. Molinaro presented an error estimate answering this question and offered an operator description of the sufficient conditions for applying such a method. Yet these conditions, while initially outlined rather generally, in practice require obtaining the estimate itself in order to verify them.