AITopics

2306.05189

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > California (0.04)
Europe > Russia (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Consumer Health (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Taheri, Hossein, Thrampoulidis, Christos

Fast Convergence in Learning Two-Layer Neural Networks with Separable Data

arXiv.org Artificial Intelligence, Jun-26-2023

Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (which include the exponential and logistic losses) on linear classifiers with separable data. In this paper, we go beyond linear models by studying normalized GD on two-layer neural nets. We prove for exponentially-tailed losses that using normalized GD leads to a linear rate of convergence of the training loss to the global optimum if the iterates find an interpolating model. This is made possible by showing certain gradient self-boundedness conditions and a log-Lipschitzness property. We also study generalization of normalized GD for convex objectives via an algorithmic-stability analysis. In particular, we show that normalized GD does not overfit during training by establishing finite-time generalization bounds.
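To make the idea concrete, here is a minimal sketch of normalized GD in the simpler linear setting the abstract starts from, not the two-layer setting the paper analyzes. It assumes one common form of normalization, a step size of `eta / loss`, together with the exponential loss and a four-point toy dataset; these choices and all function names are illustrative assumptions rather than the authors' setup.

```python
import math

def exp_loss_and_grad(w, data):
    """Mean exponential loss exp(-y * <w, x>) of a linear model, and its gradient."""
    loss = 0.0
    grad = [0.0] * len(w)
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        e = math.exp(-margin)
        loss += e
        for j, xj in enumerate(x):
            grad[j] -= y * xj * e
    n = len(data)
    return loss / n, [g / n for g in grad]

def run_gd(data, steps, eta, normalized):
    """Run (normalized) gradient descent from w = 0 and return the final loss."""
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        loss, grad = exp_loss_and_grad(w, data)
        # Assumed normalization: divide the step size by the current loss value.
        step = eta / loss if normalized else eta
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return exp_loss_and_grad(w, data)[0]

# Hypothetical linearly separable toy data: (features, label in {-1, +1}).
DATA = [((1.0, 0.5), 1), ((0.5, 1.0), 1), ((-1.0, -0.5), -1), ((-0.5, -1.0), -1)]

loss_plain = run_gd(DATA, steps=50, eta=0.5, normalized=False)
loss_normalized = run_gd(DATA, steps=50, eta=0.5, normalized=True)
print(f"plain GD loss: {loss_plain:.3e}, normalized GD loss: {loss_normalized:.3e}")
```

On this toy problem the normalized run increases the margin by a fixed amount per step, so its loss shrinks geometrically, while the plain run slows down as the gradient vanishes.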

artificial intelligence, machine learning, normalized gd, (17 more...)

2305.13471

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
North America > Canada > British Columbia (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Liang, Ziyi, Zhou, Yanfei, Sesia, Matteo

Conformal inference is (almost) free for neural networks trained with early stopping

arXiv.org Artificial Intelligence, Jun-26-2023

Deep neural networks can detect complex data patterns and leverage them to make accurate predictions in many applications, including computer vision, natural language processing, and speech recognition, to name a few examples. These models can sometimes even outperform skilled humans [1], but they still make mistakes. Unfortunately, the severity of these mistakes is compounded by the fact that the predictions computed by neural networks are often overconfident [2], partly due to overfitting [3, 4]. Several training strategies have been developed to mitigate overfitting, including dropout [5], batch normalization [6], weight normalization [7], data augmentation [8], and early stopping [9]; the latter is the focus of this paper. Early stopping consists of evaluating the predictive performance of the current model on independent hold-out data after each batch of stochastic gradient updates (or epoch). After a large number of gradient updates, only the intermediate model achieving the best performance on the hold-out data is utilized to make predictions. This strategy is often effective at mitigating overfitting and can produce relatively accurate predictions compared to fully trained models, but it does not fully resolve overconfidence because it does not lead to models with finite-sample guarantees.
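The early-stopping procedure described above can be sketched as follows. This is only the generic train, evaluate, keep-the-best loop, not the paper's conformal method; the one-dimensional linear model, the synthetic data, and every name in the code are assumptions made for illustration.

```python
import random

def mse(w, b, data):
    """Mean squared error of the model y = w * x + b on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train_with_early_stopping(train, holdout, epochs=30, lr=0.05, seed=0):
    """Run SGD and keep the snapshot with the lowest hold-out error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    best = {"epoch": -1, "w": w, "b": b, "val": mse(w, b, holdout)}
    history = []
    for epoch in range(epochs):
        order = list(train)
        rng.shuffle(order)
        for x, y in order:
            err = w * x + b - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
        # Evaluate on hold-out data after each epoch.
        val = mse(w, b, holdout)
        history.append(val)
        if val < best["val"]:
            best = {"epoch": epoch, "w": w, "b": b, "val": val}
    return best, history

# Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(1)
points = []
for _ in range(60):
    x = data_rng.uniform(-1.0, 1.0)
    points.append((x, 2.0 * x + 1.0 + data_rng.gauss(0.0, 0.3)))
train, holdout = points[:40], points[40:]

best, history = train_with_early_stopping(train, holdout)
print(f"best epoch: {best['epoch']}, hold-out MSE: {best['val']:.4f}")
```

Only the stored snapshot in `best` would be used for prediction; the later iterates are discarded.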

artificial intelligence, machine learning, standard error, (16 more...)

2301.11556

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Tennessee (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Axiotis, Kyriakos, Sviridenko, Maxim

Gradient Descent Converges Linearly for Logistic Regression on Separable Data

arXiv.org Artificial Intelligence, Jun-25-2023

We show that running gradient descent with a variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$ for the logistic regression objective, where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x^*$. This is in contrast to the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and highlights the importance of variable learning rates for gradient descent. We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.

artificial intelligence, gradient descent converge linearly, machine learning, (12 more...)

2306.14381

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.93)

Egger, Maximilian, Hofmeister, Christoph, Wachter-Zeh, Antonia, Bitar, Rawad

Private Aggregation in Wireless Federated Learning with Heterogeneous Clusters

arXiv.org Artificial Intelligence, Jun-24-2023

Federated learning collaboratively trains a neural network on privately owned data held by several participating clients. The gradient descent algorithm, a well-known and popular iterative optimization procedure, is run to train the neural network. Every client uses its local data to compute partial gradients and sends them to the federator, which aggregates the results. Privacy of the clients' data is a major concern. In fact, observing the partial gradients can be enough to reveal the clients' data. Private aggregation schemes have been investigated to tackle the privacy problem in federated learning where all the users are connected to each other and to the federator. In this paper, we consider a wireless system architecture where clients are only connected to the federator via base stations. We derive fundamental limits on the communication cost when information-theoretic privacy is required, and introduce and analyze a private aggregation scheme tailored for this setting.
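As a rough illustration of what private aggregation aims for, the sketch below uses pairwise cancelling additive masks over fixed-point integers, a standard textbook idea and not the scheme proposed in the paper. The modulus, the quantization scale, the three example gradients, and all function names are assumptions; Python's `random` module is used only for the demonstration and is not a cryptographically secure source of randomness.

```python
import random

MODULUS = 2**61 - 1   # assumed modulus for the masked arithmetic
SCALE = 10**6         # assumed fixed-point scale for quantizing gradients

def quantize(vec):
    """Map floats to integers modulo MODULUS."""
    return [round(x * SCALE) % MODULUS for x in vec]

def dequantize(vec):
    """Map integers modulo MODULUS back to signed floats."""
    out = []
    for x in vec:
        if x > MODULUS // 2:
            x -= MODULUS
        out.append(x / SCALE)
    return out

def pairwise_masks(num_clients, dim, rng):
    """Each pair (i, j) shares a random vector: i adds it, j subtracts it."""
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks

def aggregate(masked_updates):
    """Federator side: sum the masked vectors; the masks cancel in the sum."""
    dim = len(masked_updates[0])
    return [sum(u[k] for u in masked_updates) % MODULUS for k in range(dim)]

# Hypothetical partial gradients from three clients.
gradients = [[0.5, -1.25], [1.0, 0.25], [-0.75, 2.0]]
rng = random.Random(0)
masks = pairwise_masks(len(gradients), 2, rng)
masked = [[(q + m) % MODULUS for q, m in zip(quantize(g), mask)]
          for g, mask in zip(gradients, masks)]
total = dequantize(aggregate(masked))
print(total)
```

The federator only ever sums the masked vectors, yet the result equals the sum of the original gradients because every shared mask is added once and subtracted once.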

artificial intelligence, machine learning, optimization problem, (18 more...)

2306.14088

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Telecommunications (0.77)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Williams, John, Roberts, Stephen

G-TRACER: Expected Sharpness Optimization

arXiv.org Artificial Intelligence, Jun-24-2023

We propose a new regularization scheme for the optimization of deep learning architectures, G-TRACER ("Geometric TRACE Ratio"), which promotes generalization by seeking flat minima, and has a sound theoretical basis as an approximation to a natural-gradient descent based optimization of a generalized Bayes objective. By augmenting the loss function with a TRACER, curvature-regularized optimizers (e.g., SGD-TRACER and Adam-TRACER) are simple to implement as modifications to existing optimizers and don't require extensive tuning. We show that the method converges to a neighborhood (depending on the regularization strength) of a local minimum of the unregularized objective, and demonstrate competitive performance on a number of benchmark computer vision and NLP datasets, with a particular focus on challenging low signal-to-noise ratio problems.

approximation, artificial intelligence, machine learning, (17 more...)

2306.13914

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Sun, Haoyuan, Gatmiry, Khashayar, Ahn, Kwangjun, Azizan, Navid

A Unified Approach to Controlling Implicit Regularization via Mirror Descent

Inspired by the remarkable success of deep neural networks, there has been significant interest in understanding the generalization performance of overparameterized models. Substantial efforts have been invested in characterizing how optimization algorithms impact generalization through their "preferred" solutions, a phenomenon commonly referred to as implicit regularization. In particular, it has been argued that gradient descent (GD) induces an implicit $\ell_2$-norm regularization in regression and classification problems. However, the implicit regularization of different algorithms is confined to either a specific geometry or a particular class of learning problems, indicating a gap in a general approach for controlling the implicit regularization. To address this, we present a unified approach using mirror descent (MD), a notable generalization of GD, to control implicit regularization in both regression and classification settings. More specifically, we show that MD with the general class of homogeneous potential functions converges in direction to a generalized maximum-margin solution for linear classification problems, thereby answering a long-standing question in the classification setting. Further, we show that MD can be implemented efficiently and, under suitable conditions, enjoys fast convergence. Through comprehensive experiments, we demonstrate that MD is a versatile method to produce learned models with different regularizers, which in turn have different generalization performances.
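For readers unfamiliar with mirror descent, the update can be sketched with one example of a homogeneous potential, $\psi(w) = \frac{1}{p}\|w\|_p^p$ with $p > 1$. This particular potential, the step size, and the function names are assumptions chosen for illustration; the abstract does not specify them. The step is $\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta \nabla L(w_t)$, and it reduces to plain GD when $p = 2$.

```python
import math

def mirror_map(w, p):
    """Gradient of psi(w) = (1/p) * ||w||_p^p, applied coordinate-wise."""
    return [math.copysign(abs(x) ** (p - 1), x) for x in w]

def inverse_mirror_map(z, p):
    """Inverse of mirror_map: recovers w from its dual representation z."""
    return [math.copysign(abs(v) ** (1.0 / (p - 1)), v) for v in z]

def md_step(w, grad, eta, p):
    """One mirror descent step: move in the dual space, then map back."""
    z = mirror_map(w, p)
    z = [zi - eta * gi for zi, gi in zip(z, grad)]
    return inverse_mirror_map(z, p)

w = [1.0, -2.0]
grad = [0.5, 0.5]                       # hypothetical loss gradient at w
print(md_step(w, grad, eta=0.1, p=2))   # matches a plain GD step
print(md_step(w, grad, eta=0.1, p=3))   # same gradient, different geometry
```

Changing `p` changes the geometry of the update while the loss gradient stays the same, which is the lever the abstract describes for steering implicit regularization.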

artificial intelligence, deep learning, machine learning, (16 more...)

2306.13853

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.82)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Kim, Sehwan, Song, Qifan, Liang, Faming

A New Paradigm for Generative Adversarial Networks based on Randomized Decision Rules

The Generative Adversarial Network (GAN) was recently introduced in the literature as a novel machine learning method for training generative models. It has many applications in statistics such as nonparametric clustering and nonparametric conditional independence tests. However, training the GAN is notoriously difficult due to the issue of mode collapse, which refers to the lack of diversity among generated data. In this paper, we identify the reasons why the GAN suffers from this issue, and to address it, we propose a new formulation for the GAN based on randomized decision rules. In the new formulation, the discriminator converges to a fixed point while the generator converges to a distribution at the Nash equilibrium. We propose to train the GAN by an empirical Bayes-like method by treating the discriminator as a hyper-parameter of the posterior distribution of the generator. Specifically, we simulate generators from their posterior distribution conditioned on the discriminator using a stochastic gradient Markov chain Monte Carlo (MCMC) algorithm, and update the discriminator using stochastic gradient descent along with simulations of the generators. We establish convergence of the proposed method to the Nash equilibrium. Apart from image generation, we apply the proposed method to nonparametric clustering and nonparametric conditional independence tests. A portion of the numerical results is presented in the supplementary material.

artificial intelligence, generator, machine learning, (14 more...)

2306.13641

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Even, Mathieu

Stochastic Gradient Descent under Markovian Sampling Schemes

We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme. These schemes encompass applications that range from decentralized optimization with a random walker (token algorithms), to RL and online system identification problems. We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain and on the functions optimized. We first unveil the theoretical lower bound for methods that sample stochastic gradients along the path of a Markov chain, revealing a dependence on the hitting time of the underlying Markov chain. We then study Markov chain SGD (MC-SGD) under much milder regularity assumptions than prior works (e.g., no bounded gradients or domain, and infinite state spaces). We finally introduce MC-SAG, an alternative to MC-SGD with variance reduction, that only depends on the hitting time of the Markov chain, therefore obtaining a communication-efficient token algorithm.
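A token algorithm of the kind mentioned above can be sketched as SGD driven by a random walk. In this toy version, which is an assumption-laden illustration and not the paper's MC-SGD analysis or MC-SAG, five nodes sit on a ring, node `i` holds the local function `0.5 * (x - targets[i])**2`, and the step size `1 / (t + 1)` is an arbitrary decaying schedule.

```python
import random

def mc_sgd(targets, steps, seed=0):
    """SGD where the sampled index follows a random walk on a ring of nodes."""
    rng = random.Random(seed)
    n = len(targets)
    node = 0          # the token starts at node 0
    x = 0.0
    for t in range(steps):
        # Gradient of the local function held by the current node.
        grad = x - targets[node]
        x -= grad / (t + 1)
        # The token moves to a uniformly chosen neighbor on the ring.
        node = (node + rng.choice((-1, 1))) % n
    return x

TARGETS = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical local minimizers
x_final = mc_sgd(TARGETS, steps=20000)
print(f"final iterate: {x_final:.3f} (minimizer of the average objective: 2.0)")
```

Consecutive samples are correlated because the token can only move to a neighbor, which is why convergence depends on properties of the chain rather than on independent sampling.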

algorithm, artificial intelligence, machine learning, (12 more...)

2302.14428

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Koloskova, Anastasia, McKenna, Ryan, Charles, Zachary, Rush, Keith, McMahan, Brendan

Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy

We study gradient descent under linearly correlated noise. Our work is motivated by recent practical methods for optimization with differential privacy (DP), such as DP-FTRL, which achieve strong performance in settings where privacy amplification techniques are infeasible (such as in federated learning). These methods inject privacy noise through a matrix factorization mechanism, making the noise linearly correlated over iterations. We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise. We analyze the behavior of gradient descent in this setting, for both convex and non-convex functions. Our analysis is demonstrably tighter than prior work and recovers multiple important special cases exactly (including anticorrelated perturbed gradient descent). We use our results to develop new, effective matrix factorizations for differentially private optimization, and highlight the benefits of these factorizations theoretically and empirically.
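The simplified setting can be illustrated on a one-dimensional quadratic. In the sketch below, the noise added at step `t` is row `t` of a lower-triangular matrix `B` applied to i.i.d. Gaussian draws, so the choice of `B` controls how noise is correlated across iterations. The objective `0.5 * x**2`, the two example matrices, and the step size are assumptions for illustration; they are not the factorizations developed in the paper.

```python
import random

def noisy_gd(B, eta, rng):
    """Gradient descent on f(x) = 0.5 * x**2 with noise (B z)_t added at step t."""
    T = len(B)
    z = [rng.gauss(0.0, 1.0) for _ in range(T)]
    x = 0.0
    for t in range(T):
        noise = sum(B[t][j] * z[j] for j in range(t + 1))
        x -= eta * (x + noise)   # the gradient of 0.5 * x**2 is x
    return x

def identity(T):
    """Independent noise: each step uses its own fresh draw z_t."""
    return [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]

def anticorrelated(T):
    """Anticorrelated noise: step t uses z_t - z_{t-1}."""
    B = identity(T)
    for t in range(1, T):
        B[t][t - 1] = -1.0
    return B

def mean_square_error(B, eta=0.1, runs=500, seed=0):
    """Average squared distance of the last iterate from the minimizer 0."""
    rng = random.Random(seed)
    return sum(noisy_gd(B, eta, rng) ** 2 for _ in range(runs)) / runs

T = 30
ms_independent = mean_square_error(identity(T))
ms_anticorrelated = mean_square_error(anticorrelated(T))
print(f"independent: {ms_independent:.4f}, anticorrelated: {ms_anticorrelated:.4f}")
```

With the anticorrelated matrix most of the injected noise cancels across consecutive steps, so the final iterate ends up closer to the minimizer on average than with independent noise in this toy setup.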

algorithm, artificial intelligence, machine learning, (15 more...)

2302.01463

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)