AITopics

2305.06625

Country:

North America > United States > New York (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Wang, Zhanwei, Huang, Kaibin, Eldar, Yonina C.

Spectrum Breathing: Protecting Over-the-Air Federated Learning Against Interference

Federated Learning (FL) is a widely embraced paradigm for distilling artificial intelligence from distributed mobile data. However, the deployment of FL in mobile networks can be compromised by exposure to interference from neighboring cells or jammers. Existing interference mitigation techniques require multi-cell cooperation or at least interference channel state information, which is expensive in practice. On the other hand, power control that treats interference as noise may not be effective due to limited power budgets, and also that this mechanism can trigger countermeasures by interference sources. As a practical approach for protecting FL against interference, we propose Spectrum Breathing, which cascades stochastic-gradient pruning and spread spectrum to suppress interference without bandwidth expansion. The cost is higher learning latency by exploiting the graceful degradation of learning speed due to pruning. We synchronize the two operations such that their levels are controlled by the same parameter, Breathing Depth. To optimally control the parameter, we develop a martingale-based approach to convergence analysis of Over-the-Air FL with spectrum breathing, termed AirBreathing FL. We show a performance tradeoff between gradient-pruning and interference-induced error as regulated by the breathing depth. Given receive SIR and model size, the optimization of the tradeoff yields two schemes for controlling the breathing depth that can be either fixed or adaptive to channels and the learning process. As shown by experiments, in scenarios where traditional Over-the-Air FL fails to converge in the presence of strong interference, AirBreahing FL with either fixed or adaptive breathing depth can ensure convergence where the adaptive scheme achieves close-to-ideal performance.

artificial intelligence, interference, machine learning, (16 more...)

2305.05933

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada > Quebec > Montreal (0.04)
Asia > China > Hong Kong (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Information Technology (0.46)
Telecommunications (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Chou, Hung-Hsu, Maly, Johannes, Rauhut, Holger

More is Less: Inducing Sparsity via Overparameterization

In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain understanding of this implicit bias phenomenon we study the special case of sparse recovery (compressed sensing) which is of interest on its own. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, if there exists an exact solution, vanilla gradient flow for the overparameterized loss functional converges to a good approximation of the solution of minimal $\ell_1$-norm. The latter is well-known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressed sensing via gradient flow/descent on overparameterized models derived in previous works. The theory accurately predicts the recovery rate in numerical experiments. Our proof relies on analyzing a certain Bregman divergence of the flow. This bypasses the obstacles caused by non-convexity and should be of independent interest.

artificial intelligence, gradient flow, machine learning, (16 more...)

2112.11027

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Russia (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Zhang, Peiyuan, Teng, Jiaye, Zhang, Jingzhao

Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization

This work studies the generalization error of gradient methods. More specifically, we focus on how training steps $T$ and step-size $\eta$ might affect generalization in smooth stochastic convex optimization (SCO) problems. We first provide tight excess risk lower bounds for Gradient Descent (GD) and Stochastic Gradient Descent (SGD) under the general non-realizable smooth SCO setting, suggesting that existing stability analyses are tight in step-size and iteration dependence, and that overfitting provably happens. Next, we study the case when the loss is realizable, i.e. an optimal solution minimizes all the data points. Recent works show better rates can be attained but the improvement is reduced when training time is long. Our paper examines this observation by providing excess risk lower bounds for GD and SGD in two realizable settings: 1) $\eta T = \bigO{n}$, and (2) $\eta T = \bigOmega{n}$, where $n$ is the size of dataset. In the first case $\eta T = \bigOmega{n}$, our lower bounds tightly match and certify the respective upper bounds. However, for the case $\eta T = \bigOmega{n}$, our analysis indicates a gap between the lower and upper bounds. A conjecture is proposed that the gap can be closed by improving upper bounds, supported by analyses in two special scenarios.

artificial intelligence, machine learning, sgd, (12 more...)

2303.10758

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)

Mair, Jamie F., Rose, Dominic C., Garrahan, Juan P.

Training neural network ensembles via trajectory sampling

In machine learning, there is renewed interest in neural network ensembles (NNEs), whereby predictions are obtained as an aggregate from a diverse set of smaller models, rather than from a single larger model. Here, we show how to define and train a NNE using techniques from the study of rare trajectories in stochastic systems. We define an NNE in terms of the trajectory of the model parameters under a simple, and discrete in time, diffusive dynamics, and train the NNE by biasing these trajectories towards a small time-integrated loss, as controlled by appropriate counting fields which act as hyperparameters. We demonstrate the viability of this technique on a range of simple supervised learning tasks. We discuss potential advantages of our trajectory sampling approach compared with more conventional gradient based methods.

artificial intelligence, machine learning, trajectory, (16 more...)

2209.11116

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Securing Distributed SGD against Gradient Leakage Threats

Wei, Wenqi, Liu, Ling, Zhou, Jingya, Chow, Ka-Ho, Wu, Yanzhao

This paper presents a holistic approach to gradient leakage resilient distributed Stochastic Gradient Descent (SGD). First, we analyze two types of strategies for privacy-enhanced federated learning: (i) gradient pruning with random selection or low-rank filtering and (ii) gradient perturbation with additive random noise or differential privacy noise. We analyze the inherent limitations of these approaches and their underlying impact on privacy guarantee, model accuracy, and attack resilience. Next, we present a gradient leakage resilient approach to securing distributed SGD in federated learning, with differential privacy controlled noise as the tool. Unlike conventional methods with the per-client federated noise injection and fixed noise parameter strategy, our approach keeps track of the trend of per-example gradient updates. It makes adaptive noise injection closely aligned throughout the federated model training. Finally, we provide an empirical privacy analysis on the privacy guarantee, model utility, and attack resilience of the proposed approach. Extensive evaluation using five benchmark datasets demonstrates that our gradient leakage resilient approach can outperform the state-of-the-art methods with competitive accuracy performance, strong differential privacy guarantee, and high resilience against gradient leakage attacks. The code associated with this paper can be found: https://github.com/git-disl/Fed-alphaCDP.

artificial intelligence, machine learning, privacy, (16 more...)

2305.06473

Country:

North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Efficiently Escaping Saddle Points in Bilevel Optimization

Huang, Minhui, Chen, Xuxing, Ji, Kaiyi, Ma, Shiqian, Lai, Lifeng

Bilevel optimization is one of the fundamental problems in machine learning and optimization. Recent theoretical developments in bilevel optimization focus on finding the first-order stationary points for nonconvex-strongly-convex cases. In this paper, we analyze algorithms that can escape saddle points in nonconvex-strongly-convex bilevel optimization. Specifically, we show that the perturbed approximate implicit differentiation (AID) with a warm start strategy finds $\epsilon$-approximate local minimum of bilevel optimization in $\tilde{O}(\epsilon^{-2})$ iterations with high probability. Moreover, we propose an inexact NEgative-curvature-Originated-from-Noise Algorithm (iNEON), a pure first-order algorithm that can escape saddle point and find local minimum of stochastic bilevel optimization. As a by-product, we provide the first nonasymptotic analysis of perturbed multi-step gradient descent ascent (GDmax) algorithm that converges to local minimax point for minimax problems.

artificial intelligence, machine learning, optimization, (16 more...)

2202.03684

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Milzarek, Andre, Qiu, Junwen

Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality

arXiv.org Artificial IntelligenceMay-9-2023

In this paper, we present a novel stochastic normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) for nonconvex composite-type optimization problems and discuss its convergence properties. Using a time window-based strategy, we first analyze the global convergence behavior of $\mathsf{norM}\text{-}\mathsf{SGD}$ and it is shown that every accumulation point of the generated sequence of iterates $\{\boldsymbol{x}^k\}_k$ corresponds to a stationary point almost surely and in an expectation sense. The obtained results hold under standard assumptions and extend the more limited convergence guarantees of the basic proximal stochastic gradient method. In addition, based on the well-known Kurdyka-{\L}ojasiewicz (KL) analysis framework, we provide novel point-wise convergence results for the iterates $\{\boldsymbol{x}^k\}_k$ and derive convergence rates that depend on the underlying KL exponent $\boldsymbol{\theta}$ and the step size dynamics $\{\alpha_k\}_k$. Specifically, for the popular step size scheme $\alpha_k=\mathcal{O}(1/k^\gamma)$, $\gamma \in (\frac23,1]$, (almost sure) rates of the form $\|\boldsymbol{x}^k-\boldsymbol{x}^*\| = \mathcal{O}(1/k^p)$, $p \in (0,\frac12)$, can be established. The obtained rates are faster than related and existing convergence rates for $\mathsf{SGD}$ and improve on the non-asymptotic complexity bounds for $\mathsf{norM}\text{-}\mathsf{SGD}$.

artificial intelligence, machine learning, optimization problem, (14 more...)

2305.05828

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > China > Guangdong Province > Shenzhen (0.04)
(7 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

arXiv.org Artificial IntelligenceMay-9-2023

Measuring Forgetting of Memorized Training Examples

Jagielski, Matthew, Thakkar, Om, Tramèr, Florian, Ippolito, Daphne, Lee, Katherine, Carlini, Nicholas, Wallace, Eric, Song, Shuang, Thakurta, Abhradeep, Papernot, Nicolas, Zhang, Chiyuan

Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convex models can memorize data forever in the worst-case, standard image, speech, and language models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets - for instance those examples used to pre-train a model - may observe privacy benefits at the expense of examples seen later.

artificial intelligence, inductive learning, machine learning, (17 more...)

2207.00099

Country:

North America > Canada > Ontario > Toronto (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Jordan (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Artificial IntelligenceMay-6-2023

Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees

Xiao, Nachuan, Hu, Xiaoyin, Liu, Xin, Toh, Kim-Chuan

The optimization problem in the form of UNP has numerous important applications in machine learning and data science, especially in training deep neural networks. In these applications of UNP, we usually only have access to the stochastic evaluations of the exact gradients of f. The stochastic gradient descent (SGD) is one of the most popular methods for solving UNP, and incorporating the momentum terms to SGD for acceleration is also very popular in practice. In SGD, the updating rule depends on the stepsizes (i.e., learning rates), where all of the coordinates of the variable x are equipped with the same stepsize. Recently, a variety of accelerated versions for SGD are proposed. In particular, the widely used Adam algorithm (Kingma and Ba, 2015) is developed based on the adaptive adjustment of the coordinate-wise stepsizes and the incorporation of momentum terms in each iteration. These enhancements have led to its high efficiency in practice. Motivated by Adam, a number of efficient Adam-family methods are developed, such as AdaBelief (Zhuang et al., 2020), AMSGrad (Reddi et al., 2018), NAdam (Dozat), Yogi (Zaheer et al., 2018), etc. Towards the convergence properties of these Adam-family methods, Kingma and Ba (2015) shows the convergence properties for Adam with constant stepsize and global Lipschitz objective function f.

artificial intelligence, machine learning, sequence, (15 more...)

2305.03938

Country:

Asia > Singapore (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)