AITopics

2301.00327

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > Ohio (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

arXiv.org Artificial IntelligenceJan-28-2023

CyclicFL: A Cyclic Model Pre-Training Approach to Efficient Federated Learning

Zhang, Pengyu, Zhou, Yingbo, Hu, Ming, Fu, Xin, Wei, Xian, Chen, Mingsong

Since random initial models in Federated Learning (FL) can easily result in unregulated Stochastic Gradient Descent (SGD) processes, existing FL methods greatly suffer from both slow convergence and poor accuracy, especially for non-IID scenarios. To address this problem, we propose a novel FL method named CyclicFL, which can quickly derive effective initial models to guide the SGD processes, thus improving the overall FL training performance. Based on the concept of Continual Learning (CL), we prove that CyclicFL approximates existing centralized pre-training methods in terms of classification and prediction performance. Meanwhile, we formally analyze the significance of data consistency between the pre-training and training stages of CyclicFL, showing the limited Lipschitzness of loss for the pre-trained models by CyclicFL. Unlike traditional centralized pre-training methods that require public proxy data, CyclicFL pre-trains initial models on selected clients cyclically without exposing their local data. Therefore, they can be easily integrated into any security-critical FL methods. Comprehensive experimental results show that CyclicFL can not only improve the classification accuracy by up to 16.21%, but also significantly accelerate the overall FL training processes.

artificial intelligence, loss landscape, machine learning, (16 more...)

2301.12193

Country:

North America > United States > Virginia (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ahmad, Nasir, Schrader, Ellen, van Gerven, Marcel

Constrained Parameter Inference as a Principle for Learning

arXiv.org Artificial IntelligenceJan-27-2023

Learning in neural networks is often framed as a problem in which targeted error signals are directly propagated to parameters and used to produce updates that induce more optimal network behaviour. Backpropagation of error (BP) is an example of such an approach and has proven to be a highly successful application of stochastic gradient descent to deep neural networks. We propose constrained parameter inference (COPI) as a new principle for learning. The COPI approach assumes that learning can be set up in a manner where parameters infer their own values based upon observations of their local neuron activities. We find that this estimation of network parameters is possible under the constraints of decorrelated neural inputs and top-down perturbations of neural states for credit assignment. We show that the decorrelation required for COPI allows learning at extremely high learning rates, competitive with that of adaptive optimizers, as used by BP. We further demonstrate that COPI affords a new approach to feature analysis and network compression. Finally, we argue that COPI may shed new light on learning in biological networks given the evidence for decorrelation in the brain.

artificial intelligence, decorrelation, machine learning, (17 more...)

2203.13203

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Cai, Xufeng, Song, Chaobing, Wright, Stephen J., Diakonikolas, Jelena

Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization

arXiv.org Artificial IntelligenceJan-27-2023

Nonconvex optimization is central in solving many machine learning problems, in which block-wise structure is commonly encountered. In this work, we propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees. Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by a recent progress on cyclic block coordinate methods. In deterministic settings, our convergence guarantee matches the guarantee of (full-gradient) gradient descent, but with the gradient Lipschitz constant being defined w.r.t.~a Mahalanobis norm. In stochastic settings, we use recursive variance reduction to decrease the per-iteration cost and match the arithmetic operation complexity of current optimal stochastic full-gradient methods, with a unified analysis for both finite-sum and infinite-sum cases. We prove a faster linear convergence result when a Polyak-{\L}ojasiewicz (P{\L}) condition holds. To our knowledge, this work is the first to provide non-asymptotic convergence guarantees -- variance-reduced or not -- for a cyclic block coordinate method in general composite (smooth + nonsmooth) nonconvex settings. Our experimental results demonstrate the efficacy of the proposed cyclic scheme in training deep neural nets.

algorithm, artificial intelligence, machine learning, (15 more...)

2212.05088

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Fan, Flint Xiaofeng, Ma, Yining, Dai, Zhongxiang, Tan, Cheston, Low, Bryan Kian Hsiang, Wattenhofer, Roger

FedHQL: Federated Heterogeneous Q-Learning

Federated Reinforcement Learning (FedRL) encourages distributed agents to learn collectively from each other's experience to improve their performance without exchanging their raw trajectories. The existing work on FedRL assumes that all participating agents are homogeneous, which requires all agents to share the same policy parameterization (e.g., network architectures and training configurations). However, in real-world applications, agents are often in disagreement about the architecture and the parameters, possibly also because of disparate computational budgets. Because homogeneity is not given in practice, we introduce the problem setting of Federated Reinforcement Learning with Heterogeneous And bLack-box agEnts (FedRL-HALE). We present the unique challenges this new setting poses and propose the Federated Heterogeneous Q-Learning (FedHQL) algorithm that principally addresses these challenges. We empirically demonstrate the efficacy of FedHQL in boosting the sample efficiency of heterogeneous agents with distinct policy parameterization using standard RL tasks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2301.11135

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing

Jin, Jikai, Li, Zhiyuan, Lyu, Kaifeng, Du, Simon S., Lee, Jason D.

It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models. This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem, whose goal is to recover a low-rank ground-truth matrix from near-isotropic linear measurements. It is shown that GD with small initialization behaves similarly to the greedy low-rank learning heuristics (Li et al., 2020) and follows an incremental learning procedure (Gissin et al., 2019): GD sequentially learns solutions with increasing ranks until it recovers the ground truth matrix. Compared to existing works which only analyze the first learning phase for rank-1 solutions, our result provides characterizations for the whole learning process. Moreover, besides the over-parameterized regime that many prior works focused on, our analysis of the incremental learning procedure also applies to the under-parameterized regime. Finally, we conduct numerical experiments to confirm our theoretical findings.

artificial intelligence, assumption 3, machine learning, (13 more...)

2301.115

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Dhingra, Atul, Shen, Jie, Kleene, Nicholas

Learning Large Scale Sparse Models

In this work, we consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions. Two immediate issues occur under such challenging scenario: (i) computational cost; (ii) memory overhead. In particular, the memory issue precludes a large volume of prior algorithms that are based on batch optimization technique. To remedy the problem, we propose to learn sparse models such as Lasso in an online manner where in each iteration, only one randomly chosen sample is revealed to update a sparse iterate. Thereby, the memory cost is independent of the sample size and gradient evaluation for one sample is efficient. Perhaps amazingly, we find that with the same parameter, sparsity promoted by batch methods is not preserved in online fashion. We analyze such interesting phenomenon and illustrate some effective variants including mini-batch methods and a hard thresholding based stochastic gradient algorithm. Extensive experiments are carried out on a public dataset which supports our findings and algorithms.

algorithm, artificial intelligence, machine learning, (16 more...)

2301.10958

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)

McMahan, H. Brendan, Moore, Eider, Ramage, Daniel, Hampson, Seth, Arcas, Blaise Agüera y

Communication-Efficient Learning of Deep Networks from Decentralized Data

Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.

artificial intelligence, deep learning, machine learning, (17 more...)

1602.05629

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Virginia (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

arXiv.org Artificial IntelligenceJan-25-2023

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

Cont, Rama, Rossier, Alain, Xu, RenYuan

We prove linear convergence of gradient descent to a global optimum for the training of deep residual networks with constant layer width and smooth activation function. We show that if the trained weights, as a function of the layer index, admit a scaling limit as the depth increases, then the limit has finite $p-$variation with $p=2$. Proofs are based on non-asymptotic estimates for the loss function and for norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.

artificial intelligence, gradient descent, machine learning, (15 more...)

2204.07261

Country:

North America > United States > California (0.14)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.94)

Wadi, Davood, Fredette, Marc, Senecal, Sylvain

Read the Signs: Towards Invariance to Gradient Descent's Hyperparameter Initialization

arXiv.org Artificial IntelligenceJan-24-2023

We propose ActiveLR, an optimization meta algorithm that localizes the learning rate, $\alpha$, and adapts them at each epoch according to whether the gradient at each epoch changes sign or not. This sign-conscious algorithm is aware of whether from the previous step to the current one the update of each parameter has been too large or too small and adjusts the $\alpha$ accordingly. We implement the Active version (ours) of widely used and recently published gradient descent optimizers, namely SGD with momentum, AdamW, RAdam, and AdaBelief. Our experiments on ImageNet, CIFAR-10, WikiText-103, WikiText-2, and PASCAL VOC using different model architectures, such as ResNet and Transformers, show an increase in generalizability and training set fit, and decrease in training time for the Active variants of the tested optimizers. The results also show robustness of the Active variant of these optimizers to different values of the initial learning rate. Furthermore, the detrimental effects of using large mini-batch sizes are mitigated. ActiveLR, thus, alleviates the need for hyper-parameter search for two of the most commonly tuned hyper-parameters that require heavy time and computational costs to pick. We encourage AI researchers and practitioners to use the Active variant of their optimizer of choice for faster training, better generalizability, and reducing carbon footprint of training deep neural networks.

artificial intelligence, learning rate, machine learning, (18 more...)

2301.10133

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)