AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Bridging Differential Privacy and Byzantine-Robustness via Model Aggregation

arXiv.org Artificial IntelligenceAug-2-2022

This paper aims at jointly addressing two seemly conflicting issues in federated learning: differential privacy (DP) and Byzantine-robustness, which are particularly challenging when the distributed data are non-i.i.d. (independent and identically distributed). The standard DP mechanisms add noise to the transmitted messages, and entangles with robust stochastic gradient aggregation to defend against Byzantine attacks. In this paper, we decouple the two issues via robust stochastic model aggregation, in the sense that our proposed DP mechanisms and the defense against Byzantine attacks have separated influence on the learning performance. Leveraging robust stochastic model aggregation, at each iteration, each worker calculates the difference between the local model and the global one, followed by sending the element-wise signs to the master node, which enables robustness to Byzantine attacks. Further, we design two DP mechanisms to perturb the uploaded signs for the purpose of privacy preservation, and prove that they are $(\epsilon,0)$-DP by exploiting the properties of noise distributions. With the tools of Moreau envelop and proximal point projection, we establish the convergence of the proposed algorithm when the cost function is nonconvex. We analyze the trade-off between privacy preservation and learning performance, and show that the influence of our proposed DP mechanisms is decoupled with that of robust stochastic model aggregation. Numerical experiments demonstrate the effectiveness of the proposed algorithm.

dp mechanism, dp-rsa, mechanism, (15 more...)

arXiv.org Artificial Intelligence

2205.00107

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > China (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Gradient-descent quantum process tomography by learning Kraus operators

Ahmed, Shahnawaz, Quijandría, Fernando, Kockum, Anton Frisk

arXiv.org Artificial IntelligenceAug-1-2022

We perform quantum process tomography (QPT) for both discrete- and continuous-variable quantum systems by learning a process representation using Kraus operators. The Kraus form ensures that the reconstructed process is completely positive. To make the process trace-preserving, we use a constrained gradient-descent (GD) approach on the so-called Stiefel manifold during optimization to obtain the Kraus operators. Our ansatz uses a few Kraus operators to avoid direct estimation of large process matrices, e.g., the Choi matrix, for low-rank quantum processes. The GD-QPT matches the performance of both compressed-sensing (CS) and projected least-squares (PLS) QPT in benchmarks with two-qubit random processes, but shines by combining the best features of these two methods. Similar to CS (but unlike PLS), GD-QPT can reconstruct a process from just a small number of random measurements, and similar to PLS (but unlike CS) it also works for larger system sizes, up to at least five qubits. We envisage that the data-driven approach of GD-QPT can become a practical tool that greatly reduces the cost and computational effort for QPT in intermediate-scale quantum systems.

artificial intelligence, arxiv, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1103/PhysRevLett.130.150402

2208.00812

Country:

Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Add feedback

Formal guarantees for heuristic optimization algorithms used in machine learning

Li, Xiaoyu

arXiv.org Artificial IntelligenceJul-31-2022

Recently, Stochastic Gradient Descent (SGD) and its variants have become the dominant methods in the large-scale optimization of machine learning (ML) problems. A variety of strategies have been proposed for tuning the step sizes, ranging from adaptive step sizes to heuristic methods to change the step size in each iteration. Also, momentum has been widely employed in ML tasks to accelerate the training process. Yet, there is a gap in our theoretical understanding of them. In this work, we start to close this gap by providing formal guarantees to a few heuristic optimization methods and proposing improved algorithms. First, we analyze a generalized version of the AdaGrad (Delayed AdaGrad) step sizes in both convex and non-convex settings, showing that these step sizes allow the algorithms to automatically adapt to the level of noise of the stochastic gradients. We show for the first time sufficient conditions for Delayed AdaGrad to achieve almost sure convergence of the gradients to zero. Moreover, we present a high probability analysis for Delayed AdaGrad and its momentum variant in the non-convex setting. Second, we analyze SGD with exponential and cosine step sizes, which are empirically successful but lack theoretical support. We provide the very first convergence guarantees for them in the smooth and non-convex setting, with and without the Polyak-{\L}ojasiewicz (PL) condition. We also show their good property of adaptivity to noise under the PL condition. Third, we study the last iterate of momentum methods. We prove the first lower bound in the convex setting for the last iterate of SGD with constant momentum. Moreover, we investigate a class of Follow-The-Regularized-Leader-based momentum algorithms with increasing momentum and shrinking updates. We show that their last iterate has optimal convergence for unconstrained convex stochastic optimization problems.

algorithm, step size, stepsize, (16 more...)

arXiv.org Artificial Intelligence

2208.00502

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Breaking it Down: Gradient Descent

#artificialintelligenceJul-29-2022, 17:12:52 GMT

Originally published on Towards AI the World's Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. Gradient descent is an optimization algorithm that is used to improve the performance of deep/machine learning models.

descent, gradient, gradient descent, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Add feedback

Differentially Private SGDA for Minimax Problems

Yang, Zhenhuan, Hu, Shu, Lei, Yunwen, Varshney, Kush R., Lyu, Siwei, Ying, Yiming

arXiv.org Artificial IntelligenceJul-29-2022

Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that the DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To our best knowledge, this is the first-ever-known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting which is the first-ever-known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.

artificial intelligence, dp-sgda, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2201.09046

Country:

North America > United States > Virginia (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.92)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Add feedback

One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares

Min, Youngjae, Ahn, Kwangjun, Azizan, Navid

arXiv.org Artificial IntelligenceJul-27-2022

While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the increasing use of overparameterized models, we develop Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints. By doing so, we bridge two seemingly distinct algorithms in adaptive filtering and machine learning, namely the recursive least-squares (RLS) algorithm and orthogonal gradient descent (OGD). Our algorithm uses the memory efficiently by exploiting the structure of the streaming data via an incremental principal component analysis (IPCA). Further, we show that, for overparameterized linear models, the parameter vector obtained by our algorithm is what stochastic gradient descent (SGD) would converge to in the standard multi-pass setting. Finally, we generalize the results to the nonlinear setting for highly overparameterized models, relevant for deep learning. Our experiments show the effectiveness of the proposed method compared to the baselines.

datapoint, learning, orfit, (15 more...)

arXiv.org Artificial Intelligence

2207.13853

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BPFISH: Blockchain and Privacy-preserving FL Inspired Smart Healthcare

Singh, Moirangthem Biken, Pratap, Ajay

arXiv.org Artificial IntelligenceJul-27-2022

Abstract--This paper proposes Federated Learning (FL) based smart healthcare system where Medical Centers (MCs) train the local model using the data collected from patients and send the model weights to the miners in a blockchain-based robust framework without sharing raw data, keeping privacy preservation into deliberation. We formulate an optimization problem by maximizing the utility and minimizing the loss function considering energy consumption and FL process delay of MCs for learning effective models on distributed healthcare data underlying a blockchain-based framework. We propose a solution in two stages-first, offer a stable matching-based association algorithm to maximize the utility of both miners and MCs and then solve loss minimization using Stochastic Gradient Descent (SGD) algorithm employing FL under Differential Privacy (DP) and blockchain technology. Moreover, we incorporate blockchain technology to provide tempered resistant and decentralized model weight sharing in the proposed FL-based framework. The effectiveness of the proposed model is shown through simulation on real-world healthcare data comparing other state-of-the-art techniques.

iteration, mcs, miner, (15 more...)

arXiv.org Artificial Intelligence

2207.11654

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Missouri (0.04)
Asia > India > Manipur (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Providers & Services (0.87)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
(2 more...)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Improving Simulated Annealing for Clique Partitioning Problems

Gao, Jian, Lv, Yiqi, Liu, Minghao, Cai, Shaowei, Ma, Feifei

Journal of Artificial Intelligence ResearchJul-27-2022

The Clique Partitioning Problem (CPP) is essential in graph theory with a number of important applications. Due to its NP-hardness, efficient algorithms for solving this problem are very crucial for practical purposes, and simulated annealing is proved to be effective in state-of-the-art CPP algorithms. However, to make simulated annealing more efficient to solve large-scale CPPs, in this paper, we propose a new iterated simulated annealing algorithm. Several methods are proposed in our algorithm to improve simulated annealing. First, a new configuration checking strategy based on timestamp is presented and incorporated into simulated annealing to avoid search cycles. Afterwards, to enhance the local search ability of simulated annealing and speed up convergence, we combine our simulated annealing with a descent search method to solve the CPP. This method further improves solutions found by simulated annealing, and thus compensates for the local search effect. To further accelerate the convergence speed, we introduce a shrinking factor to decline initial temperature and then propose an iterated local search algorithm based on simulated annealing. Additionally, a restart strategy is adopted when the search procedure converges. Extensive experiments on benchmark instances of the CPP were carried out, and the results suggest that the proposed simulated annealing algorithm outperforms all the existing heuristic algorithms, including five state-of-the-art algorithms. Thus the best-known solutions for 34 instances out of 94 are updated. We also conduct comparative analyses of the proposed strategies and show their effectiveness.

algorithm, configuration checking strategy, vertex, (12 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.13382

AI Access Foundation

13382

Journal of Artificial Intelligence Research

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > China > Liaoning Province > Dalian (0.04)
Asia > China > Beijing > Beijing (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Gradient Descent and its Types - Analytics Vidhya

#artificialintelligenceJul-25-2022, 20:05:43 GMT

This article was published as a part of the Data Science Blogathon. In this article, we will explore different types of gradient descent. So let's get started with the article. The algorithm designer can set the learning rate. If we use a learning rate that is too small, it will cause us to update very slowly, requiring more iterations to get a better solution.

analytic vidhya, descent, gradient descent, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

On the benefits of non-linear weight updates

Norridge, Paul

arXiv.org Artificial IntelligenceJul-25-2022

Recent work has suggested that the generalisation performance of a DNN is related to the extent to which the Signal-to-Noise Ratio is optimised at each of the nodes. In contrast, Gradient Descent methods do not always lead to SNR-optimal weight configurations. One way to improve SNR performance is to suppress large weight updates and amplify small weight updates. Such balancing is already implicit in some common optimizers, but we propose an approach that makes this explicit. The method applies a non-linear function to gradients prior to making DNN parameter updates. We investigate the performance with such non-linear approaches. The result is an adaptation to existing optimizers that improves performance for many problem types.

artificial intelligence, arxiv, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2207.12505

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback