AITopics | Gradient Descent

Gradient Descent

Online greedy identification of linear dynamical systems

arXiv.org Artificial Intelligence Apr-13-2022

This work addresses the problem of exploration in an unknown environment. For linear dynamical systems, we use an experimental design framework and introduce an online greedy policy where the control maximizes the information of the next step. In a setting with a limited number of experimental trials, our algorithm has low complexity and shows experimentally competitive performances compared to more elaborate gradient-based methods.

artificial intelligence, identification, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CDC51059.2022.9993030

2204.06375

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Optimizer in Deep Learning

#artificialintelligence Apr-5-2022, 06:45:15 GMT

An optimizer is a function or algorithm that adjusts the attributes of a neural network, such as its weights and learning rate. It thereby helps reduce the overall loss and improve accuracy. Choosing the right weights for a model is a daunting task, since a deep learning model usually has a very large number of parameters. This makes it important to pick an optimization algorithm suited to your application. Different optimizers can be used to update the weights as well as the learning rate.

algorithm, gradient descent, optimizer, (5 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Stochastic Gradient Descent Using Pytorch Linear Module

#artificialintelligence Apr-3-2022, 09:15:09 GMT

In the previous tutorial on SGD, I explored how to implement it using PyTorch's built-in gradient calculation, loss, and optimization implementations. In our present discussion…

pytorch linear module, stochastic gradient descent

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

What is momentum in a Neural network and how does it work?

#artificialintelligence Apr-2-2022, 19:23:42 GMT

In a neural network, the loss is the quantity used to measure performance. The higher the loss, the poorer the network performs, which is why we always try to minimize it. The process of minimizing the loss is called optimization. An optimizer is a method that modifies the weights of the neural network to reduce the loss. Although several neural network optimizers exist, in this article we will learn about gradient descent with momentum and compare its performance with others.
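
As a rough illustration of the idea in this teaser, here is a minimal sketch of gradient descent with momentum on a toy quadratic loss. The function names (`gd_momentum`, `quad_grad`), the loss, and the hyperparameters are assumptions made for this example and are not taken from the linked article; the update shown is one common formulation (accumulate a velocity, then step along it).

```python
import math

def gd_momentum(grad, w0, lr=0.01, beta=0.9, steps=200):
    """Gradient descent with momentum (hypothetical helper for illustration).

    v <- beta * v + grad(w)
    w <- w - lr * v
    Setting beta=0.0 recovers plain gradient descent.
    """
    w = list(w0)
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w

def quad_grad(w):
    # Gradient of the assumed toy loss f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2),
    # whose minimum is at the origin.
    return [w[0], 10.0 * w[1]]

def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))

start = [5.0, 5.0]
w_plain = gd_momentum(quad_grad, start, beta=0.0)     # plain gradient descent
w_momentum = gd_momentum(quad_grad, start, beta=0.9)  # with momentum

print("distance to minimum, plain GD:     ", norm(w_plain))
print("distance to minimum, with momentum:", norm(w_momentum))
```

With the same small learning rate and step budget, the momentum run ends much closer to the minimum on this particular problem; that outcome depends on the assumed loss and settings and is not a general guarantee.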

batch gradient descent, descent, gradient descent, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)

Scalable Whitebox Attacks on Tree-based Models

Castiglione, Giuseppe, Ding, Gavin, Hashemi, Masoud, Srinivasa, Christopher, Wu, Ga

arXiv.org Machine Learning Mar-31-2022

Adversarial robustness is one of the essential safety criteria for guaranteeing the reliability of machine learning models. While various adversarial robustness testing approaches were introduced in the last decade, we note that most of them are incompatible with non-differentiable models such as tree ensembles. Since tree ensembles are widely used in industry, this reveals a crucial gap between adversarial robustness research and practical applications. This paper proposes a novel whitebox adversarial robustness testing approach for tree ensemble models. Concretely, the proposed approach smooths the tree ensembles through temperature controlled sigmoid functions, which enables gradient descent-based adversarial attacks. By leveraging sampling and the log-derivative trick, the proposed approach can scale up to testing tasks that were previously unmanageable. We compare the approach against both random perturbations and blackbox approaches on multiple public datasets (and corresponding models). Our results show that the proposed method can 1) successfully reveal the adversarial vulnerability of tree ensemble models without causing computational pressure for testing and 2) flexibly balance the search performance and time complexity to meet various testing criteria.
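
To make the smoothing idea in this abstract concrete, here is a minimal sketch for a single decision stump (one split), not the paper's actual method or code. The hard test `x <= threshold` is replaced by a sigmoid with a temperature parameter, which makes the output differentiable in the input. All names (`hard_stump`, `soft_stump`, `soft_stump_grad`) and values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hard_stump(x, threshold, left, right):
    # Ordinary decision stump: piecewise constant, so its gradient in x is zero
    # everywhere except at the threshold, where it is undefined.
    return left if x <= threshold else right

def soft_stump(x, threshold, left, right, temperature):
    # Smoothed stump: a sigmoid gives a soft "probability" of going left.
    # Smaller temperature -> closer to the hard stump.
    p_left = sigmoid((threshold - x) / temperature)
    return p_left * left + (1.0 - p_left) * right

def soft_stump_grad(x, threshold, left, right, temperature):
    # Analytic derivative of soft_stump with respect to the input x.
    p_left = sigmoid((threshold - x) / temperature)
    return (right - left) * p_left * (1.0 - p_left) / temperature

# Assumed example: split at 1.0, leaf values -1 (left) and +1 (right).
x = 0.8
print(hard_stump(x, 1.0, -1.0, 1.0))
print(soft_stump(x, 1.0, -1.0, 1.0, temperature=0.5))
print(soft_stump_grad(x, 1.0, -1.0, 1.0, temperature=0.5))
```

Because the smoothed output has a nonzero gradient near the split, a gradient-based search can tell in which direction to move `x` to change the prediction, which the hard stump cannot provide.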

adversarial example, artificial intelligence, machine learning, (10 more...)

arXiv.org Machine Learning

2204.00103

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine (0.48)
Information Technology > Security & Privacy (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.37)

Local optimisation of Nyström samples through stochastic gradient descent

Hutchings, Matthew, Gauthier, Bertrand

arXiv.org Machine Learning Mar-24-2022

We study a relaxed version of the column-sampling problem for the Nyström approximation of kernel matrices, where approximations are defined from multisets of landmark points in the ambient space; such multisets are referred to as Nyström samples. We consider an unweighted variation of the radial squared-kernel discrepancy (SKD) criterion as a surrogate for the classical criteria used to assess the Nyström approximation accuracy; in this setting, we discuss how Nyström samples can be efficiently optimised through stochastic gradient descent. We perform numerical experiments which demonstrate that the local minimisation of the radial SKD yields Nyström samples with improved Nyström approximation accuracy.

approximation, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2203.13284

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum

Banman, Kirby, Peet-Pare, Liam, Hegde, Nidhi, Fyshe, Alona, White, Martha

arXiv.org Machine Learning Mar-22-2022

Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on iid sampling. Yet, SGDm is often used outside this regime, in settings with temporally correlated input samples such as continual learning and reinforcement learning. Existing work has shown that SGDm with a decaying step-size can converge under Markovian temporal correlation. In this work, we show that SGDm under covariate shift with a fixed step-size can be unstable and diverge. In particular, we show SGDm under covariate shift is a parametric oscillator, and so can suffer from a phenomenon known as resonance. We approximate the learning system as a time varying system of ordinary differential equations, and leverage existing theory to characterize the system's divergence/convergence as resonant/nonresonant modes. The theoretical result is limited to the linear setting with periodic covariate shift, so we empirically supplement this result to show that resonance phenomena persist even under non-periodic covariate shift, nonlinear dynamics with neural networks, and optimizers other than SGDm.

conference paper, covariate shift, frequency content, (15 more...)

arXiv.org Machine Learning

2203.11992

Country:

Asia > Middle East > Jordan (0.05)
Europe > Russia (0.04)
Asia > Russia (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

Wang, Dongsheng, Guo, Dandan, Zhao, He, Zheng, Huangjie, Tanwisuth, Korawat, Chen, Bo, Zhou, Mingyuan

arXiv.org Machine Learning Mar-14-2022

A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document and hence often suffers from poor performance in analyzing short documents. In addition, its parameter estimation often relies on approximate posterior inference that is either not scalable or suffers from large approximation error. This paper introduces a new topic-modeling framework where each document is viewed as a set of word embedding vectors and each topic is modeled as an embedding vector in the same embedding space. Embedding the words and topics in the same vector space, we define a method to measure the semantic difference between the embedding vectors of the words of a document and those of the topics, and optimize the topic embeddings to minimize the expected difference over all documents. Experiments on text analysis demonstrate that the proposed method, which is amenable to mini-batch stochastic gradient descent based optimization and hence scalable to big corpora, provides competitive performance in discovering more coherent and diverse topics and extracting better document representations.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2203.0157

Country:

North America > United States (0.67)
North America > Canada (0.46)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports (1.00)
Energy > Oil & Gas (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Linear Model the Machine Learning Way

#artificialintelligence Mar-12-2022, 23:51:46 GMT

The Ordinary Least Squares (OLS) model is a central building block in Machine Learning (ML). OLS is also used everywhere in the Social Sciences. I come from an Economics background and I was initially a bit puzzled by the way the ML textbooks solve OLS. In this blog post, I explain the Economics way versus the ML way and why both make sense. TL;DR: In a high-dimensional setting, do not invert a huge matrix; use gradient descent.
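
The contrast in this teaser can be sketched on a toy problem: fit a line by the closed-form OLS solution and by gradient descent on the mean squared error, then check that both agree. This is a minimal sketch with one regressor plus an intercept; the data, function names, learning rate, and step count are assumptions for illustration and do not come from the blog post. In this tiny setting the closed form is trivial, and the post's point concerns high-dimensional problems where forming and inverting the matrix becomes expensive.

```python
def ols_closed_form(xs, ys):
    # Closed-form simple regression: slope = cov(x, y) / var(x).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def ols_gradient_descent(xs, ys, lr=0.05, steps=5000):
    # Minimize the mean squared error by full-batch gradient descent.
    n = len(xs)
    intercept, slope = 0.0, 0.0
    for _ in range(steps):
        residuals = [intercept + slope * x - y for x, y in zip(xs, ys)]
        grad_intercept = 2.0 / n * sum(residuals)
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        intercept -= lr * grad_intercept
        slope -= lr * grad_slope
    return intercept, slope

# Assumed toy data, roughly y = 1 + 2x with small noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

exact = ols_closed_form(xs, ys)
approx = ols_gradient_descent(xs, ys)
print("closed form:     ", exact)
print("gradient descent:", approx)
```

The learning rate here was chosen small enough for this particular data; a larger value can make the iteration diverge, which is one of the practical costs of the iterative route.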

descent, gradient, gradient descent, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)

Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

Zou, Difan, Wu, Jingfeng, Braverman, Vladimir, Gu, Quanquan, Kakade, Sham M.

arXiv.org Machine Learning Mar-7-2022

Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. Most existing generalization analyses are made for single-pass SGD, which is a less practical variant compared to the commonly used multi-pass SGD. Besides, theoretical analyses for multi-pass SGD often concern a worst-case instance in a class of problems, which may be too pessimistic to explain the superior generalization ability for some particular problem instance. The goal of this paper is to sharply characterize the generalization of multi-pass SGD, by developing an instance-dependent excess risk bound for least squares in the interpolation regime, which is expressed as a function of the iteration number, stepsize, and data covariance. We show that the excess risk of SGD can be exactly decomposed into the excess risk of GD and a positive fluctuation error, suggesting that SGD always performs worse, instance-wise, than GD in generalization. On the other hand, we show that although SGD needs more iterations than GD to achieve the same level of excess risk, it requires fewer stochastic gradient evaluations, and therefore is preferable in terms of computational time.

excess risk, multi-pass sgd, sgd, (16 more...)

arXiv.org Machine Learning

2203.03159

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)
