AITopics | Gradient Descent

Gradient Descent

Revisiting the acceleration phenomenon via high-resolution differential equations

arXiv.org Artificial Intelligence | Dec-11-2022

Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms. It was not until the high-resolution differential equation framework was proposed in [Shi et al., 2022] that the mechanism behind the acceleration phenomenon was successfully attributed to the gradient correction term. To deepen our understanding of how the high-resolution differential equation framework bears on the convergence rate, in this paper we continue to investigate NAG for $\mu$-strongly convex functions using Lyapunov analysis and the phase-space representation. First, we revisit the proof from the gradient-correction scheme. Similar to [Chen et al., 2022], a straightforward calculation greatly simplifies the proof and, with minor modification, enlarges the step size to $s=1/L$. Meanwhile, the way of constructing Lyapunov functions is principled. Furthermore, we also investigate NAG from the implicit-velocity scheme. Due to the difference in the velocity iterates, we find that the Lyapunov function constructed from the implicit-velocity scheme needs no additional term and that the calculation of the iterative difference becomes simpler. Together with the optimal step size obtained, the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme.
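As a rough illustration of the algorithm the abstract discusses, the sketch below runs the commonly stated form of NAG for a $\mu$-strongly convex, $L$-smooth function with step size $s=1/L$ and compares it with plain gradient descent. The test function, the starting point, and all function names are assumptions chosen for this example; this is not the paper's Lyapunov analysis, only the discrete iteration it studies.

```python
import math

def f(x):
    # Assumed toy objective: ill-conditioned quadratic with mu = 1 and L = 100.
    return 0.5 * (1.0 * x[0] ** 2 + 100.0 * x[1] ** 2)

def grad_f(x):
    return [1.0 * x[0], 100.0 * x[1]]

def gradient_descent(grad, x0, L, num_iters):
    """Plain gradient descent with step size s = 1/L."""
    s = 1.0 / L
    x = list(x0)
    for _ in range(num_iters):
        g = grad(x)
        x = [xi - s * gi for xi, gi in zip(x, g)]
    return x

def nag_strongly_convex(grad, x0, mu, L, num_iters):
    """NAG for strongly convex functions:
        y_{k+1} = x_k - s * grad(x_k)
        x_{k+1} = y_{k+1} + beta * (y_{k+1} - y_k)
    with s = 1/L and beta = (1 - sqrt(mu*s)) / (1 + sqrt(mu*s)).
    """
    s = 1.0 / L
    beta = (1.0 - math.sqrt(mu * s)) / (1.0 + math.sqrt(mu * s))
    x = list(x0)
    y_prev = list(x0)
    for _ in range(num_iters):
        g = grad(x)
        y = [xi - s * gi for xi, gi in zip(x, g)]          # gradient step
        x = [yi + beta * (yi - ypi) for yi, ypi in zip(y, y_prev)]  # momentum step
        y_prev = y
    return y_prev

x_gd = gradient_descent(grad_f, [1.0, 1.0], L=100.0, num_iters=200)
x_nag = nag_strongly_convex(grad_f, [1.0, 1.0], mu=1.0, L=100.0, num_iters=200)
print("gradient descent f(x):", f(x_gd))
print("NAG f(y):             ", f(x_nag))
```

On this quadratic the accelerated iterates reach a far smaller objective value than gradient descent after the same number of gradient evaluations, which is the acceleration phenomenon the abstract refers to.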

artificial intelligence, lyapunov function, machine learning, (16 more...)

2212.057

Country:

Asia > Middle East > Jordan (0.05)
Asia > China > Beijing > Beijing (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Optimizers are variants of Gradient Descent

#artificialintelligence | Dec-10-2022, 14:50:30 GMT

Optimizers are at the core of deep learning algorithms; they are to deep learning what the heart is to the human body. Without optimizers, no deep learning algorithm would exist. All optimizers are enhanced versions of the gradient descent algorithm, so understanding how gradient descent works helps when following this article. Let's compare standard gradient descent, stochastic gradient descent, and mini-batch stochastic gradient descent with an example.
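The three variants named above differ only in how many samples feed each gradient estimate. The sketch below is a minimal illustration, not the article's own example: it fits a line to an assumed noise-free toy dataset, and the function name `train_linear`, the learning rate, and the epoch count are all choices made for this example. Setting `batch_size` to the dataset size gives standard (full-batch) gradient descent, `1` gives stochastic gradient descent, and anything in between gives mini-batch SGD.

```python
import random

def train_linear(xs, ys, batch_size, lr=0.1, epochs=500, seed=0):
    """Fit y = w*x + b by minimizing mean squared error.

    batch_size == len(xs): standard gradient descent (one update per epoch)
    batch_size == 1:       stochastic gradient descent (one update per sample)
    otherwise:             mini-batch stochastic gradient descent
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a fresh random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                err = w * xs[i] + b - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            # Average the gradient over the batch, then take one step.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b

# Assumed toy data generated from y = 2x + 1.
xs = [i / 10.0 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

for name, bs in [("full-batch GD", len(xs)), ("SGD", 1), ("mini-batch SGD", 5)]:
    w, b = train_linear(xs, ys, batch_size=bs)
    print(f"{name:15s} w={w:.4f} b={b:.4f}")
```

Because the data here contain no noise, all three variants recover nearly the same line; on noisy data the smaller batch sizes would trade noisier steps for cheaper, more frequent updates.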

artificial intelligence, gradient descent, machine learning, (14 more...)

Country: Asia > India (0.08)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Learning Optimizers in Deep Learning

#artificialintelligence | Dec-9-2022, 21:30:39 GMT

There are many different types of optimizers that can be used in deep learning, each with its own strengths and weaknesses. Some common optimizers include stochastic gradient descent (SGD), Adam, RMSprop, and Adagrad. Stochastic gradient descent (SGD) is a simple and widely used optimizer that updates the model parameters based on the gradient of the loss function with respect to the parameters. It is often used as a baseline optimizer and can work well in many cases, but it can be sensitive to the learning rate and may require careful tuning. Adam, which stands for adaptive moment estimation, is an optimizer that combines the advantages of SGD with momentum and RMSprop.
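To make the difference between the SGD and Adam updates concrete, here is a minimal scalar sketch of both rules. The Adam update follows its standard published form (exponential moving averages of the gradient and squared gradient, with bias correction); the objective, the learning rate of 0.1, and the function names are assumptions for this example, and the gradient is exact rather than sampled from mini-batches.

```python
import math

def sgd_minimize(grad, theta0, lr=0.1, num_steps=1000):
    """Plain gradient step: theta <- theta - lr * g."""
    theta = float(theta0)
    for _ in range(num_steps):
        theta -= lr * grad(theta)
    return theta

def adam_minimize(grad, theta0, lr=0.1, num_steps=1000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: scale a momentum-like average of g by a running RMS of g."""
    theta, m, v = float(theta0), 0.0, 0.0
    for t in range(1, num_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g        # first moment (mean of g)
        v = beta2 * v + (1.0 - beta2) * g * g    # second moment (mean of g^2)
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for zero init
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def grad_quadratic(theta):
    # Gradient of the assumed toy objective (theta - 3)^2.
    return 2.0 * (theta - 3.0)

print("SGD :", sgd_minimize(grad_quadratic, 0.0))
print("Adam:", adam_minimize(grad_quadratic, 0.0))
```

One consequence of the normalization is visible in the first step: Adam moves by roughly `lr` regardless of the gradient's magnitude, whereas the SGD step is proportional to the gradient, which is one reason SGD tends to be more sensitive to the learning rate.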

gradient, learning optimizer, optimizer, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Evolution of Mirror Descent part1(Machine Learning Optimization)

#artificialintelligence | Dec-9-2022, 21:30:31 GMT

Abstract: Mirror descent is a gradient descent method that uses a dual space of parametric models. The idea has been developed in convex optimization, but is not yet widely applied in machine learning. In this study, we provide a possible way that mirror descent can help data-driven parameter initialization of neural networks.

Abstract: The stochastic mirror descent (SMD) algorithm is a general class of training algorithms that includes the celebrated stochastic gradient descent (SGD) as a special case. It utilizes a mirror potential to influence the implicit bias of the training algorithm.
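As a hedged illustration of what "using a dual space" means, the sketch below implements one well-known instance of mirror descent: the entropic variant on the probability simplex (often called exponentiated gradient), where the mirror potential is negative entropy. This instance, the linear toy objective, and the function names are assumptions for the example and are not taken from either abstract. The gradient step is taken on the dual variables `theta`, and the softmax maps them back to the simplex.

```python
import math

def softmax(theta):
    """Map dual variables back to the probability simplex."""
    shift = max(theta)  # subtract the max for numerical stability
    weights = [math.exp(t - shift) for t in theta]
    total = sum(weights)
    return [w / total for w in weights]

def entropic_mirror_descent(grad, x0, step_size, num_iters):
    """Mirror descent with the negative-entropy mirror potential.

    Dual variables are theta = log(x) (up to an additive constant);
    each iteration takes a plain gradient step in that dual space.
    """
    theta = [math.log(xi) for xi in x0]
    for _ in range(num_iters):
        x = softmax(theta)
        g = grad(x)
        theta = [t - step_size * gi for t, gi in zip(theta, g)]
    return softmax(theta)

# Assumed toy problem: minimize <c, x> over the simplex.
costs = [0.9, 0.2, 0.5]

def grad_linear(x):
    return list(costs)  # gradient of a linear objective is constant

x_final = entropic_mirror_descent(grad_linear, [1 / 3, 1 / 3, 1 / 3],
                                  step_size=0.5, num_iters=100)
print([round(xi, 4) for xi in x_final])
```

With the squared Euclidean norm as the potential instead, the dual and primal variables coincide and the same loop reduces to ordinary gradient descent, which is the sense in which SGD is a special case of SMD.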

descent, machine learning optimization, mirror descent part1, (7 more...)

Genre: Research Report > New Finding (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)

Online Convex Optimization of Programmable Quantum Computers to Simulate Time-Varying Quantum Channels

Chittoor, Hari Hara Suthan, Simeone, Osvaldo, Banchi, Leonardo, Pirandola, Stefano

arXiv.org Artificial Intelligence | Dec-9-2022

Simulating quantum channels is a fundamental primitive in quantum computing, since quantum channels define general (trace-preserving) quantum operations. An arbitrary quantum channel cannot be exactly simulated using a finite-dimensional programmable quantum processor, making it important to develop optimal approximate simulation techniques. In this paper, we study the challenging setting in which the channel to be simulated varies adversarially with time. We propose the use of matrix exponentiated gradient descent (MEGD), an online convex optimization method, and analytically show that it achieves a sublinear regret in time. Through experiments, we validate the main results for time-varying dephasing channels using a programmable generalized teleportation processor.

artificial intelligence, machine learning, optimization problem, (15 more...)

2212.05145

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

ChromaCorrect: Prescription Correction in Virtual Reality Headsets through Perceptual Guidance

Güzel, Ahmet, Beyazian, Jeanne, Chakravarthula, Praneeth, Akşit, Kaan

arXiv.org Artificial Intelligence | Dec-8-2022

A large portion of today's world population suffers from vision impairments and wears prescription eyeglasses. However, eyeglasses cause additional bulk and discomfort when used with augmented and virtual reality headsets, thereby negatively impacting the viewer's visual experience. In this work, we remedy the usage of prescription eyeglasses in Virtual Reality (VR) headsets by shifting the optical complexity completely into software and propose a prescription-aware rendering approach for providing sharper and immersive VR imagery. To this end, we develop a differentiable display and visual perception model encapsulating display-specific parameters, the color and visual acuity of the human visual system, and user-specific refractive errors. Using this differentiable visual perception model, we optimize the rendered imagery in the display using stochastic gradient-descent solvers. This way, we provide prescription glasses-free sharper images for a person with vision impairments. We evaluate our approach on various displays, including desktops and VR headsets, and show significant quality and contrast improvements for users with vision impairments.

artificial intelligence, human computer interaction, machine learning, (16 more...)

2212.04264

Country:

North America > United States > New York > New York County > New York City (0.05)
Asia (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Analysis of Kinetic Models for Label Switching and Stochastic Gradient Descent

Burger, Martin, Rossi, Alex

arXiv.org Artificial Intelligence | Dec-8-2022

In this paper we provide a novel approach to the analysis of kinetic models for label switching, which are used for particle systems that can randomly switch between gradient flows in different energy landscapes. Besides problems in biology and physics, we also demonstrate that stochastic gradient descent, the most popular technique in machine learning, can be understood in this setting when considering a time-continuous variant. Our analysis focuses on the case of evolution in a collection of external potentials, for which we provide analytical and numerical results about the evolution as well as the stationary problem.

artificial intelligence, equation, machine learning, (17 more...)

2207.00389

Country: Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

General-Purpose In-Context Learning by Meta-Learning Transformers

Kirsch, Louis, Harrison, James, Sohl-Dickstein, Jascha, Metz, Luke

arXiv.org Artificial Intelligence | Dec-8-2022

Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock greater capabilities with less manual effort. One particularly ambitious goal of meta-learning is to train general-purpose in-context learning algorithms from scratch, using only black-box models with minimal inductive bias. Such a model takes in training data, and produces test-set predictions across a wide range of problems, without any explicit definition of an inference model, training loss, or optimization algorithm. In this paper we show that Transformers and other black-box models can be meta-trained to act as general-purpose in-context learners. We characterize transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all, induced by changes in model size, number of tasks, and meta-optimization. We further show that the capabilities of meta-trained algorithms are bottlenecked by the accessible state size (memory) determining the next prediction, unlike standard models which are thought to be bottlenecked by parameter count. Finally, we propose practical interventions such as biasing the training distribution that improve the meta-training and meta-generalization of general-purpose in-context learning algorithms.

Meta-learning is the process of automatically discovering new learning algorithms instead of designing them manually (Schmidhuber, 1987). An important quality of human-engineered learning algorithms, such as backpropagation and gradient descent, is their applicability to a wide range of tasks or environments. For learning-to-learn to exceed those capabilities, the meta-learned learning algorithms must be similarly general-purpose. Recently, there has been significant progress toward this goal (Kirsch et al., 2019; Oh et al., 2020). The improved generality of the discovered learning algorithms has been achieved by introducing inductive bias, such as by bottlenecking the architecture or by hiding information, which encourages learning over memorization. Methods include restricting learning rules to use gradients (Metz et al., 2019; Kirsch et al., 2019; Oh et al., 2020), symbolic graphs (Real et al., 2020; Co-Reyes et al., 2021), or parameter sharing (Kirsch & Schmidhuber, 2020; Kirsch et al., 2021). While enabling generalization, these inductive biases come at the cost of increasing the effort to design these systems and potentially restrict the space of discoverable learning algorithms. Instead, we seek to explore general-purpose meta-learning systems with minimal inductive bias.

algorithm, artificial intelligence, machine learning, (16 more...)

2212.04458

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

FPGA Implementation of Multi-Layer Machine Learning Equalizer with On-Chip Training

Liu, Keren, Börjeson, Erik, Häger, Christian, Larsson-Edefors, Per

arXiv.org Artificial Intelligence | Dec-7-2022

Moreover, environmental changes due to temperature or mechanical strains can lead to time-varying effects which require adaptive equalization. Adaptive equalizers are indeed commonplace in optical receivers [1, 2], typically implemented via gradient-descent-based least-mean squares filtering [3]. For example, in coherent systems such equalizers can track the inverse Jones matrix of the channel and may also correct for additional distortions such as residual chromatic dispersion [4]. However, the underlying equalizer structure is linear, which limits the type of functionalities that can be expressed and therefore also the performance that can be achieved. To overcome the limitations of linear equalizers, a wide variety of machine learning (ML) algorithms have recently been proposed and verified in hardware (HW). For example, field-programmable gate array (FPGA) implementations of various neural network equalizers were demonstrated for IM/DD links [5], passive optical networks [6], optical interconnects [7], and coherent systems [8].
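The gradient-descent-based least-mean-squares (LMS) filtering mentioned above can be sketched in a few lines. The example below is a simplified, real-valued illustration under stated assumptions: a hypothetical two-tap channel, noise-free binary symbols, known training symbols, and made-up tap count and step size. Coherent optical receivers work with complex-valued, multi-input signals, so this is only the core update rule, not the paper's FPGA equalizer.

```python
import random

def lms_equalize(received, desired, num_taps=5, step_size=0.05):
    """Adapt a linear FIR equalizer with the LMS rule.

    Each sample performs one stochastic gradient step on the squared
    error e^2, where e = desired - w . x:  w <- w + step_size * e * x.
    """
    w = [0.0] * num_taps
    window = [0.0] * num_taps  # most recent received samples, newest first
    errors = []
    for r, d in zip(received, desired):
        window = [r] + window[:-1]
        y = sum(wi * xi for wi, xi in zip(w, window))  # equalizer output
        e = d - y
        w = [wi + step_size * e * xi for wi, xi in zip(w, window)]
        errors.append(e)
    return w, errors

# Assumed toy link: binary symbols through the channel h = [1.0, 0.5].
rng = random.Random(0)
symbols = [rng.choice((-1.0, 1.0)) for _ in range(3000)]
received = [symbols[n] + 0.5 * (symbols[n - 1] if n > 0 else 0.0)
            for n in range(len(symbols))]

taps, errors = lms_equalize(received, symbols)
tail_mse = sum(e * e for e in errors[-200:]) / 200.0
print("taps:", [round(t, 3) for t in taps])
print("MSE over last 200 symbols:", tail_mse)
```

The learned taps approximate the inverse of the assumed channel, and because the filter keeps adapting on every sample it can follow slow channel changes. The structure stays linear, which is the limitation the text goes on to describe.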

artificial intelligence, equalizer, machine learning, (12 more...)

2212.03515

Country:

Europe > Sweden > Vaestra Goetaland > Gothenburg (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces

Le Lan, Charline, Greaves, Joshua, Farebrother, Jesse, Rowland, Mark, Pedregosa, Fabian, Agarwal, Rishabh, Bellemare, Marc G.

arXiv.org Artificial Intelligence | Dec-7-2022

Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of a given matrix from sample entries, i.e. from small random submatrices. Although a number of sample-based methods exist for this problem (e.g. Oja's rule (Oja, 1982)), these assume access to full columns of the matrix or particular matrix structure such as symmetry and cannot be combined as-is with neural networks (Baldi & Hornik, 1989). In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled. We complement our theoretical analysis with a series of experiments on synthetic matrices, the MNIST dataset (LeCun et al., 2010) and the reinforcement learning domain PuddleWorld (Sutton, 1995) demonstrating the usefulness of our approach.
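For context on the baseline the abstract contrasts itself with, the sketch below shows Oja's rule for the one-dimensional case ($d=1$). It is not the paper's algorithm: it consumes full data vectors, which is exactly the access assumption the paper sets out to relax. The synthetic two-dimensional data, the step size, and the function name are assumptions made for this example.

```python
import math
import random

def oja_first_component(samples, step_size=0.002, w0=(1.0, 0.0)):
    """Oja's rule: w <- w + step_size * y * (x - y * w), with y = w . x.

    The -y*y*w term keeps the norm of w near 1, so w drifts toward the
    leading principal direction of the data.
    """
    w = list(w0)
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + step_size * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Assumed synthetic data: standard deviation 3 along u, 0.5 along v.
rng = random.Random(0)
u = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
v = (1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0))
samples = []
for _ in range(20000):
    a = rng.gauss(0.0, 3.0)
    b = rng.gauss(0.0, 0.5)
    samples.append((a * u[0] + b * v[0], a * u[1] + b * v[1]))

w = oja_first_component(samples)
norm_w = math.hypot(w[0], w[1])
alignment = abs(w[0] * u[0] + w[1] * u[1]) / norm_w  # |cos| of angle to u
print("w:", [round(wi, 3) for wi in w], "alignment with u:", round(alignment, 4))
```

Each update needs every coordinate of the sample `x`, so the rule cannot be applied directly when only a few entries of each column are observed.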

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2212.04025

Country:

North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
