AITopics | Gradient Descent

Gradient Descent

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Jaehoon Lee, Lechao Xiao, Samuel Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington

Neural Information Processing Systems | Feb-11-2026, 10:57:02 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, international conference on learning representation, machine learning, (10 more...)

Country:

North America > United States > New York (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Appendix of "Complex-valued Neurons Can Learn More but Slower than Real-valued Neurons via Gradient Descent": A Preliminaries

Neural Information Processing Systems | Feb-11-2026, 10:47:16 GMT

In this section, we first summarize frequently used notations in the following table. Table 4: Frequently used notations. Notation Description C Lemma 7. Let d = 1. Combining the cases above completes the proof. Subsection B.2 proves several convergence rate lemmas. Subsection B.3 gives some technical We are now ready to prove Theorem 1. Proof of Theorem 1.

artificial intelligence, inequality hold, machine learning, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)

How degenerate is the parametrization of neural networks with the ReLU activation function?

Dennis Maximilian Elbrächter, Julius Berner, Philipp Grohs

Neural Information Processing Systems | Feb-11-2026, 08:36:32 GMT

Neural Information Processing Systems http://nips.cc/

neural network, optimization problem, parametrization, (13 more...)

Country:

Europe > Austria > Vienna (0.15)
North America > United States (0.14)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

A Guide Through the Zoo of Biased SGD

Neural Information Processing Systems | Feb-11-2026, 08:17:57 GMT

We also provide examples where biased estimators outperform their unbiased counterparts or where unbiased versions are simply not available. Finally, we demonstrate the effectiveness of our framework through experimental results that validate our theoretical findings.
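
To make the biased versus unbiased distinction concrete, here is a minimal sketch. It assumes two gradient compressors chosen purely for illustration and not taken from the excerpt above: Top-k, which is deterministic and therefore biased, and Rand-k with d/k rescaling, which is unbiased. The function names `top_k` and `rand_k_expectation` are hypothetical.

```python
from itertools import combinations


def top_k(grad, k):
    """Biased estimator: keep the k largest-magnitude coordinates, zero the rest."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def rand_k_expectation(grad, k):
    """Exact expectation of Rand-k: choose k coordinates uniformly at random
    and rescale the kept ones by d/k. Computed by enumerating every subset."""
    d = len(grad)
    subsets = list(combinations(range(d), k))
    mean = [0.0] * d
    for subset in subsets:
        for i in subset:
            mean[i] += (d / k) * grad[i] / len(subsets)
    return mean


grad = [3.0, -1.0, 0.5, 2.0]
biased = top_k(grad, 2)                 # deterministic, so its mean is itself
unbiased_mean = rand_k_expectation(grad, 2)
```

Top-k drops the two smallest coordinates every time, so its expectation differs from `grad`, while the Rand-k expectation recovers `grad` up to floating-point rounding. Whether the biased choice helps in practice depends on the problem; this sketch only shows what "biased" means.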

artificial intelligence, assumption, machine learning, (13 more...)

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia > North Caucasian Federal District > Republic of Karelia > Petrozavodsk (0.04)
(3 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels

Neural Information Processing Systems | Feb-11-2026, 06:56:18 GMT

Specifically, we prove that, for a simple data distribution with sparse signal amidst high-variance noise, a simple convolutional neural network trained using stochastic gradient descent simultaneously learns to threshold out the noise and find the signal.

artificial intelligence, deep learning, machine learning, (17 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

46c10f6c8ea5aa6f267bcdabcb123f97-Paper-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 06:17:47 GMT

factorization, gradient descent, initialization, (14 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Africa > Senegal > Kolda Region > Kolda (0.05)
North America > United States > California > Alameda County > Dublin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.73)

f9d3a954de63277730a1c66d8b38dee3-Paper.pdf

Neural Information Processing Systems | Feb-11-2026, 04:56:48 GMT

arxiv preprint arxiv, neural network, tensor, (13 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

TTOpt: A Maximum Volume Quantized Tensor Train-based Optimization and its Application to Reinforcement Learning

Neural Information Processing Systems | Feb-11-2026, 04:54:08 GMT

The vital part of every learning-based algorithm is an optimization procedure, e.g., Stochastic Gradient Descent.
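
For readers unfamiliar with the optimization procedure named here, the following is a minimal sketch of stochastic gradient descent on a toy least-squares problem. The data, the function name `sgd_linear_fit`, and the hyperparameters are assumptions made for illustration; none of them come from the paper.

```python
import random


def sgd_linear_fit(data, lr=0.1, epochs=200, seed=0):
    """Fit y = w * x + b by SGD on the per-sample loss 0.5 * (w*x + b - y)**2."""
    rng = random.Random(seed)
    samples = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)            # visit samples in a fresh random order
        for x, y in samples:
            err = (w * x + b) - y       # residual on this single sample
            w -= lr * err * x           # gradient of the loss w.r.t. w
            b -= lr * err               # gradient of the loss w.r.t. b
    return w, b


# Hypothetical noiseless data drawn from the line y = 3x + 1 on [-1, 1].
data = [(x, 3.0 * x + 1.0) for x in [-1.0 + 0.1 * i for i in range(21)]]
w, b = sgd_linear_fit(data)
```

Each update uses the gradient of one sample rather than the full dataset, which is what makes the procedure stochastic. On this noiseless toy problem the iterates settle at the generating slope and intercept.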

evolutionary algorithm, machine learning, reinforcement learning, (18 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States (0.05)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

A contrastive rule for meta-learning

Neural Information Processing Systems | Feb-11-2026, 04:29:57 GMT

Our rule may be understood as a generalization of contrastive Hebbian learning to meta-learning and notably, it neither requires computing second derivatives nor going backwards in time, two characteristic features of previous gradient-based methods that are hard to conceive in physical neural circuits.

artificial intelligence, deep learning, machine learning, (18 more...)

Country: Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Proofs of the Main Results

Neural Information Processing Systems | Feb-11-2026, 04:23:42 GMT

This section describes Stein variational gradient descent (SVGD) by Liu and Wang [19]. The overview is meant as supplementary material for Section 5, where we propose to use SVGD for inferring the DiBS posteriors p(Z | D) and p(Z, Θ | D). In contrast to sampling-based MCMC or optimization-based variational inference methods, SVGD iteratively transports a fixed set of particles to closely match a target distribution, akin to the gradient descent algorithm in optimization. We refer the reader to Liu and Wang [19] for additional details. Let p(x) with x ∈ X be a differentiable density that we want to sample from, e.g., to estimate an expectation.
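
The particle transport described above can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the target is a hypothetical Gaussian with mean 2 and unit variance, the kernel is an RBF with a fixed bandwidth, and the names `svgd_1d` and `gaussian_score` are invented for this example.

```python
import math


def svgd_1d(particles, score, step=0.1, bandwidth=1.0, iters=2000):
    """Transport 1-D particles toward the density whose score d/dx log p(x) is given."""
    xs = list(particles)
    n = len(xs)
    for _ in range(iters):
        scores = [score(x) for x in xs]
        updated = []
        for xi in xs:
            phi = 0.0
            for xj, sj in zip(xs, scores):
                k = math.exp(-((xj - xi) ** 2) / bandwidth)   # RBF kernel k(xj, xi)
                grad_k = -2.0 * (xj - xi) / bandwidth * k     # derivative of k w.r.t. xj
                # k * sj pulls xi toward high density; grad_k pushes it away from xj.
                phi += k * sj + grad_k
            updated.append(xi + step * phi / n)
        xs = updated
    return xs


def gaussian_score(x, mean=2.0, var=1.0):
    """Score of the assumed Gaussian target: d/dx log p(x) = -(x - mean) / var."""
    return -(x - mean) / var


initial = [-1.0 + 2.0 * i / 9 for i in range(10)]   # ten particles spread over [-1, 1]
final = svgd_1d(initial, gaussian_score)
```

The attractive term moves every particle along a kernel-weighted average of the score, much like a gradient step, and the repulsive term keeps the particles from collapsing onto the mode. In this sketch the particle mean ends near the target mean while the particles stay spread out.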

artificial intelligence, bayesian inference, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
