AITopics | Gradient Descent

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-9-2025, 14:43:11 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper proposes approximating kernel functions with random functions and then optimizing with stochastic gradient descent. Convergence rates and generalisation bounds are derived. Experimental results on large datasets are presented. The idea of introducing random functions to approximate kernel functions and then using SGD is very interesting.
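The review's point about approximating a kernel with random functions can be illustrated with a minimal sketch. It assumes a Gaussian RBF kernel and random Fourier (cosine) features, which is one common construction and not necessarily the exact one used in the reviewed paper; the function names, the bandwidth `sigma`, and the feature count are all illustrative choices.

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    # Exact Gaussian RBF kernel: k(x, z) = exp(-||x - z||^2 / (2 sigma^2)).
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def random_fourier_features(X, n_features, sigma=1.0, seed=0):
    # Map each row of X to n_features random cosine features.
    # Inner products of these features approximate the RBF kernel in expectation.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    omega = rng.normal(scale=1.0 / sigma, size=(d, n_features))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)           # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ omega + b)

# Compare the exact kernel value with its random-feature estimate.
x = np.array([0.3, -0.5])
z = np.array([1.0, 0.2])
Z = random_fourier_features(np.vstack([x, z]), n_features=20000)
approx = float(Z[0] @ Z[1])
exact = float(rbf_kernel(x, z))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The estimate is a Monte Carlo average, so its error shrinks roughly as one over the square root of the number of features.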

algorithm, random feature, variance, (13 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.57)

Scalable Kernel Methods via Doubly Stochastic Gradients

Neural Information Processing Systems, Oct-9-2025, 14:43:09 GMT

The general perception is that kernel methods are not scalable, so neural nets have become the choice for large-scale nonlinear learning problems. Have we tried hard enough for kernel methods? In this paper, we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Based on the fact that many kernel methods can be expressed as convex optimization problems, our approach solves the optimization problems by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random features associated with the kernel, and performing descent steps with this noisy functional gradient. Our algorithm is simple, does not need to commit to a preset number of random features, and allows the flexibility of the function class to grow as we see more incoming data in the streaming setting. We demonstrate that a function learned by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space at rate O(1/t), and achieves a generalization bound of O(1/√t). Our approach can readily scale kernel methods up to the regimes which are dominated by neural nets. We show competitive performance of our approach as compared to neural nets on datasets such as 2.3 million energy materials from MolecularSpace, 8 million handwritten digits from MNIST, and 1 million photos from ImageNet using convolution features.
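The two stochastic approximations in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian RBF kernel with random cosine features, a squared loss, a constant step size, and stored random features (kept in arrays here for brevity). The names `doubly_sgd`, `predict`, `step`, and `reg` are hypothetical.

```python
import numpy as np

def features(X, omegas, biases):
    # Random cosine features for an RBF kernel, one column per (omega, b) pair.
    return np.sqrt(2.0) * np.cos(X @ omegas.T + biases)

def doubly_sgd(X, y, n_iters=4000, sigma=1.0, step=0.05, reg=1e-4, seed=0):
    # Assumed setup: squared loss, constant step size, one new feature per step.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    omegas = np.empty((n_iters, d))
    biases = np.empty(n_iters)
    alphas = np.zeros(n_iters)
    for t in range(n_iters):
        i = rng.integers(n)                                # randomness 1: training point
        omegas[t] = rng.normal(scale=1.0 / sigma, size=d)  # randomness 2: kernel feature
        biases[t] = rng.uniform(0.0, 2.0 * np.pi)
        phi = features(X[i:i + 1], omegas[:t + 1], biases[:t + 1])[0]
        f_xi = phi[:t] @ alphas[:t]           # current prediction at x_i
        grad = f_xi - y[i]                    # derivative of 0.5 * (f(x_i) - y_i)^2
        alphas[:t] *= 1.0 - step * reg        # shrink old coefficients (regularizer)
        alphas[t] = -step * grad * phi[t]     # coefficient of the new random feature
    return omegas, biases, alphas

def predict(X, model):
    omegas, biases, alphas = model
    return features(X, omegas, biases) @ alphas

# Toy 1-D regression problem (illustrative data, not from the paper).
X_demo = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y_demo = np.sin(X_demo[:, 0])
model = doubly_sgd(X_demo, y_demo)
mse_demo = float(np.mean((predict(X_demo, model) - y_demo) ** 2))
print(f"training MSE: {mse_demo:.4f}")
```

Each iteration adds one coefficient and one random feature, so the learned function grows with the number of iterations instead of being fixed to a preset number of features.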

algorithm, functional gradient, kernel method, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.35)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.52)

62db9e3397c76207a687c360e0243317-Supplemental.pdf

Neural Information Processing Systems, Oct-9-2025, 14:35:39 GMT

artificial intelligence, constraint, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)

Learning to Learn By Self-Critique

Antreas Antoniou, Amos J. Storkey

Neural Information Processing Systems, Oct-9-2025, 14:17:29 GMT

In few-shot learning, a machine learning system learns from a small set of labelled examples relating to a specific task, such that it can generalize to new examples of the same task. Given the limited availability of labelled examples in such tasks, we wish to make use of all the information we can. Usually a model learns task-specific information from a small training-set (support-set) to predict on an unlabelled validation set (target-set). The target-set contains additional task-specific information which is not utilized by existing few-shot learning methods. Making use of the target-set examples via transductive learning requires approaches beyond the current methods; at inference time, the target-set contains only unlabelled input data-points, and so discriminative learning cannot be used. In this paper, we propose a framework called Self-Critique and Adapt, or SCA, which learns to learn a label-free loss function, parameterized as a neural network. A base-model learns on a support-set using existing methods (e.g.

artificial intelligence, machine learning, prediction, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies

Basri Ronen, David Jacobs, Yoni Kasten, Shira Kritchman

Neural Information Processing Systems, Oct-9-2025, 14:10:19 GMT

We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results that show that the dynamics of overparameterized neural networks trained with gradient descent can be well approximated by a linear system. When normalized training data is uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions.

artificial intelligence, frequency, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Probabilistic low-rank matrix completion on finite alphabets

Jean Lafond, Olga Klopp, Eric Moulines, Joseph Salmon

Neural Information Processing Systems, Oct-9-2025, 14:08:41 GMT

The task of reconstructing a matrix given a sample of observed entries is known as the matrix completion problem. It arises in a wide range of problems, including recommender systems, collaborative filtering, dimensionality reduction, image processing, quantum physics and multi-class classification, to name a few. Most works have focused on recovering an unknown real-valued low-rank matrix from a random sub-sample of its entries. Here, we investigate the case where the observations take a finite number of values, corresponding for example to ratings in recommender systems or labels in multi-class classification. We also consider a general sampling scheme (not necessarily uniform) over the matrix entries. The performance of a nuclear-norm penalized estimator is analyzed theoretically. More precisely, we derive bounds for the Kullback-Leibler divergence between the true and estimated distributions. In practice, we also propose an efficient algorithm based on lifted coordinate gradient descent in order to tackle potentially high-dimensional settings.

completion, matrix, matrix completion, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms

Lam M. Nguyen

Neural Information Processing Systems, Oct-9-2025, 11:13:07 GMT

The stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD, which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings.
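As a rough illustration of the shuffling version of SGD mentioned above, the sketch below reshuffles the data once per epoch and visits every sample exactly once. It is only a toy under stated assumptions: the problem is a convex, over-parameterized least-squares fit (more parameters than samples), not the non-convex class analyzed in the paper, and the step-size rule and function names are hypothetical.

```python
import numpy as np

def shuffling_sgd(X, y, n_epochs=300, step=None, seed=0):
    # SGD with random reshuffling on the per-sample loss 0.5 * (x_i . w - y_i)^2.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if step is None:
        # Assumed conservative step: half the inverse of the largest squared row norm.
        step = 0.5 / np.max(np.sum(X ** 2, axis=1))
    w = np.zeros(d)
    for _ in range(n_epochs):
        # A fresh permutation each epoch; every sample is used exactly once per pass.
        for i in rng.permutation(n):
            residual = X[i] @ w - y[i]
            w -= step * residual * X[i]
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Over-parameterized toy problem: 20 samples, 50 parameters, exactly realizable labels.
data_rng = np.random.default_rng(1)
X_toy = data_rng.normal(size=(20, 50))
y_toy = X_toy @ data_rng.normal(size=50)
w_fit = shuffling_sgd(X_toy, y_toy)
print(f"initial MSE: {mse(X_toy, y_toy, np.zeros(50)):.4f}")
print(f"final MSE:   {mse(X_toy, y_toy, w_fit):.2e}")
```

Because the toy model can interpolate the data, the training loss can be driven toward zero, which loosely mirrors the over-parameterized setting described in the abstract.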

artificial intelligence, convergence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(7 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory

Neural Information Processing Systems, Oct-9-2025, 09:54:50 GMT

Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe the Edge of Stability (EoS) phenomenon.

artificial intelligence, machine learning, trajectory, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Reading (0.04)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Tight Risk Bounds for Gradient Descent on Separable Data

Neural Information Processing Systems, Oct-9-2025, 08:54:34 GMT

Recently, there has been a marked increase in interest regarding the generalization capabilities of unregularized gradient-based learning methods.

artificial intelligence, gradient descent, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > China (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Appendix: On the Overlooked Structure of Stochastic Gradients

Neural Information Processing Systems, Oct-9-2025, 08:04:37 GMT

Avila is a non-image dataset.

A.3 Image classification on MNIST

We perform the common per-pixel zero-mean unit-variance normalization as data preprocessing for MNIST.

Pretraining Hyperparameter Settings: We train neural networks for 50 epochs on MNIST to obtain pretrained models. The batch size is set to 1 and no weight decay is used, unless otherwise specified. For the other optimizer hyperparameters, we apply the default settings directly.

artificial intelligence, es 1, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia > China > Hong Kong (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)
