AITopics | Gradient Descent

It is worth noting that, Eq. ... In Section 4.1, we have shown the experimental results of HPM on two population synthetic functions, ... It is worth noting that, since the synthetic function only simulates the validation loss function (i.e., ... The same exploit strategy as in PBT, i.e., truncation selection [ ... All the code for the synthetic functions was implemented with Autograd. As in Figure 1 of Section 4.1, we show the mean performance ... We show the details of the hyperparameters we tuned on the benchmark datasets as follows. Tied weights are used for the embedding and softmax layers.

hypergradient, hyperparameter, synthetic function, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Anonymous

Neural Information Processing Systems | Aug-16-2025, 11:12:55 GMT

While our presentation focuses on this finite-sum structure, most of our convergence results can easily be adapted to the general stochastic setting (see App. D).

artificial intelligence, assumption, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Russia (0.04)
(2 more...)

Genre:

Research Report (0.46)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

9f96f36b7aae3b1ff847c26ac94c604e-Paper.pdf

Anonymous

Neural Information Processing Systems | Aug-16-2025, 11:12:52 GMT

artificial intelligence, convergence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > Middle East > Jordan (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Neural Information Processing Systems | Aug-16-2025, 09:39:54 GMT

Work partially conducted while affiliated with the Vector Institute.

artificial intelligence, machine learning, variance, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

A used and training procedures

Neural Information Processing Systems | Aug-16-2025, 07:28:47 GMT

All the models are trained for 200 epochs with stochastic gradient descent, using batch size 128, momentum 0.9, and cosine ... All the hyperparameters were selected with a small grid search. From epoch 150 to epoch 185, the training error of chunks of size 128/256 decreases below 0.5%, while for smaller chunk sizes it remains above 5%. Random chunks larger than 128/256 can fit the training set and thus have the same representational power as the whole network on the training data. For W > 128/256, the test accuracy decays approximately according to the same law as that of independent networks of the same width (see Figure 1). This picture suggests that for CIFAR100 the size of a clone is 128/256, slightly larger than the size of the clones in CIFAR10.
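The training setup in the excerpt above (SGD for 200 epochs, batch size 128, momentum 0.9, and a cosine schedule whose details are cut off) can be illustrated with a minimal sketch. Everything beyond those stated numbers is an assumption made for illustration: the cosine term is read as a cosine-annealed learning rate, the base learning rate `base_lr` is hypothetical, and the toy one-parameter regression stands in for the actual networks.

```python
import math
import random

def cosine_lr(base_lr, epoch, total_epochs):
    """Cosine-annealed learning rate: base_lr at epoch 0, decaying toward 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

def train(data, epochs=200, batch_size=128, momentum=0.9, base_lr=0.1, seed=0):
    """Fit y ~ w * x by minibatch SGD with heavy-ball momentum.

    epochs, batch_size and momentum follow the excerpt; base_lr is an assumption.
    """
    rng = random.Random(seed)
    data = list(data)
    w, velocity = 0.0, 0.0
    for epoch in range(epochs):
        lr = cosine_lr(base_lr, epoch, epochs)  # one learning rate per epoch
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean of 0.5 * (w * x - y)^2 over the minibatch.
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            velocity = momentum * velocity - lr * grad
            w += velocity
    return w

# Hypothetical toy data: y = 3x plus small Gaussian noise.
_rng = random.Random(1)
toy = [(x, 3.0 * x + _rng.gauss(0.0, 0.01))
       for x in (_rng.uniform(-1, 1) for _ in range(512))]
w_hat = train(toy)
```

In this sketch the learning rate is updated once per epoch; per-step annealing is an equally common choice and the excerpt does not say which was used.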

artificial intelligence, densenet40-bc, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Neural Information Processing Systems | Aug-16-2025, 03:39:58 GMT

Third, the probabilistic bounds we obtain for SGD (i.e., on quantiles) provide novel insights over the previously known in-expectation

convergence rate, gradient descent, noise, (13 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Neural Information Processing Systems | Aug-15-2025, 23:07:26 GMT

AI safety has long been an important issue in the deep learning community.

algorithm, convergence, csgld, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
North America > Canada (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Reproducibility in Optimization: Theoretical Framework and Limits

Neural Information Processing Systems | Aug-15-2025, 20:52:20 GMT

We initiate a formal study of reproducibility in optimization. We define a quantitative measure of reproducibility of optimization procedures in the face of noisy or error-prone operations such as inexact or stochastic gradient computations or inexact initialization.
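One informal way to picture reproducibility under inexact gradients is to run the same procedure twice with independent gradient errors and compare the two outputs. The sketch below does this for gradient descent on a toy quadratic. The squared-distance measure, the function names `noisy_gd` and `deviation`, and the bounded uniform error model are all assumptions chosen for illustration; the abstract does not state the paper's actual definition.

```python
import random

def noisy_gd(x0, steps, lr, noise_scale, seed):
    """Gradient descent on f(x) = 0.5 * x^2 with a bounded additive gradient error."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        # Inexact gradient oracle: true gradient x plus an error in [-noise_scale, noise_scale].
        grad = x + rng.uniform(-noise_scale, noise_scale)
        x -= lr * grad
    return x

def deviation(noise_scale, steps=100, lr=0.1):
    """Squared distance between the outputs of two independently perturbed runs."""
    a = noisy_gd(1.0, steps, lr, noise_scale, seed=0)
    b = noisy_gd(1.0, steps, lr, noise_scale, seed=1)
    return (a - b) ** 2

exact = deviation(0.0)   # identical runs when the gradients are exact
noisy = deviation(0.1)   # runs drift apart when the gradients are inexact
```

With exact gradients the two runs coincide, so the deviation is zero. With errors bounded by 0.1 and the defaults above, the two outputs differ by at most 0.2 in this toy setting, because each error is damped by a factor of 0.9 per subsequent step.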

artificial intelligence, machine learning, optimization problem, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.38)
