AITopics | Gradient Descent

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods rely either on computing full gradients at pivot points or on keeping per-data-point corrections in memory. As a result, their speed-ups relative to SGD may only materialize after a minimum number of epochs. This paper investigates algorithms that exploit neighborhood structure in the training data to share and re-use information about past stochastic gradients across data points, which offers advantages in the transient optimization phase. As a by-product, we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms. We provide experimental results supporting our theory.
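To illustrate the per-data-point memory that the abstract refers to, here is a minimal sketch of SAGA, one of the variance reduction methods it names. It is not the paper's neighborhood-sharing algorithm. Assumptions (ours, not the paper's): a scalar least-squares objective with terms 0.5 * (a[i] * w - b[i]) ** 2, a fixed step size, and the hypothetical function name `saga_least_squares`.

```python
import random

def saga_least_squares(a, b, step=0.03, epochs=200, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (a[i] * w - b[i]) ** 2 over scalar w with SAGA."""
    rng = random.Random(seed)
    n = len(a)
    w = 0.0

    def grad_i(i, w):
        # Gradient of the i-th loss term 0.5 * (a[i] * w - b[i]) ** 2.
        return a[i] * (a[i] * w - b[i])

    # Memory: the last gradient seen for each data point, plus their mean.
    table = [grad_i(i, w) for i in range(n)]
    avg = sum(table) / n
    for _ in range(epochs * n):
        j = rng.randrange(n)
        g_new = grad_i(j, w)
        # Unbiased, variance-reduced direction: fresh gradient - stored gradient + stored mean.
        w -= step * (g_new - table[j] + avg)
        # Refresh the running mean and the stored entry for point j.
        avg += (g_new - table[j]) / n
        table[j] = g_new
    return w
```

For example, `saga_least_squares([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])` approaches the exact minimizer 6/14 with a constant step size, which plain SGD cannot guarantee. The price is the `table` of n stored gradients, which is the memory cost that motivates the paper.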

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Supplementary Materials for "Privacy of Noisy Stochastic Gradient Descent: More Iterations without More Privacy Loss"

Neural Information Processing Systems, Oct-2-2025, 15:56:16 GMT

A central issue in machine learning is how to train models on sensitive user data.
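The entry's title refers to noisy SGD. A minimal sketch of a generic noisy projected SGD loop follows. Assumptions (ours): isotropic Gaussian noise of standard deviation `sigma` is added to each stochastic gradient, iterates are projected onto a Euclidean ball of radius `radius`, and all function names are hypothetical. The sketch performs no privacy accounting.

```python
import math
import random

def project_to_ball(x, radius):
    """Euclidean projection of vector x onto the ball {v : ||v|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]

def noisy_sgd(grad, x0, n, step, sigma, radius, iters, seed=0):
    """Projected SGD with Gaussian noise added to every stochastic gradient.

    grad(x, i) returns the gradient of the loss on data point i at x.
    """
    rng = random.Random(seed)
    x = project_to_ball(x0, radius)
    for _ in range(iters):
        i = rng.randrange(n)  # sample one data point uniformly
        g = grad(x, i)
        noisy = [gk + rng.gauss(0.0, sigma) for gk in g]  # perturb each coordinate
        # Gradient step followed by projection back onto the ball.
        x = project_to_ball([xk - step * nk for xk, nk in zip(x, noisy)], radius)
    return x
```

The projection keeps every iterate bounded regardless of the noise, which is the kind of constraint under which noisy SGD is typically analyzed.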

divergence, noisy-sgd, privacy amplification, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Limitations of the Empirical Fisher Approximation for Natural Gradient Descent

Neural Information Processing Systems, Oct-2-2025, 15:51:48 GMT

Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam.
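The distinction can be made concrete with a minimal sketch. We assume a one-parameter logistic regression model p(y=1|x) = sigmoid(w*x), a toy choice of ours rather than the paper's setting. The Fisher averages the squared score over labels drawn from the model, while the empirical Fisher plugs in the observed labels.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fisher_and_empirical_fisher(w, xs, ys):
    """Return (Fisher, empirical Fisher) for 1-D logistic regression at parameter w."""
    n = len(xs)
    fisher = 0.0
    emp = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        # Per-example score (d/dw of log-likelihood) is (y - p) * x.
        # Fisher: expectation of score**2 under y ~ Bernoulli(p), i.e. p * (1 - p) * x**2.
        fisher += p * (1.0 - p) * x * x
        # Empirical Fisher: score**2 evaluated at the observed label y.
        emp += (y - p) ** 2 * x * x
    return fisher / n, emp / n
```

At w = 0 the two coincide on binary labels. For a confident model that fits the labels (w = 5 with xs = [1, -1], ys = [1, 0]), the empirical Fisher is much smaller than the Fisher. For a confident model that contradicts them (w = -5), it is much larger. In this toy case, the approximation can therefore be off in either direction.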

artificial intelligence, fisher, machine learning, (13 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > California (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)

Stochastic Optimization for Performative Prediction

Neural Information Processing Systems, Oct-2-2025, 15:38:37 GMT

What sets this setting apart from traditional stochastic optimization is the difference between merely updating model parameters and deploying the new model.
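A toy sketch of the distinction between updating and deploying follows, under assumptions of our own. The deployed parameter theta shifts the data to z ~ N(mu + eps * theta, 1), the loss is 0.5 * (theta - z) ** 2, and step sizes are 1/t. `greedy_deploy` redeploys after every SGD step. The second function (called lazy deploy here) takes many steps against a fixed deployed model before redeploying. All names are hypothetical.

```python
import random

def sample(theta_deployed, mu, eps, rng):
    """Hypothetical performative distribution: z ~ N(mu + eps * theta_deployed, 1)."""
    return rng.gauss(mu + eps * theta_deployed, 1.0)

def greedy_deploy(mu, eps, steps, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for t in range(1, steps + 1):
        z = sample(theta, mu, eps, rng)   # data come from the model deployed right now
        theta -= (1.0 / t) * (theta - z)  # SGD step on 0.5 * (theta - z) ** 2, deployed at once
    return theta

def lazy_deploy(mu, eps, deployments, steps_per_deploy, seed=0):
    rng = random.Random(seed)
    deployed = 0.0
    theta = 0.0
    for _ in range(deployments):
        for t in range(1, steps_per_deploy + 1):
            z = sample(deployed, mu, eps, rng)  # distribution frozen between deployments
            # With 1/t steps, theta becomes the mean of this round's samples.
            theta -= (1.0 / t) * (theta - z)
        deployed = theta  # deploy only after the inner loop
    return deployed
```

For |eps| < 1, both procedures approach mu / (1 - eps), the point where theta equals the mean of the distribution it induces (1.25 for mu = 1, eps = 0.2). They differ in how many deployments they need to get there.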

artificial intelligence, greedy deploy, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

33e75ff09dd601bbe69f351039152189-Paper.pdf

Neural Information Processing Systems, Oct-2-2025, 15:38:30 GMT

artificial intelligence, machine learning, optimization problem, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.33)

Author Response: Stochastic Optimization for Performative Prediction - Paper # 28

Neural Information Processing Systems, Oct-2-2025, 15:38:18 GMT

Experimental comparison to prior work.

artificial intelligence, machine learning, stochastic optimization, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.32)

Stochastic Continuous Greedy ++: When Upper and Lower Bounds Match

Amin Karbasi, Hamed Hassani, Aryan Mokhtari, Zebang Shen

Neural Information Processing Systems, Oct-2-2025, 15:38:05 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, complexity, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Deeply Learning the Messages in Message Passing Inference

Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel

Neural Information Processing Systems, Oct-2-2025, 14:38:09 GMT

Deep structured output learning shows great promise in tasks like semantic image segmentation. We propose a new, efficient deep structured model learning scheme, in which we show how deep Convolutional Neural Networks (CNNs) can be used to directly estimate the messages in message passing inference for structured prediction with Conditional Random Fields (CRFs). With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation. This makes learning significantly more efficient: otherwise, structured learning for a CRF with CNN potentials requires expensive inference at every stochastic gradient iteration. The output dimension of each message estimator network equals the number of classes, rather than growing exponentially with the order of the potentials, so the approach scales better to problems with a large number of classes. We apply our method to semantic image segmentation and achieve impressive performance, which demonstrates the effectiveness and usefulness of our CNN message learning method.

artificial intelligence, inductive learning, machine learning, (19 more...)

Neural Information Processing Systems

Country: Oceania > Australia > South Australia > Adelaide (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)