AITopics | Sebastian U. Stich

Safe Adaptive Importance Sampling

Sebastian U. Stich, Anant Raj, Martin Jaggi

Neural Information Processing Systems, May-27-2025, 23:08:21 GMT

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications. Improved adaptive variants--using importance values defined by the complete gradient information which changes during optimization--enjoy favorable theoretical properties, but are typically computationally infeasible. In this paper we propose an efficient approximation of gradient-based sampling, which is based on safe bounds on the gradient. The proposed sampling distribution is (i) provably the best sampling with respect to the given bounds, (ii) always better than uniform sampling and fixed importance sampling, and (iii) efficiently computable--in many applications at negligible extra cost. The proposed sampling scheme is generic and can easily be integrated into existing algorithms. In particular, we show that coordinate descent (CD) and stochastic gradient descent (SGD) can enjoy a significant speed-up under the novel scheme. The proven efficiency of the proposed sampling is verified by extensive numerical testing.
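As a rough illustration of the baseline that the abstract compares against, the sketch below shows fixed (non-adaptive) importance sampling for SGD on a least-squares objective. It is not the safe adaptive scheme proposed in the paper: the objective, the choice of probabilities proportional to squared row norms, and all function names are assumptions made for this example. The point it shows is the reweighting by 1/(n p_i), which keeps the sampled gradient unbiased for any sampling distribution with positive probabilities.

```python
import numpy as np

def fixed_importance(A):
    """Hypothetical fixed importance: p_i proportional to ||a_i||^2."""
    w = np.sum(A * A, axis=1)
    return w / w.sum()

def sampled_gradient(A, b, x, probs, rng):
    """Draw one index i ~ probs and return an unbiased estimate of the
    full gradient (1/n) * A.T @ (A @ x - b)."""
    n = A.shape[0]
    i = rng.choice(n, p=probs)
    grad_i = A[i] * (A[i] @ x - b[i])  # gradient of 0.5 * (a_i . x - b_i)^2
    return grad_i / (n * probs[i])     # reweight so the expectation is exact

def expected_estimate(A, b, x, probs):
    """Expectation of sampled_gradient over i ~ probs, computed exactly."""
    n = A.shape[0]
    per_sample = A * (A @ x - b)[:, None]           # row i holds grad_i
    weighted = per_sample / (n * probs[:, None])    # the reweighted estimates
    return (probs[:, None] * weighted).sum(axis=0)  # sum_i p_i * estimate_i
```

An adaptive scheme would recompute `probs` as the iterate changes; the abstract's contribution is doing so from safe bounds on the gradient rather than from the full gradient.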

artificial intelligence, machine learning, optimization problem, (16 more...)

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization

Robert Gower, Filip Hanzely, Peter Richtarik, Sebastian U. Stich

Neural Information Processing Systems, May-26-2025, 10:13:13 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (15 more...)

Country:

Europe > Switzerland (0.28)
Asia > Middle East > Saudi Arabia (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Sparsified SGD with Memory

Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi

Neural Information Processing Systems, May-26-2025, 09:18:24 GMT

Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e. algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders perfect scalability. Various recent works proposed to use quantization or sparsification techniques to reduce the amount of data that needs to be communicated, for instance by only sending the most significant entries of the stochastic gradient (top-k sparsification). Whilst such schemes showed very promising performance in practice, they have eluded theoretical analysis so far. In this work we analyze Stochastic Gradient Descent (SGD) with k-sparsification or compression (for instance top-k or random-k) and show that this scheme converges at the same rate as vanilla SGD when equipped with error compensation (keeping track of accumulated errors in memory). That is, communication can be reduced by a factor of the dimension of the problem (sometimes even more) whilst still converging at the same rate. We present numerical experiments to illustrate the theoretical findings and the good scalability for distributed applications.
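The error-compensation idea described in the abstract can be sketched in a few lines. The following is a minimal single-worker illustration, assuming a fixed step size and top-k compression; the function names `top_k` and `mem_sgd_step` are hypothetical and the code is not taken from the paper. Entries that the compressor drops are kept in a memory vector and added back before the next compression, so no gradient information is permanently lost.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest.
    Assumes 1 <= k <= len(v)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def mem_sgd_step(x, memory, grad, lr, k):
    """One SGD step with top-k sparsification and error memory.

    Returns the new iterate and the new memory (the part of the
    corrected update that was not applied in this step)."""
    corrected = memory + lr * grad  # add back previously dropped mass
    update = top_k(corrected, k)    # only these k entries are communicated
    return x - update, corrected - update
```

In a distributed setting each worker would hold its own `memory` and send only the k nonzero entries of `update`; random-k compression could be substituted for `top_k` without changing the structure of the step.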

artificial intelligence, machine learning, proceedings, (17 more...)

Country: North America > United States > Louisiana (0.14)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.78)

Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization

Robert Gower, Filip Hanzely, Peter Richtarik, Sebastian U. Stich

Neural Information Processing Systems, Mar-27-2025, 01:18:13 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (15 more...)

Country:

Asia > Middle East (0.28)
Europe > Switzerland (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Sparsified SGD with Memory

Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi

Neural Information Processing Systems, Oct-3-2024, 16:30:24 GMT

artificial intelligence, machine learning, proceedings, (17 more...)

Country: North America > United States (1.00)

Industry: Education (0.34)

Technology: