
 Martin Jaggi


PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization

Neural Information Processing Systems

We study lossy gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well, or fail to achieve the target test accuracy. We propose a new low-rank gradient compressor based on power iteration that can i) compress gradients rapidly, ii) efficiently aggregate the compressed gradients using all-reduce, and iii) achieve test performance on par with SGD. The proposed algorithm is the only method evaluated that achieves consistent wall-clock speedups when benchmarked against regular SGD using highly optimized off-the-shelf tools for distributed communication. We demonstrate reduced training times for convolutional networks as well as LSTMs on common datasets. Our code is available at https://github.com/epfml/powersgd.
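
As a rough illustration of the rank-r power-iteration compressor described above, here is a minimal single-worker sketch in NumPy. The function and variable names are ours, not from the released code; the real implementation at the linked repository additionally aggregates the two factors across workers with all-reduce and reuses the right factor between steps as a warm start.

```python
import numpy as np

def powersgd_compress_step(grad, q, error):
    """One sketch of a rank-r power-iteration compression step with error feedback.

    grad  : (n, m) gradient reshaped into a matrix
    q     : (m, r) right factor carried over from the previous step (warm start)
    error : (n, m) feedback memory holding what previous compressions missed
    Returns the decompressed approximation plus the updated q and error.
    """
    m = grad + error                      # error feedback: re-add what was lost before
    p = m @ q                             # left factor via a single power-iteration pass
    p, _ = np.linalg.qr(p)                # orthonormalize the columns of p
    q_new = m.T @ p                       # refined right factor
    approx = p @ q_new.T                  # low-rank approximation used in place of m
    error_new = m - approx                # keep the residual for the next step
    return approx, q_new, error_new

# toy usage: compress a random 256x128 "gradient" to rank 4
rng = np.random.default_rng(0)
grad = rng.standard_normal((256, 128))
q = rng.standard_normal((128, 4))
error = np.zeros_like(grad)
approx, q, error = powersgd_compress_step(grad, q, error)
print(np.linalg.norm(grad - approx) / np.linalg.norm(grad))
```

In a multi-worker run, only the small factors p and q_new would be summed with all-reduce, which is what makes the scheme compatible with standard collective communication.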




Safe Adaptive Importance Sampling

Neural Information Processing Systems

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications. Improved adaptive variants--using importance values defined by the complete gradient information, which changes during optimization--enjoy favorable theoretical properties, but are typically computationally infeasible. In this paper we propose an efficient approximation of gradient-based sampling, which is based on safe bounds on the gradient. The proposed sampling distribution (i) is provably the best sampling with respect to the given bounds, (ii) is always better than uniform sampling and fixed importance sampling, and (iii) can efficiently be computed--in many applications at negligible extra cost. The proposed sampling scheme is generic and can easily be integrated into existing algorithms. In particular, we show that coordinate descent (CD) and stochastic gradient descent (SGD) can enjoy a significant speed-up under the novel scheme. The proven efficiency of the proposed sampling is verified by extensive numerical testing.
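
To make the idea concrete, here is a toy NumPy sketch of importance-sampled SGD on least squares, where the sampling distribution is built from cheap, always-valid upper bounds on the per-example gradient norms. Sampling proportionally to these bounds is a simplified stand-in for the paper's scheme, which computes the provably best distribution given the bounds; all function and parameter names here are ours.

```python
import numpy as np

def importance_sgd_least_squares(A, b, steps=2000, lr=0.01, refresh=200, seed=0):
    """Toy SGD on f(x) = (1/2n) * ||Ax - b||^2 with bound-based importance sampling.

    The gradient of example i is a_i * (a_i^T x - b_i), whose norm equals
    ||a_i|| * |a_i^T x - b_i|.  Between occasional full refreshes we bound the
    current residual by |r_ref_i| + ||a_i|| * ||x - x_ref|| (Cauchy-Schwarz),
    which yields safe upper bounds u_i on the gradient norms without touching
    the data.  Sampling proportionally to u_i is a simplification of the
    paper's optimal-under-the-bounds distribution.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    row_norms = np.linalg.norm(A, axis=1)
    x_ref, r_ref = x.copy(), A @ x - b                        # reference point and residuals
    for t in range(steps):
        drift = np.linalg.norm(x - x_ref)
        u = row_norms * (np.abs(r_ref) + row_norms * drift)   # safe upper bounds
        p = u / u.sum()                                       # sampling distribution
        i = rng.choice(n, p=p)
        g_i = A[i] * (A[i] @ x - b[i])                        # exact gradient of example i
        x = x - lr * g_i / (n * p[i])                         # unbiased importance-weighted step
        if (t + 1) % refresh == 0:                            # periodic cheap refresh
            x_ref, r_ref = x.copy(), A @ x - b
    return x

# toy usage: recover x* = 1 from noiseless measurements
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
b = A @ np.ones(20)
x = importance_sgd_least_squares(A, b)
print(np.linalg.norm(A @ x - b))
```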


Sparsified SGD with Memory

Neural Information Processing Systems

Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders perfect scalability. Various recent works proposed to use quantization or sparsification techniques to reduce the amount of data that needs to be communicated, for instance by only sending the most significant entries of the stochastic gradient (top-k sparsification). Whilst such schemes showed very promising performance in practice, they have eluded theoretical analysis so far. In this work we analyze Stochastic Gradient Descent (SGD) with k-sparsification or compression (for instance top-k or random-k) and show that this scheme converges at the same rate as vanilla SGD when equipped with error compensation (keeping track of accumulated errors in memory). That is, communication can be reduced by a factor of the dimension of the problem (sometimes even more) whilst still converging at the same rate. We present numerical experiments to illustrate the theoretical findings and the good scalability for distributed applications.
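
A minimal single-worker NumPy sketch of the memory (error-compensation) mechanism with top-k sparsification is shown below; the names, step sizes, and toy objective are illustrative, and a distributed version would only transmit the k selected values and their indices.

```python
import numpy as np

def topk_sparsify(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sgd_with_memory(grad_fn, x0, k, lr, steps):
    """SGD that only ever applies (i.e. would only communicate) k-sparse updates.

    The memory vector accumulates whatever the sparsifier drops and is re-added
    before the next sparsification, so no gradient information is lost for good.
    """
    x = x0.astype(float).copy()
    memory = np.zeros_like(x)
    for _ in range(steps):
        update = memory + lr * grad_fn(x)   # fold the accumulated error back in
        sparse = topk_sparsify(update, k)   # the part that would be sent over the wire
        memory = update - sparse            # keep the dropped part for later
        x = x - sparse
    return x

# toy usage: minimize ||x - 1||^2 in 100 dimensions while applying only
# 10 coordinates per step; the memory lets every coordinate catch up eventually
x = sgd_with_memory(lambda x: 2.0 * (x - 1.0), np.zeros(100), k=10, lr=0.02, steps=2000)
print(np.max(np.abs(x - 1.0)))
```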


Training DNNs with Hybrid Block Floating Point

Neural Information Processing Systems

The wide adoption of DNNs has given birth to unrelenting computing requirements, forcing datacenter operators to adopt domain-specific accelerators to train them. These accelerators typically employ densely packed full-precision floating-point arithmetic to maximize performance per area. Ongoing research efforts seek to further increase that performance density by replacing floating-point with fixed-point arithmetic. However, a significant roadblock for these attempts has been fixed point's narrow dynamic range, which is insufficient for DNN training convergence. We identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic. Unfortunately, BFP alone introduces several limitations that preclude its direct applicability. In this work, we introduce HBFP, a hybrid BFP-FP approach, which performs all dot products in BFP and other operations in floating point. HBFP delivers the best of both worlds: the high accuracy of floating point at the superior hardware density of fixed point. For a wide variety of models, we show that HBFP matches floating point's accuracy while enabling hardware implementations that deliver up to 8.5× higher throughput.
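
To illustrate the core block floating-point idea (one shared exponent per block of signed fixed-point mantissas), here is a small NumPy sketch; the block size, mantissa width, and rounding rule are illustrative choices, not the paper's HBFP configuration.

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of values to block floating point and dequantize back.

    All entries in the block share one exponent (derived from the largest
    magnitude); each entry then keeps only a signed fixed-point mantissa of
    the given width, which is what lets dot products run on fixed-point logic.
    """
    max_abs = np.max(np.abs(block))
    if max_abs == 0.0:
        return np.zeros_like(block)
    shared_exp = np.ceil(np.log2(max_abs))              # one exponent per block
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))   # value of one mantissa step
    qmax = 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / scale), -qmax - 1, qmax)
    return mantissas * scale

# toy usage: a dot product with both operands in BFP, the kind of operation
# that HBFP-style hardware performs with fixed-point logic
rng = np.random.default_rng(0)
a, w = rng.standard_normal(256), 0.01 * rng.standard_normal(256)
print(a @ w, bfp_quantize(a) @ bfp_quantize(w))
```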



Unsupervised Scalable Representation Learning for Multivariate Time Series

Neural Information Processing Systems

Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice. In this paper, we tackle this challenge by proposing an unsupervised method to learn universal embeddings of time series. Unlike previous works, the method is scalable with respect to series length, and we demonstrate the quality, transferability and practicability of the learned representations with thorough experiments and comparisons. To this end, we combine an encoder based on causal dilated convolutions with a novel triplet loss employing time-based negative sampling, obtaining general-purpose representations for variable-length and multivariate time series.
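
As a rough sketch of the time-based triplet loss described above: the anchor is a random window from a series, the positive is a sub-window of the anchor, and the negatives are windows drawn from randomly chosen series and positions. The encoder below is a deliberately trivial placeholder (a per-channel average), not the paper's causal dilated convolutional network, and the window sizes are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(window):
    """Placeholder encoder (per-channel mean over time); it stands in for the
    causal dilated convolutional encoder of the paper."""
    return window.mean(axis=-1)

def time_based_triplet_loss(series, window_len=64, n_negatives=5, seed=0):
    """Unsupervised triplet loss with time-based negative sampling.

    series: array of shape (n_series, n_channels, length).  The positive is a
    sub-window of the anchor window (same series, overlapping time span), and
    the negatives are sub-windows drawn from random series and positions.
    """
    rng = np.random.default_rng(seed)
    n, _, length = series.shape
    sub_len = window_len // 2

    i = rng.integers(n)
    start = rng.integers(length - window_len + 1)
    anchor = series[i, :, start:start + window_len]
    sub_start = start + rng.integers(window_len - sub_len + 1)
    positive = series[i, :, sub_start:sub_start + sub_len]

    z_a, z_p = encode(anchor), encode(positive)
    loss = -np.log(sigmoid(z_a @ z_p))                  # pull anchor and positive together
    for _ in range(n_negatives):
        j = rng.integers(n)
        neg_start = rng.integers(length - sub_len + 1)
        z_n = encode(series[j, :, neg_start:neg_start + sub_len])
        loss += -np.log(sigmoid(-(z_a @ z_n)))          # push negatives away
    return loss

# toy usage on random data: 8 series, 3 channels, length 512
data = np.random.default_rng(1).standard_normal((8, 3, 512))
print(time_based_triplet_loss(data))
```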