AITopics | Optimization

Leveraging Inter-Layer Dependency for Post-Training Quantization

Neural Information Processing SystemsApr-25-2026, 06:17:01 GMT

Prior works on Post-training Quantization (PTQ) typically separate a neural network into sub-nets and quantize them sequentially. This process pays little attention to the dependency across the sub-nets, hence is less optimal. In this paper, we propose a novel Network-Wise Quantization (NWQ) approach to fully leveraging inter-layer dependency. NWQ faces a larger scale combinatorial optimization problem of discrete variables than in previous works, which raises two major challenges: over-fitting and discrete optimization problem. NWQ alleviates over fitting via a Activation Regularization (AR) technique, which better controls the activation distribution. To optimize discrete variables, NWQ introduces Annealing Softmax (ASoftmax) and Annealing Mixup (AMixup) to progressively transition quantized weights and activations from continuity to discretization, respectively. Extensive experiments demonstrates that NWQ outperforms prior state-of-the-art approaches by a large margin: 20.24% for the challenging configuration of MobileNetV2 with 2 bits on ImageNet, pushing extremely low-bit PTQ from feasibility to usability. In addition, NWQ is able to achieve competitive or better results with only 10% computation cost of previous works.

artificial intelligence, machine learning, quantization, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

2c19666cbb2c14d45d39e2dcf6ab0b99-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 06:16:08 GMT

artificial intelligence, iteration, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Software (0.68)

Add feedback

Simple steps are all you need: Frank-Wolfe and generalized self-concordant functions

Neural Information Processing SystemsApr-25-2026, 06:15:39 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report (0.47)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

2b323d6eb28422cef49b266557dd31ad-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 06:15:36 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness

Neural Information Processing SystemsApr-25-2026, 06:15:25 GMT

Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks. However, training big models requires strong distributed system expertise to carefully design model-parallel execution strategies that suit the model architectures and cluster setups. In this paper, we develop AMP, a framework that automatically derives such strategies. AMP identifies a valid space of model parallelism strategies and efficiently searches the space for high-performed strategies, by leveraging a cost model designed to capture the heterogeneity of the model and cluster specifications. Unlike existing methods, AMP is specifically tailored to support complex models composed of uneven layers and cluster setups with more heterogeneous accelerators and bandwidth. We evaluate AMP on popular models and cluster setups from public clouds and show that AMP returns parallel strategies that match the expert-tuned strategies on typical cluster setups. On heterogeneous clusters or models with heterogeneous architectures, AMP finds strategies with 1.54 and 1.77 higher throughput than state-of-the-art model-parallel systems, respectively.

artificial intelligence, machine learning, optimization problem, (20 more...)

Neural Information Processing Systems

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Appendix

Neural Information Processing SystemsApr-25-2026, 06:02:03 GMT

The literature for the geometric properties of Riemannian Manifolds is immense and hence we cannot hope to survey them here; for an appetizer, we refer the reader to Burago et al. [93] and Lee [94] and references therein. On the other hand, as stated, it is not until recently that the long-run non-asymptotic behavior of optimization algorithms in Riemannian manifolds (even the smooth ones) has encountered a lot of interest. For concision, we have deferred here a detailed exposition of the rest of recent results to Appendix A of the paper's supplement. Additionally, in Appendix B we also give a bunch of motivating examples which can be solved by Riemannian min-max optimization. Many application problems can be formulated as the minimization or maximization of a smooth function over Riemannian manifold and has triggered a line of research on the extension of the classical first-order and second-order methods to Riemannian setting with asymptotic convergence to first-order stationary points in general [95].

artificial intelligence, exp 1, machine learning, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

Add feedback

First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces

Neural Information Processing SystemsApr-25-2026, 06:01:59 GMT

From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast into the min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. have recently shown that geodesic convex concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative--we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically stronglyconvex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case where RCEG and Riemanian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on curvature of the manifold.

artificial intelligence, machine learning, manifold, (15 more...)

Neural Information Processing Systems

Country: Asia (0.15)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.40)

Add feedback

15d6717f8bb33b3a74df26ce1eee0b9a-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 05:47:56 GMT

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California (1.00)
Europe (0.93)

Genre: Research Report > New Finding (0.46)

Industry: Semiconductors & Electronics (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.67)

Add feedback

DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

Neural Information Processing SystemsApr-25-2026, 05:29:40 GMT

This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient--matching the convergence rate of optimally tuned gradient descent in convex optimization up to a logarithmic factor without tuning any parameters, and universal--automatically adapting to both smooth and nonsmooth problems. While popular algorithms following the AdaGrad framework compute a running average of the squared gradients to use for normalization, DoWG maintains a new distance-based weighted version of the running average, which is crucial to achieve the desired properties. To complement our theory, we also show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks.

artificial intelligence, gradient descent, machine learning, (14 more...)

Neural Information Processing Systems

Country: