AITopics | convex minimization

We consider convex-concave saddle-point problems where the objective functions may be split in many components, and extend recent stochastic variance reduction methods (such as SVRG or SAGA) to provide the first large-scale linearly convergent algorithms for this class of problems which are common in machine learning. While the algorithmic extension is straightforward, it comes with challenges and opportunities: (a) the convex minimization analysis does not apply and we use the notion of monotone operators to prove convergence, showing in particular that the same algorithm applies to a larger class of problems, such as variational inequalities, (b) there are two notions of splits, in terms of functions, or in terms of partial derivatives, (c) the split does need to be done with convex-concave terms, (d) non-uniform sampling is key to an efficient algorithm, both in theory and practice, and (e) these incremental algorithms can be easily accelerated using a simple extension of the "catalyst" framework, leading to an algorithm which is always superior to accelerated batch algorithms.

artificial intelligence, machine learning, survey article, (17 more...)

Neural Information Processing Systems

Genre: Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

647c722bf90a49140184672e0d3723e3-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 01:22:58 GMT

algorithm, first-order method, setup, (16 more...)

Neural Information Processing Systems

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Europe > Russia (0.04)
Europe > France (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

647c722bf90a49140184672e0d3723e3-Paper.pdf

Neural Information Processing SystemsAug-14-2025, 20:57:31 GMT

algorithm, first-order method, setup, (16 more...)

Neural Information Processing Systems

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Europe > Russia (0.04)
Europe > France (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

MixMin: Finding Data Mixtures via Convex Minimization

Thudi, Anvith, Rovers, Evianne, Ruan, Yangjun, Thrush, Tristan, Maddison, Chris J.

arXiv.org Machine LearningFeb-14-2025

Modern machine learning pipelines are increasingly combining and mixing data from diverse and disparate sources, e.g., pre-training large language models. Yet, finding the optimal data mixture is a challenging and open problem. We formalize this data mixing problem as a bi-level objective: the best mixture is the one that would lead to the best model for a downstream objective. Unfortunately, this objective is generally intractable. In this paper, we make the observation that the bi-level data mixing objective becomes convex as our model class becomes larger. We develop and study a gradient-based approach for optimizing this convex objective, which we call MixMin, and test it on language modeling and chemistry tasks. MixMin was the only method that uniformly improved the data mixture in all our experiments. With MixMin, we improved the data mixture using less than 0.2% additional compute for a pythia-410M model trained on 8.2B tokens, resulting between 1-5% relative improvement to negative log likelihood on PIQA, ARC Easy, SciQ, and OpenWebMath. Crucially, we found that MixMin mixtures for smaller models improved training of larger models, suggesting that MixMin mixtures may be scale-invariant. When mixing bioassay data to train an XGBoost model, we saw improvements to average precision scores of 0.03-0.15.

artificial intelligence, convex minimization, machine learning, (2 more...)

arXiv.org Machine Learning

2502.1051

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.53)

Add feedback

Constrained convex minimization via model-based excessive gap

Quoc Tran-Dinh, Volkan Cevher

Neural Information Processing SystemsFeb-10-2025, 01:21:40 GMT

We introduce a model-based excessive gap technique to analyze first-order primaldual methods for constrained convex minimization. As a result, we construct firstorder primal-dual methods with optimal convergence rates on the primal objective residual and the primal feasibility gap of their iterates separately. Through a dual smoothing and prox-center selection strategy, our framework subsumes the augmented Lagrangian, alternating direction, and dual fast-gradient methods as special cases, where our rates apply.

artificial intelligence, machine learning, optimization problem, (15 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > Massachusetts (0.04)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Constrained convex minimization via model-based excessive gap

Neural Information Processing SystemsMar-13-2024, 14:25:21 GMT

We introduce a model-based excessive gap technique to analyze first-order primaldual methods for constrained convex minimization. As a result, we construct firstorder primal-dual methods with optimal convergence rates on the primal objective residual and the primal feasibility gap of their iterates separately. Through a dual smoothing and prox-center selection strategy, our framework subsumes the augmented Lagrangian, alternating direction, and dual fast-gradient methods as special cases, where our rates apply.

algorithm 1, convex minimization, iteration, (12 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > Massachusetts (0.04)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Stochastic Variance Reduction Methods for Saddle-Point Problems

Neural Information Processing SystemsMar-12-2024, 08:17:54 GMT

We consider convex-concave saddle-point problems where the objective functions may be split in many components, and extend recent stochastic variance reduction methods (such as SVRG or SAGA) to provide the first large-scale linearly convergent algorithms for this class of problems which are common in machine learning. While the algorithmic extension is straightforward, it comes with challenges and opportunities: (a) the convex minimization analysis does not apply and we use the notion of monotone operators to prove convergence, showing in particular that the same algorithm applies to a larger class of problems, such as variational inequalities, (b) there are two notions of splits, in terms of functions, or in terms of partial derivatives, (c) the split does need to be done with convex-concave terms, (d) non-uniform sampling is key to an efficient algorithm, both in theory and practice, and (e) these incremental algorithms can be easily accelerated using a simple extension of the "catalyst" framework, leading to an algorithm which is always superior to accelerated batch algorithms.

algorithm, operator, saddle-point problem, (13 more...)

Neural Information Processing Systems

Country: Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Conditional gradient methods for stochastically constrained convex minimization

Vladarean, Maria-Luiza, Alacaoglu, Ahmet, Hsieh, Ya-Ping, Cevher, Volkan

arXiv.org Machine LearningJul-7-2020

We propose two novel conditional gradient-based methods for solving structured stochastic convex optimization problems with a large number of linear constraints. Instances of this template naturally arise from SDP-relaxations of combinatorial problems, which involve a number of constraints that is polynomial in the problem dimension. The most important feature of our framework is that only a subset of the constraints is processed at each iteration, thus gaining a computational advantage over prior works that require full passes. Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees. Preliminary numerical experiments are provided for illustrating the practical performance of the methods.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

2007.03795

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(3 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Conjugate Gradients and Accelerated Methods Unified: The Approximate Duality Gap View

Diakonikolas, Jelena, Orecchia, Lorenzo

arXiv.org Machine LearningJul-16-2019

This note provides a novel, simple analysis of the method of conjugate gradients for the minimization of convex quadratic functions. In contrast with standard arguments, our proof is entirely self-contained and does not rely on the existence of Chebyshev polynomials. Another advantage of our development is that it clarifies the relation between the method of conjugate gradients and general accelerated methods for smooth minimization by unifying their analyses within the framework of the Approximate Duality Gap Technique that was introduced by the authors.

artificial intelligence, conjugate gradient, sequence, (13 more...)

arXiv.org Machine Learning

1907.00289

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Adaptive Proximal Average Approximation for Composite Convex Minimization

Shen, Li (South China University of Technology) | Liu, Wei (Tencent AI Lab) | Huang, Junzhou (University of Texas at Arlington) | Jiang, Yu-Gang (Fudan University) | Ma, Shiqian (The Chinese University of Hong Kong)

AAAI ConferencesFeb-14-2017

We propose a fast first-order method to solve multi-term nonsmooth composite convex minimization problems by employing a recent proximal average approximation technique and a novel adaptive parameter tuning technique. Thanks to this powerful parameter tuning technique, the proximal gradient step can be performed with a much larger stepsize in the algorithm implementation compared with the prior PA-APG method, which is the core to enable significant improvements in practical performance. Moreover, by choosing the approximation parameter adaptively, the proposed method is shown to enjoy the O(1/k) iteration complexity theoretically without needing any extra computational cost, while the PA-APG method incurs much more iterations for convergence. The preliminary experimental results on overlapping group Lasso and graph-guided fused Lasso problems confirm our theoretic claim well, and indicate that the proposed method is almost five times faster than the state-of-the-art PA-APG method and therefore suitable for higher-precision required optimization.

algorithm, artificial intelligence, machine learning, (14 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: