AITopics | frank-wolfe method

Collaborating Authors

frank-wolfe method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

da9e6a4a4aeca98588e4dd77ceb37695-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 17:01:22 GMT

dimension, optimal solution, polytope, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > Russia (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)

Add feedback

On the Benefits of Accelerated Optimization in Robust and Private Estimation

Marchis, Laurentiu Andrei, Loh, Po-Ling

arXiv.org Machine LearningJun-4-2025

We study the advantages of accelerated gradient methods, specifically based on the Frank-Wolfe method and projected gradient descent, for privacy and heavy-tailed robustness. Our approaches are as follows: For the Frank-Wolfe method, our technique is based on a tailored learning rate and a uniform lower bound on the gradient of the $\ell_2$-norm over the constraint set. For accelerating projected gradient descent, we use the popular variant based on Nesterov's momentum, and we optimize our objective over $\mathbb{R}^p$. These accelerations reduce iteration complexity, translating into stronger statistical guarantees for empirical and population risk minimization. Our analysis covers three settings: non-random data, random model-free data, and parametric models (linear regression and generalized linear models). Methodologically, we approach both privacy and robustness based on noisy gradients. We ensure differential privacy via the Gaussian mechanism and advanced composition, and we achieve heavy-tailed robustness using a geometric median-of-means estimator, which also sharpens the dependency on the dimension of the covariates. Finally, we compare our rates to existing bounds and identify scenarios where our methods attain optimal convergence.

artificial intelligence, machine learning, probability, (14 more...)

arXiv.org Machine Learning

2506.03044

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Information Technology > Security & Privacy (0.67)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Approximate FW Algorithm with a novel DMO method over Graph-structured Support Set

Pan, Yijian, Qiang, Hongjiao

arXiv.org Artificial IntelligenceNov-25-2024

In this project, we reviewed a paper that deals graph-structured convex optimization (GSCO) problem with the approximate Frank-Wolfe (FW) algorithm. We analyzed and re-implemented the original algorithm and introduced some extensions based on that. Then we conducted experiments to compare the results and concluded that our backtracking line-search method effectively reduced the number of iterations, while our new DMO method (Top-g+ optimal visiting) did not make satisfying enough improvements.

frank-wolfe method, function value, fw method, (13 more...)

arXiv.org Artificial Intelligence

2411.04389

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.49)
Information Technology > Artificial Intelligence > Machine Learning (0.31)

Add feedback

Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method for Empirical Risk Minimization

Xiong, Zikai, Freund, Robert M.

arXiv.org Machine LearningNov-21-2023

The Frank-Wolfe method has become increasingly useful in statistical and machine learning applications, due to the structure-inducing properties of the iterates, and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of Empirical Risk Minimization -- one of the fundamental optimization problems in statistical and machine learning -- the computational effectiveness of Frank-Wolfe methods typically grows linearly in the number of data observations $n$. This is in stark contrast to the case for typical stochastic projection methods. In order to reduce this dependence on $n$, we look to second-order smoothness of typical smooth loss functions (least squares loss and logistic loss, for example) and we propose amending the Frank-Wolfe method with Taylor series-approximated gradients, including variants for both deterministic and stochastic settings. Compared with current state-of-the-art methods in the regime where the optimality tolerance $\varepsilon$ is sufficiently small, our methods are able to simultaneously reduce the dependence on large $n$ while obtaining optimal convergence rates of Frank-Wolfe methods, in both the convex and non-convex settings. We also propose a novel adaptive step-size approach for which we have computational guarantees. Last of all, we present computational experiments which show that our methods exhibit very significant speed-ups over existing methods on real-world datasets for both convex and non-convex binary classification problems.

artificial intelligence, frank-wolfe method, machine learning, (17 more...)

arXiv.org Machine Learning

2208.13933

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Reducing Discretization Error in the Frank-Wolfe Method

Chen, Zhaoyue, Sun, Yifan

arXiv.org Artificial IntelligenceApr-13-2023

The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications, due to its fast per-iteration complexity. However, one major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions, even asymptotically close to the solution. We view this as an artifact of discretization; that is to say, the Frank-Wolfe \emph{flow}, which is its trajectory at asymptotically small step sizes, does not zig-zag, and reducing discretization error will go hand-in-hand in producing a more stabilized method, with better convergence properties. We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error, and whose local convergence rate over general convex sets accelerates from a rate of $O(1/k)$ to up to $O(1/k^{3/2})$.

artificial intelligence, machine learning, optimization problem, (14 more...)

arXiv.org Artificial Intelligence

2304.01432

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)

Add feedback

A Multistep Frank-Wolfe Method

Chen, Zhaoyue, Sun, Yifan

arXiv.org Artificial IntelligenceOct-14-2022

The Frank-Wolfe algorithm has regained much interest in its use in structurally constrained machine learning applications. However, one major limitation of the Frank-Wolfe algorithm is the slow local convergence property due to the zig-zagging behavior. We observe the zig-zagging phenomenon in the Frank-Wolfe method as an artifact of discretization, and propose multistep Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order. This strategy "stabilizes" the method, and allows tools like line search and momentum to have more benefits. However, our results suggest that the worst case convergence rate of Runge-Kutta-type discretization schemes cannot improve upon that of the vanilla Frank-Wolfe method for a rate depending on $k$. Still, we believe that this analysis adds to the growing knowledge of flow analysis for optimization methods, and is a cautionary tale on the ultimate usefulness of multistep methods.

artificial intelligence, frank-wolfe method, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2210.0811

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Acceleration of the kernel herding algorithm by improved gradient approximation

Tsuji, Kazuma, Tanaka, Ken'ichiro

arXiv.org Machine LearningMay-17-2021

Kernel herding is a method used to construct quadrature formulas in a reproducing kernel Hilbert space. Although there are some advantages of kernel herding, such as numerical stability of quadrature and effective outputs of nodes and weights, the convergence speed of worst-case integration error is slow in comparison to other quadrature methods. To address this problem, we propose two improved versions of the kernel herding algorithm. The fundamental concept of both algorithms involves approximating negative gradients with a positive linear combination of vertex directions. We analyzed the convergence and validity of both algorithms theoretically; in particular, we showed that the approximation of negative gradients directly influences the convergence speed. In addition, we confirmed the accelerated convergence of the worst-case integration error with respect to the number of nodes and computational time through numerical experiments.

algorithm, convergence speed, proposition 3, (15 more...)

arXiv.org Machine Learning

2105.079

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Frank-Wolfe with a Nearest Extreme Point Oracle

Garber, Dan, Wolf, Noam

arXiv.org Machine LearningFeb-3-2021

We consider variants of the classical Frank-Wolfe algorithm for constrained smooth convex minimization, that instead of access to the standard oracle for minimizing a linear function over the feasible set, have access to an oracle that can find an extreme point of the feasible set that is closest in Euclidean distance to a given vector. We first show that for many feasible sets of interest, such an oracle can be implemented with the same complexity as the standard linear optimization oracle. We then show that with such an oracle we can design new Frank-Wolfe variants which enjoy significantly improved complexity bounds in case the set of optimal solutions lies in the convex hull of a subset of extreme points with small diameter (e.g., a low-dimensional face of a polytope). In particular, for many $0\text{--}1$ polytopes, under quadratic growth and strict complementarity conditions, we obtain the first linearly convergent variant with rate that depends only on the dimension of the optimal face and not on the ambient dimension.

linear optimization oracle, oracle, polytope, (15 more...)

arXiv.org Machine Learning

2102.02029

Country:

Asia > Middle East > Israel (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Revisiting Frank-Wolfe for Polytopes: Strict Complementarity and Sparsity

Garber, Dan

arXiv.org Machine LearningOct-17-2020

In recent years it was proved that simple modifications of the classical Frank-Wolfe algorithm (aka conditional gradient algorithm) for smooth convex minimization over convex and compact polytopes, converge with linear rate, assuming the objective function has the quadratic growth property. However, the rate of these methods depends explicitly on the dimension of the problem which cannot explain their empirical success for large scale problems. In this paper we first demonstrate that already for very simple problems and even when the optimal solution lies on a low-dimensional face of the polytope, such dependence on the dimension cannot be avoided in worst case. We then revisit the addition of a strict complementarity assumption already considered in Wolfe's classical book \cite{Wolfe1970}, and prove that under this condition, the Frank-Wolfe method with away-steps and line-search converges linearly with rate that depends explicitly only on the dimension of the optimal face. We motivate strict complementarity by proving that it implies sparsity-robustness of optimal solutions to noise.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

2006.00558

Country:

Europe > Russia (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Russia (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

Add feedback