 dc decomposition



Revisiting Frank-Wolfe for Structured Nonconvex Optimization

arXiv.org Artificial Intelligence

We introduce a new projection-free (Frank-Wolfe) method for optimizing structured nonconvex functions that are expressed as a difference of two convex functions. This problem class subsumes smooth nonconvex minimization, positioning our method as a promising alternative to the classical Frank-Wolfe algorithm. DC decompositions are not unique; by carefully selecting a decomposition, we can better exploit the problem structure, improve computational efficiency, and adapt to the underlying problem geometry to find better local solutions. We prove that the proposed method achieves a first-order stationary point in $O(1/\epsilon^2)$ iterations, matching the complexity of the standard Frank-Wolfe algorithm for smooth nonconvex minimization in general. Specific decompositions can, for instance, yield a gradient-efficient variant that requires only $O(1/\epsilon)$ calls to the gradient oracle. Finally, we present numerical experiments demonstrating the effectiveness of the proposed method compared to the standard Frank-Wolfe algorithm.
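As a brief illustration of why this problem class subsumes smooth nonconvex minimization, and why decompositions are not unique, consider a function $f$ whose gradient is $L$-Lipschitz. The two decompositions below are standard facts about $L$-smooth functions, not the specific decompositions chosen in the paper; the symbols $g_1, h_1, g_2, h_2$ are introduced here only for illustration.

$$
f(x) = \underbrace{\Big(f(x) + \tfrac{L}{2}\|x\|^2\Big)}_{g_1(x)\ \text{convex}} - \underbrace{\tfrac{L}{2}\|x\|^2}_{h_1(x)\ \text{convex}}
     = \underbrace{\tfrac{L}{2}\|x\|^2}_{g_2(x)\ \text{convex}} - \underbrace{\Big(\tfrac{L}{2}\|x\|^2 - f(x)\Big)}_{h_2(x)\ \text{convex}}
$$

Both are valid because $L$-smoothness bounds the curvature of $f$ between $-L$ and $L$, so adding or subtracting $\tfrac{L}{2}\|x\|^2$ yields a convex function. Adding the same convex function to both parts of any decomposition produces yet another one.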


Understand the Effectiveness of Shortcuts through the Lens of DCA

arXiv.org Artificial Intelligence

The Difference-of-Convex Algorithm (DCA) is a well-known nonconvex optimization algorithm for minimizing a nonconvex function that can be expressed as the difference of two convex functions. Many well-known optimization algorithms, such as SGD and proximal point methods, can be viewed as special cases of DCA with specific DC decompositions, making DCA a powerful framework for optimization. Shortcuts, meanwhile, are a key architectural feature of modern deep neural networks, facilitating both training and optimization. We show that the gradient of a shortcut neural network can be obtained by applying DCA to vanilla neural networks, that is, networks without shortcut connections. From the perspective of DCA, we can therefore better understand the effectiveness of networks with shortcuts. Moreover, we propose a new architecture called NegNet that does not fit the previous interpretation but performs on par with ResNet and can be included in the DCA framework.
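The claim that familiar first-order methods are special cases of DCA can be checked on a toy problem. The sketch below is not taken from the paper: the test function `f`, the constant `L`, and all function names are assumptions chosen for illustration. It uses the decomposition $g(x) = \tfrac{L}{2}x^2$, $h(x) = \tfrac{L}{2}x^2 - f(x)$, for which the DCA update (minimize $g$ minus the linearization of $h$) has a closed form that coincides with a gradient descent step of size $1/L$.

```python
import math

# Illustrative 1-D objective (an assumption, not from the paper):
# f(x) = sin(x) + 0.1 x^2, with |f''(x)| <= 1.2, so f is L-smooth with L = 1.2.
L = 1.2

def grad_f(x):
    return math.cos(x) + 0.2 * x

def grad_h(x):
    # h(x) = (L/2) x^2 - f(x) is convex because f''(x) <= L everywhere.
    return L * x - grad_f(x)

def dca_step(x):
    # DCA update: minimize g(y) - grad_h(x) * y over y, with g(y) = (L/2) y^2.
    # Setting the derivative L*y - grad_h(x) to zero gives the closed form below.
    return grad_h(x) / L

def gd_step(x):
    # Plain gradient descent with step size 1/L.
    return x - grad_f(x) / L

def run(step, x0, iters=200):
    x = x0
    for _ in range(iters):
        x = step(x)
    return x

x_dca = run(dca_step, 0.0)
x_gd = run(gd_step, 0.0)
```

Both runs should land on the same stationary point of `f`, since `grad_h(x) / L` simplifies algebraically to `x - grad_f(x) / L`. Other decompositions of the same `f` would give different update rules.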


Unified SVM algorithm based on LS-DC Loss

arXiv.org Machine Learning

Over the past two decades, the Support Vector Machine (SVM) has been a popular supervised machine learning model, and many distinct algorithms have been designed separately, based on different KKT conditions of the SVM model, for classification and regression with different losses, both convex and non-convex. In this paper, we propose an algorithm that can train different SVM models in a \emph{unified} scheme. First, we define the \emph{LS-DC} (least-squares type of difference of convex) loss and show that the most commonly used losses in the SVM community either are LS-DC losses or can be approximated by LS-DC losses. Then, based on DCA (difference of convex algorithm), we propose a unified algorithm, called \emph{UniSVM}, that can solve the SVM model with any convex or non-convex LS-DC loss; only a single vector in the computation depends on the chosen loss. In particular, for training robust SVM models with non-convex losses, UniSVM has a dominant advantage over all the existing algorithms because it has a closed-form solution per iteration, while the existing ones always need to solve an L1/L2-SVM per iteration. Furthermore, by using a low-rank approximation of the kernel matrix, UniSVM can solve large-scale nonlinear problems efficiently. To verify the efficacy and feasibility of the proposed algorithm, we conduct experiments on large benchmark data sets, with and without outliers, for classification and regression. UniSVM is easy for users and researchers to adopt because its core Matlab code takes fewer than 10 lines.


Difference of Convex Functions Programming Applied to Control with Expert Data

arXiv.org Machine Learning

This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. These applications are possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is DC. Improvement in performance is demonstrated on two specific algorithms, namely Reward-regularized Classification for Apprenticeship Learning (RCAL) and Reinforcement Learning with Expert Demonstrations (RLED), through experiments on generic Markov Decision Processes (MDPs), called Garnets.


DC Decomposition of Nonconvex Polynomials with Algebraic Techniques

arXiv.org Machine Learning

We consider the problem of decomposing a multivariate polynomial as the difference of two convex polynomials. We introduce algebraic techniques which reduce this task to linear, second order cone, and semidefinite programming. This allows us to optimize over subsets of valid difference of convex decompositions (dcds) and find ones that speed up the convex-concave procedure (CCP). We prove, however, that optimizing over the entire set of dcds is NP-hard.
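A small worked example may help make the non-uniqueness of dcds concrete. The polynomial $p$ and the two decompositions below are chosen here for illustration and do not come from the paper. The CCP update used is the standard one: linearize the subtracted convex part $h$ at the current iterate $x_k$ and minimize the resulting convex function.

$$
p(x) = x^4 - 3x^2, \qquad
\text{dcd 1: } g(x) = x^4,\ h(x) = 3x^2, \qquad
\text{dcd 2: } g(x) = x^4 + x^2,\ h(x) = 4x^2
$$

$$
x_{k+1} \in \arg\min_x \; g(x) - h'(x_k)\,x
\quad\Longrightarrow\quad
\text{dcd 1: } 4x_{k+1}^3 = 6x_k, \qquad
\text{dcd 2: } 4x_{k+1}^3 + 2x_{k+1} = 8x_k
$$

Under both updates, the fixed points satisfy $4x^3 = 6x$, which is exactly $p'(x) = 0$, so the two decompositions share the same stationary points but generate different iterate sequences. That difference in iterates is what leaves room for choosing a decomposition that makes CCP converge faster.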