Primal-Dual Block Frank-Wolfe

Qi Lei, Jiacheng Zhuo, Constantine Caramanis, Inderjit S. Dhillon, Alexandros G. Dimakis

arXiv.org Machine Learning 

In particular, we are interested in problems whose solution has special "simple" structure, such as low rank or sparsity. The sparsity constraint applies to large-scale multiclass/multi-label classification, low-degree polynomial data mapping [5], random feature kernel machines [32], and Elastic Net [39]. Motivated by recent applications in low-rank multi-class SVM, phase retrieval, matrix completion, affine rank minimization, and other problems (e.g., [9, 31, 2, 3]), we also consider settings where the constraint x ∈ C (e.g., the trace norm ball), while convex, may be difficult to project onto. A wish-list for this class of problems would include an algorithm that (1) exploits the function's finite-sum form and the simple structure of the solution, (2) achieves linear convergence for smooth and strongly convex problems, and (3) does not pay a heavy price for the projection step. We propose a Frank-Wolfe (FW) type method that attains these three goals.

This does not come without challenges: although it is well appreciated that FW type algorithms avoid the cost of projection [14, 1], the benefit matters only for constraints that are hard to project onto, like the trace norm ball. For problems like phase retrieval and ERM for multi-label multi-class classification, the gradient computation requires large matrix multiplications. This dominates the per-iteration cost, and existing FW type methods do not asymptotically reduce the time complexity per iteration, even though they avoid the expensive projection step.
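To make the projection-free idea concrete, the following is a minimal sketch of the classical Frank-Wolfe iteration (not the primal-dual block variant proposed in the paper). It minimizes a smooth objective over the l1 ball, where the linear minimization oracle is a single signed coordinate vertex, so no projection is ever computed. The function names and the least-squares example are illustrative assumptions, not from the paper.

```python
import numpy as np

def lmo_l1(grad, radius=1.0):
    # Linear minimization oracle over the l1 ball:
    # argmin_{||s||_1 <= radius} <grad, s> is attained at a signed
    # coordinate vertex, so no projection step is needed.
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, radius=1.0, iters=100):
    # Classical FW with the standard 2/(t+2) step size; every iterate
    # is a convex combination of l1-ball vertices, hence feasible.
    x = x0.copy()
    for t in range(iters):
        s = lmo_l1(grad_f(x), radius)
        gamma = 2.0 / (t + 2)
        x = (1 - gamma) * x + gamma * s
    return x

# Illustrative use: minimize ||Ax - b||^2 over the unit l1 ball.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: 2 * A.T @ (A @ x - b)
x = frank_wolfe(grad, np.zeros(5))
print(np.sum(np.abs(x)) <= 1.0 + 1e-9)  # iterate stays feasible
```

Note that each iteration touches only one coordinate direction via the oracle; the paper's point is that for problems such as phase retrieval, the gradient evaluation itself (the `grad_f(x)` call here) is the dominant cost, which plain FW does nothing to reduce.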
