
Semi-Proximal Mirror-Prox for Nonsmooth Composite Minimization

Neural Information Processing Systems

We propose a new first-order optimization algorithm to solve high-dimensional non-smooth composite minimization problems. Typical examples of such problems have an objective that decomposes into a non-smooth empirical risk part and a non-smooth regularization penalty. The proposed algorithm, called Semi-Proximal Mirror-Prox, leverages the saddle point representation of one part of the objective while handling the other part via linear minimization over the domain. The algorithm stands in contrast to more classical proximal gradient algorithms with smoothing, which require the computation of a proximal operator at each iteration and can therefore be impractical for high-dimensional problems. We establish the theoretical convergence rate of Semi-Proximal Mirror-Prox, which exhibits optimal complexity bounds.
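To make the "saddle point" ingredient concrete, here is a minimal sketch of the extragradient step (the Euclidean special case of Mirror-Prox) on a toy bilinear problem min_x max_y x^T A y over unit Euclidean balls. The problem, step size, and function names are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def project_ball(v, radius=1.0):
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)

def extragradient(A, x0, y0, steps=500, eta=0.1):
    x, y = x0.copy(), y0.copy()
    for _ in range(steps):
        # look-ahead step using gradients at the current point
        xh = project_ball(x - eta * (A @ y))
        yh = project_ball(y + eta * (A.T @ x))
        # actual update using gradients evaluated at the look-ahead point
        x = project_ball(x - eta * (A @ yh))
        y = project_ball(y + eta * (A.T @ xh))
    return x, y

x, y = extragradient(np.eye(2), np.array([1.0, 0.0]), np.array([0.0, 1.0]))
# the iterates spiral in toward the saddle point (0, 0) of x^T y
```

The look-ahead (extrapolation) step is what distinguishes Mirror-Prox-type methods from plain gradient descent-ascent, which would cycle on this bilinear example instead of converging.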


Conditional Gradients for the Approximate Vanishing Ideal

Wirth, Elias, Pokutta, Sebastian

arXiv.org Artificial Intelligence

The vanishing ideal of a set of points $X\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite set of polynomials called generators. To accommodate the noise in the data set, we introduce the pairwise conditional gradients approximate vanishing ideal algorithm (PCGAVI) that constructs a set of generators of the approximate vanishing ideal. The constructed generators capture polynomial structures in data and give rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In PCGAVI, we construct the set of generators by solving constrained convex optimization problems with the pairwise conditional gradients algorithm. Thus, PCGAVI not only constructs few but also sparse generators, making the corresponding feature transformation robust and compact. Furthermore, we derive several learning guarantees for PCGAVI that make the algorithm theoretically better motivated than related generator-constructing methods.
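The following sketch illustrates the feature-map idea (not the PCGAVI algorithm itself): generators are represented as coefficient vectors over a fixed monomial basis, and their absolute values on the data serve as features. The degree-2 basis and the circle example are illustrative assumptions.

```python
import numpy as np

def monomials_deg2(X):
    # basis [1, x1, x2, x1^2, x1*x2, x2^2] for 2-d points (assumed for this sketch)
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2], axis=1)

def feature_map(X, generators):
    # each row of `generators` is one polynomial's coefficient vector;
    # an (approximately) vanishing generator yields near-zero features on X
    return np.abs(monomials_deg2(X) @ generators.T)

# points on the unit circle: g(x) = x1^2 + x2^2 - 1 vanishes on all of them
theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
g = np.array([[-1.0, 0, 0, 1.0, 0, 1.0]])  # coefficients of -1 + x1^2 + x2^2
print(feature_map(X, g).max())  # ≈ 0: the generator vanishes on X
```

A linear classifier on such features can separate classes whose points lie on different algebraic varieties, which is the role the constructed generators play in the supervised pipeline described above.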


Constrained Stochastic Nonconvex Optimization with State-dependent Markov Data

Roy, Abhishek, Balasubramanian, Krishnakumar, Ghadimi, Saeed

arXiv.org Artificial Intelligence

We study stochastic optimization algorithms for constrained nonconvex stochastic optimization problems with Markovian data. In particular, we focus on the case when the transition kernel of the Markov chain is state-dependent. Such stochastic optimization problems arise in various machine learning problems including strategic classification and reinforcement learning. For this problem, we study both projection-based and projection-free algorithms. In both cases, we establish that the number of calls to the stochastic first-order oracle to obtain an appropriately defined $\epsilon$-stationary point is of the order $\mathcal{O}(1/\epsilon^{2.5})$. In the projection-free setting we additionally establish that the number of calls to the linear minimization oracle is of order $\mathcal{O}(1/\epsilon^{5.5})$. We also empirically demonstrate the performance of our algorithm on the problem of strategic classification with neural networks.
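A toy sketch of the projection-based variant: projected stochastic gradient steps where samples come from a two-state Markov chain rather than i.i.d. draws (state-independent transitions here, for simplicity). The chain, objective, and step-size schedule are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # Markov transition kernel
means = np.array([-1.0, 2.0])            # per-state sample means
# stationary distribution is (2/3, 1/3), so the stationary mean is 0

def project_interval(w, radius=5.0):
    # projection onto the feasible set, here the interval [-radius, radius]
    return max(-radius, min(radius, w))

w, state = 0.0, 0
for t in range(1, 20001):
    state = rng.choice(2, p=P[state])        # Markov data stream
    sample = means[state] + rng.normal()
    grad = w - sample                         # gradient of 0.5 * (w - sample)^2
    w = project_interval(w - grad / np.sqrt(t))  # projected stochastic step
# w drifts toward 0, the minimizer under the chain's stationary distribution
```

The point of the sketch is that gradients are computed on correlated, non-i.i.d. samples; the analysis in the paper quantifies the extra oracle calls this correlation costs.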


Second-order Conditional Gradients

Carderera, Alejandro, Pokutta, Sebastian

arXiv.org Machine Learning

An immensely powerful approach when $X = \mathbb{R}^n$ is to construct a second-order approximation to $f(x)$ at the current iterate using first- and second-order information, denoted by $\hat{f}(x)$, and move in the direction that minimizes this approximation, giving rise to the family of methods known as Newton methods (Kantorovich, 1948). A damped variant of Newton's method applied to the minimization of a self-concordant function converges globally and exhibits quadratic local convergence when the iterates are close enough to the optimum (Nesterov & Nemirovskii, 1994). The global convergence of this method also extends to strongly convex and smooth functions (Nesterov & Nemirovskii, 1994; Nesterov, 2013). Using a cubic-regularized version of Newton's method, global convergence can be extended to a broader class of functions than the self-concordant or strongly convex and smooth ones (Nesterov & Polyak, 2006). When $X \subset \mathbb{R}^n$ is a convex set, one can use a constrained analog of these methods (Levitin & Polyak, 1966), in which a quadratic approximation to the function is minimized over $X$ at each iteration.
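A minimal sketch of that constrained analog: at each outer iterate, the local quadratic model of $f$ is minimized over the feasible set $X$, here via Frank-Wolfe steps, since linear minimization over a box is cheap. The toy objective and all parameters are illustrative assumptions, not the paper's method.

```python
import numpy as np

c = np.array([2.0, -3.0])   # f(x) = 0.5 * ||x - c||^2, unconstrained minimizer outside X
H = np.eye(2)               # Hessian of f (constant for this quadratic toy objective)
lo, hi = -1.0, 1.0          # feasible set X = [lo, hi]^2

def lmo_box(g):
    # linear minimization oracle: argmin over X of <g, s>
    return np.where(g > 0, lo, hi)

x = np.zeros(2)
for _ in range(10):                        # outer "Newton" iterations
    y = x.copy()
    for k in range(25):                    # inner Frank-Wolfe on the quadratic model
        model_grad = (x - c) + H @ (y - x)  # gradient of the model at y
        s = lmo_box(model_grad)
        y = y + (2.0 / (k + 2)) * (s - y)
    x = y
# x ends at [1, -1], the minimizer of f over the box
```

Because only a linear minimization oracle over $X$ is needed, the inner loop avoids projections entirely, which is the defining feature of conditional gradient methods.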


Multi-output Polynomial Networks and Factorization Machines

Blondel, Mathieu, Niculae, Vlad, Otsuka, Takuma, Ueda, Naonori

Neural Information Processing Systems

Factorization machines and polynomial networks are supervised polynomial models based on an efficient low-rank decomposition. We extend these models to the multi-output setting, i.e., for learning vector-valued functions, with application to multi-class or multi-task problems. We cast this as the problem of learning a 3-way tensor whose slices share a common basis and propose a convex formulation of that problem. We then develop an efficient conditional gradient algorithm and prove its global convergence, despite the fact that it involves a non-convex basis selection step. On classification tasks, we show that our algorithm achieves excellent accuracy with much sparser models than existing methods. On recommendation system tasks, we show how to combine our algorithm with a reduction from ordinal regression to multi-output classification and show that the resulting algorithm outperforms simple baselines in terms of ranking accuracy.
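The algorithmic workhorse above is a conditional gradient (Frank-Wolfe) method, whose linear minimization step adds a single "atom" per iteration and therefore yields sparse iterates. A toy illustration on least squares over an $\ell_1$ ball, where the atoms are signed coordinates (the paper's actual atoms are shared rank-one basis vectors of a 3-way tensor, which this sketch does not attempt to reproduce):

```python
import numpy as np

def frank_wolfe_l1(A, b, radius=1.0, steps=100):
    x = np.zeros(A.shape[1])
    for k in range(steps):
        g = A.T @ (A @ x - b)            # gradient of 0.5 * ||Ax - b||^2
        i = int(np.argmax(np.abs(g)))    # LMO over the l1 ball selects one coordinate
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])   # the chosen signed atom
        x = x + (2.0 / (k + 2)) * (s - x)
    return x

x = frank_wolfe_l1(np.eye(5), np.array([3.0, 0.0, 0.0, 0.0, 0.0]))
# only one coordinate is ever activated: x converges to [1, 0, 0, 0, 0]
```

Since each iteration touches at most one new atom, the number of nonzero components after $k$ steps is at most $k$, which is the mechanism behind the "much sparser models" claim.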

