AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

An interior-point stochastic approximation method and an L1-regularized delta rule

Carbonetto, Peter, Schmidt, Mark, Freitas, Nando D.

Neural Information Processing SystemsFeb-15-2020, 01:13:43 GMT

The stochastic approximation method is behind the solution to many important, actively-studied problems in machine learning. Despite its far-reaching application, there is almost no work on applying stochastic approximation to learning problems with constraints. The reason for this, we hypothesize, is that no robust, widely-applicable stochastic approximation method exists for handling such problems. We propose that interior-point methods are a natural solution. We establish the stability of a stochastic interior-point approximation method both analytically and empirically, and demonstrate its utility by deriving an on-line learning algorithm that also performs feature selection via L1 regularization.

approximation method, interior-point stochastic approximation method, stochastic approximation method, (1 more...)

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.32)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Active Instance Sampling via Matrix Partition

Guo, Yuhong

Neural Information Processing SystemsFeb-15-2020, 01:11:48 GMT

Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural form of mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although the matrix partition is an NP-hard combinatorial optimization problem, we show a good local solution can be obtained by exploiting an effective local optimization technique on the relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models.

active learning approach, matrix partition, sampling, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Fast global convergence rates of gradient methods for high-dimensional statistical recovery

Agarwal, Alekh, Negahban, Sahand, Wainwright, Martin J.

Neural Information Processing SystemsFeb-15-2020, 00:28:43 GMT

Many statistical $M$-estimators are based on convex optimization problems formed by the weighted sum of a loss function with a norm-based regularizer. We analyze the convergence rates of first-order gradient methods for solving such problems within a high-dimensional framework that allows the data dimension $d$ to grow with (and possibly exceed) the sample size $n$. This high-dimensional structure precludes the usual global assumptions---namely, strong convexity and smoothness conditions---that underlie classical optimization analysis. We define appropriately restricted versions of these conditions, and show that they are satisfied with high probability for various statistical models. Under these conditions, our theory guarantees that Nesterov's first-order method \cite{Nesterov07} has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical Euclidean distance between the true unknown parameter $\theta *$ and the optimal solution $\widehat{\theta}$.

fast global convergence rate, gradient method, high-dimensional statistical recovery, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.62)

Add feedback

Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

Luo, Dijun, Huang, Heng, Nie, Feiping, Ding, Chris H.

Neural Information Processing SystemsFeb-15-2020, 00:26:13 GMT

In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semi-supervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due the cost or the unavailability of ground truth. In this paper, we proposed a robust approach with convex optimization to forge'' a graph: with an input of a graph, to learn a graph with higher quality. Our major concern is that an ideal graph shall satisfy all the following constraints: non-negative, symmetric, low rank, and positive semidefinite. We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain global optimal solutions with theoretical guarantees.

graph, low rank, positive semidefinite graph learning approach, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.64)

Add feedback

Multi-Task Multicriteria Hyperparameter Optimization

Akhmetzyanov, Kirill, Yuzhakov, Alexander

arXiv.org Machine LearningFeb-15-2020

We present a new method for searching optimal hyperparameters among several tasks and several criteria. Multi-Task Multi Criteria method (MTMC) provides several Pareto-optimal solutions, among which one solution is selected with given criteria significance coefficients. The article begins with a mathematical formulation of the problem of choosing optimal hyperparameters. Then, the steps of the MTMC method that solves this problem are described. The proposed method is evaluated on the image classification problem using a convolutional neural network. The article presents optimal hyperparameters for various criteria significance coefficients.

hyperparameter, lr 0, max lr 0, (12 more...)

arXiv.org Machine Learning

2002.06372

Country: Europe > Russia > Volga Federal District > Perm Krai (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Space

Chang, Ting-Jui, Shahrampour, Shahin

arXiv.org Machine LearningFeb-15-2020

In supervised learning using kernel methods, we encounter a large-scale finite-sum minimization over a reproducing kernel Hilbert space(RKHS). Often times large-scale finite-sum problems can be solved using efficient variants of Newton's method where the Hessian is approximated via sub-samples. In RKHS, however, the dependence of the penalty function to kernel makes standard sub-sampling approaches inapplicable, since the gram matrix is not readily available in a low-rank form. In this paper, we observe that for this class of problems, one can naturally use kernel approximation to speed up the Newton's method. Focusing on randomized features for kernel approximation, we provide a novel second-order algorithm that enjoys local superlinear convergence and global convergence in the high probability sense. The key to our analysis is showing that the approximated Hessian via random features preserves the spectrum of the original Hessian. We provide numerical experiments verifying the efficiency of our approach, compared to variants of sub-sampling methods.

convergence, hessian, newton method, (17 more...)

arXiv.org Machine Learning

2002.04753

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Nearest Neighbor based Greedy Coordinate Descent

Dhillon, Inderjit S., Ravikumar, Pradeep K., Tewari, Ambuj

Neural Information Processing SystemsFeb-14-2020, 23:42:23 GMT

Increasingly, optimization problems in machine learning, especially those arising from high-dimensional statistical estimation, have a large number of variables. Modern statistical estimators developed over the past decade have statistical or sample complexity that depends only weakly on the number of parameters when there is some structure to the problem, such as sparsity. A central question is whether similar advances can be made in their computational complexity as well. In this paper, we propose strategies that indicate that such advances can indeed be made. In particular, we investigate the greedy coordinate descent algorithm, and note that performing the greedy step efficiently weakens the costly dependence on the problem size provided the solution is sparse.

greedy coordinate descent, greedy descent, nearest neighbor, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.41)

Add feedback

Approximating Concavely Parameterized Optimization Problems

Giesen, Joachim, Mueller, Jens, Laue, Soeren, Swiercy, Sascha

Neural Information Processing SystemsFeb-14-2020, 23:27:57 GMT

We consider an abstract class of optimization problems that are parameterized concavely in a single parameter, and show that the solution path along the parameter can always be approximated with accuracy $\varepsilon 0$ by a set of size $O(1/\sqrt{\varepsilon})$. A lower bound of size $\Omega (1/\sqrt{\varepsilon})$ shows that the upper bound is tight up to a constant factor. We also devise an algorithm that calls a step-size oracle and computes an approximate path of size $O(1/\sqrt{\varepsilon})$. Finally, we provide an implementation of the oracle for soft-margin support vector machines, and a parameterized semi-definite program for matrix completion. Papers published at the Neural Information Processing Systems Conference.

approximating concavely parameterized optimization problem, sqrt, varepsilon

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

Agarwal, Alekh, Negahban, Sahand, Wainwright, Martin J.

Neural Information Processing SystemsFeb-14-2020, 22:58:16 GMT

We develop and analyze stochastic optimization algorithms for problems in which the expected loss is strongly convex, and the optimum is (approximately) sparse. Previous approaches are able to exploit only one of these two structures, yielding a $\order(\pdim/T)$ convergence rate for strongly convex objectives in $\pdim$ dimensions and $\order(\sqrt{\spindex( \log\pdim)/T})$ convergence rate when the optimum is $\spindex$-sparse. Our algorithm is based on successively solving a series of $\ell_1$-regularized optimization problems using Nesterov's dual averaging algorithm. We establish that the error of our solution after $T$ iterations is at most $\order(\spindex(\log\pdim)/T)$, with natural extensions to approximate sparsity. Our results apply to locally Lipschitz losses including the logistic, exponential, hinge and least-squares losses.

algorithm, optimization and sparse statistical recovery, stochastic optimization and sparse, (5 more...)

Neural Information Processing Systems

Genre: Research Report (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.64)

Add feedback

Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

Wibisono, Andre, Wainwright, Martin J., Jordan, Michael I., Duchi, John C.

Neural Information Processing SystemsFeb-14-2020, 22:56:55 GMT

We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a factor of at most $\sqrt{\dim}$ in convergence rate over traditional stochastic gradient methods, where $\dim$ is the dimension of the problem. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, which show that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors. Papers published at the Neural Information Processing Systems Conference.

algorithm, finite sample convergence rate, zero-order stochastic optimization method

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.77)

Add feedback