AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

Recursive Decomposition for Nonconvex Optimization

Friesen, Abram L. (University of Washington) | Domingos, Pedro (University of Washington)

AAAI ConferencesJul-15-2015

Continuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. We observe that, in many cases, the local modes of the objective function have combinatorial structure, and thus ideas from combinatorial optimization can be brought to bear. Based on this, we propose a problem-decomposition approach to nonconvex optimization. Similarly to DPLL-style SAT solvers and recursive conditioning in probabilistic inference, our algorithm, RDIS, recursively sets variables so as to simplify and decompose the objective function into approximately independent sub-functions, until the remaining functions are simple enough to be optimized by standard techniques like gradient descent. The variables to set are chosen by graph partitioning, ensuring decomposition whenever possible. We show analytically that RDIS can solve a broad class of nonconvex optimization problems exponentially faster than gradient descent with random restarts. Experimentally, RDIS outperforms standard techniques on problems like structure from motion and protein folding.

decomposition, rdis, subspace optimizer, (14 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: North America > United States > Washington > King County > Seattle (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

Add feedback

Practical Inexact Proximal Quasi-Newton Method with Global Complexity Analysis

Scheinberg, Katya, Tang, Xiaocheng

arXiv.org Machine LearningJul-14-2015

Recently several methods were proposed for sparse optimization which make careful use of second-order information [10, 28, 16, 3] to improve local convergence rates. These methods construct a composite quadratic approximation using Hessian information, optimize this approximation using a first-order method, such as coordinate descent and employ a line search to ensure sufficient descent. Here we propose a general framework, which includes slightly modified versions of existing algorithms and also a new algorithm, which uses limited memory BFGS Hessian approximations, and provide a novel global convergence rate analysis, which covers methods that solve subproblems via coordinate descent.

artificial intelligence, iteration, machine learning, (14 more...)

arXiv.org Machine Learning

1311.6547

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Tensor principal component analysis via sum-of-squares proofs

Hopkins, Samuel B., Shi, Jonathan, Steurer, David

arXiv.org Machine LearningJul-12-2015

We study a statistical model for the tensor principal component analysis problem introduced by Montanari and Richard: Given a order-$3$ tensor $T$ of the form $T = \tau \cdot v_0^{\otimes 3} + A$, where $\tau \geq 0$ is a signal-to-noise ratio, $v_0$ is a unit vector, and $A$ is a random noise tensor, the goal is to recover the planted vector $v_0$. For the case that $A$ has iid standard Gaussian entries, we give an efficient algorithm to recover $v_0$ whenever $\tau \geq \omega(n^{3/4} \log(n)^{1/4})$, and certify that the recovered vector is close to a maximum likelihood estimator, all with high probability over the random choice of $A$. The previous best algorithms with provable guarantees required $\tau \geq \Omega(n)$. In the regime $\tau \leq o(n)$, natural tensor-unfolding-based spectral relaxations for the underlying optimization problem break down (in the sense that their integrality gap is large). To go beyond this barrier, we use convex relaxations based on the sum-of-squares method. Our recovery algorithm proceeds by rounding a degree-$4$ sum-of-squares relaxations of the maximum-likelihood-estimation problem for the statistical model. To complement our algorithmic results, we show that degree-$4$ sum-of-squares relaxations break down for $\tau \leq O(n^{3/4}/\log(n)^{1/4})$, which demonstrates that improving our current guarantees (by more than logarithmic factors) would require new techniques or might even be intractable. Finally, we show how to exploit additional problem structure in order to solve our sum-of-squares relaxations, up to some approximation, very efficiently. Our fastest algorithm runs in nearly-linear time using shifted (matrix) power iteration and has similar guarantees as above. The analysis of this algorithm also confirms a variant of a conjecture of Montanari and Richard about singular vectors of tensor unfoldings.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

1507.03269

Country: North America > United States (0.45)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.60)

Add feedback

Homotopy Continuation Approaches for Robust SV Classification and Regression

Suzumura, Shinya, Ogawa, Kohei, Sugiyama, Masashi, Karasuyama, Masayuki, Takeuchi, Ichiro

arXiv.org Machine LearningJul-12-2015

In support vector machine (SVM) applications with unreliable data that contains a portion of outliers, non-robustness of SVMs often causes considerable performance deterioration. Although many approaches for improving the robustness of SVMs have been studied, two major challenges remain in robust SVM learning. First, robust learning algorithms are essentially formulated as non-convex optimization problems. It is thus important to develop a non-convex optimization method for robust SVM that can find a good local optimal solution. The second practical issue is how one can tune the hyperparameter that controls the balance between robustness and efficiency. Unfortunately, due to the non-convexity, robust SVM solutions with slightly different hyper-parameter values can be significantly different, which makes model selection highly unstable. In this paper, we address these two issues simultaneously by introducing a novel homotopy approach to non-convex robust SVM learning. Our basic idea is to introduce parametrized formulations of robust SVM which bridge the standard SVM and fully robust SVM via the parameter that represents the influence of outliers. We characterize the necessary and sufficient conditions of the local optimal solutions of robust SVM, and develop an algorithm that can trace a path of local optimal solutions when the influence of outliers is gradually decreased. An advantage of our homotopy approach is that it can be interpreted as simulated annealing, a common approach for finding a good local optimal solution in non-convex optimization problems. In addition, our homotopy method allows stable and efficient model selection based on the path of local optimal solutions. Empirical performances of the proposed approach are demonstrated through intensive numerical experiments both on robust classification and regression problems.

artificial intelligence, machine learning, optimal solution, (15 more...)

arXiv.org Machine Learning

1507.03229

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Add feedback

Best Subset Selection via a Modern Optimization Lens

Bertsimas, Dimitris, King, Angela, Mazumder, Rahul

arXiv.org Machine LearningJul-11-2015

In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We present a MIO approach for solving the classical best subset selection problem of choosing $k$ out of $p$ features in linear regression given $n$ observations. We develop a discrete extension of modern first order continuous optimization methods to find high quality feasible solutions that we use as warm starts to a MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with $n$ in the 1000s and $p$ in the 100s in minutes to provable optimality, and finds near optimal solutions for $n$ in the 100s and $p$ in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than {\texttt {Lasso}} and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power.

artificial intelligence, lasso, machine learning, (17 more...)

arXiv.org Machine Learning

1507.03133

Country:

North America > United States (0.46)
Europe (0.27)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Analysis of Microarray Data using Artificial Intelligence Based Techniques

Raza, Khalid

arXiv.org Artificial IntelligenceJul-10-2015

The bioinformatics is an interdisciplinary area of study where one of the objectives is to deal with the analysis and interpretation of large sets of data generated from various large-scale biological experiments. The example of one such large-scale biological experiment is measuring the expression levels of tens of thousands of genes simultaneously under some environmental condition. Microarray is one of the essential technologies used by the biologist to measure genome-wide expression levels of genes in a particular organism. As microarrays technologies have become more prevalent, the challenges 1 associated with collecting, managing, and analyzing the data from each experiment have essentially increased. Robust laboratory protocols, improved understanding of the complex experimental design and falling prices of commercial platforms, all these have combined to drive the field to more complex experiments, generating huge amounts of data (Brazma and Vilo, 2000).

bioinformatics, evolutionary algorithm, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.4018/978-1-5225-0427-6.ch011

1507.0287

Country: Asia > India (0.46)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(5 more...)

Add feedback

An Extragradient-Based Alternating Direction Method for Convex Minimization

Lin, Tianyi, Ma, Shiqian, Zhang, Shuzhong

arXiv.org Machine LearningJul-8-2015

In this paper, we consider the problem of minimizing the sum of two convex functions subject to linear linking constraints. The classical alternating direction type methods usually assume that the two convex functions have relatively easy proximal mappings. However, many problems arising from statistics, image processing and other fields have the structure that while one of the two functions has easy proximal mapping, the other function is smoothly convex but does not have an easy proximal mapping. Therefore, the classical alternating direction methods cannot be applied. To deal with the difficulty, we propose in this paper an alternating direction method based on extragradients. Under the assumption that the smooth function has a Lipschitz continuous gradient, we prove that the proposed method returns an $\epsilon$-optimal solution within $O(1/\epsilon)$ iterations. We apply the proposed method to solve a new statistical model called fused logistic regression. Our numerical experiments show that the proposed method performs very well when solving the test problems. We also test the performance of the proposed method through solving the lasso problem arising from statistics and compare the result with several existing efficient solvers for this problem; the results are very encouraging indeed.

artificial intelligence, machine learning, proximal mapping, (14 more...)

arXiv.org Machine Learning

1301.6308

Genre: Research Report (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Add feedback

Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Noisy Matrix Decomposition

Sedghi, Hanie, Anandkumar, Anima, Jonckheere, Edmond

arXiv.org Machine LearningJul-6-2015

We propose an efficient ADMM method with guarantees for high-dimensional problems. We provide explicit bounds for the sparse optimization problem and the noisy matrix decomposition problem. For sparse optimization, we establish that the modified ADMM method has an optimal convergence rate of $\mathcal{O}(s\log d/T)$, where $s$ is the sparsity level, $d$ is the data dimension and $T$ is the number of steps. This matches with the minimax lower bounds for sparse estimation. For matrix decomposition into sparse and low rank components, we provide the first guarantees for any online method, and prove a convergence rate of $\tilde{\mathcal{O}}((s+r)\beta^2(p) /T) + \mathcal{O}(1/p)$ for a $p\times p$ matrix, where $s$ is the sparsity level, $r$ is the rank and $\Theta(\sqrt{p})\leq \beta(p)\leq \Theta(p)$. Our guarantees match the minimax lower bound with respect to $s,r$ and $T$. In addition, we match the minimax lower bound with respect to the matrix dimension $p$, i.e. $\beta(p)=\Theta(\sqrt{p})$, for many important statistical models including the independent noise model, the linear Bayesian network and the latent Gaussian graphical model under some conditions. Our ADMM method is based on epoch-based annealing and consists of inexpensive steps which involve projections on to simple norm balls. Experiments show that for both sparse optimization and matrix decomposition problems, our algorithm outperforms the state-of-the-art methods. In particular, we reach higher accuracy with same time complexity.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1402.5131

Country: North America > United States > California (0.28)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.74)
(2 more...)

Add feedback

Towards Faster Rates and Oracle Property for Low-Rank Matrix Estimation

Gui, Huan, Gu, Quanquan

arXiv.org Machine LearningJul-6-2015

We present a unified framework for low-rank matrix estimation with nonconvex penalties. We first prove that the proposed estimator attains a faster statistical rate than the traditional low-rank matrix estimator with nuclear norm penalty. Moreover, we rigorously show that under a certain condition on the magnitude of the nonzero singular values, the proposed estimator enjoys oracle property (i.e., exactly recovers the true rank of the matrix), besides attaining a faster rate. As far as we know, this is the first work that establishes the theory of low-rank matrix estimation with nonconvex penalties, confirming the advantages of nonconvex penalties for matrix completion. Numerical experiments on both synthetic and real world datasets corroborate our theory.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

1505.0478

Country: North America > United States > Virginia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)

Add feedback

Semiblind Hyperspectral Unmixing in the Presence of Spectral Library Mismatches

Fu, Xiao, Ma, Wing-Kin, Bioucas-Dias, José, Chan, Tsung-Han

arXiv.org Machine LearningJul-6-2015

The dictionary-aided sparse regression (SR) approach has recently emerged as a promising alternative to hyperspectral unmixing (HU) in remote sensing. By using an available spectral library as a dictionary, the SR approach identifies the underlying materials in a given hyperspectral image by selecting a small subset of spectral samples in the dictionary to represent the whole image. A drawback with the current SR developments is that an actual spectral signature in the scene is often assumed to have zero mismatch with its corresponding dictionary sample, and such an assumption is considered too ideal in practice. In this paper, we tackle the spectral signature mismatch problem by proposing a dictionary-adjusted nonconvex sparsity-encouraging regression (DANSER) framework. The main idea is to incorporate dictionary correcting variables in an SR formulation. A simple and low per-iteration complexity algorithm is tailor-designed for practical realization of DANSER. Using the same dictionary correcting idea, we also propose a robust subspace solution for dictionary pruning. Extensive simulations and real-data experiments show that the proposed method is effective in mitigating the undesirable spectral signature mismatch effects.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Machine Learning

doi: 10.1109/TGRS.2016.2557340

1507.01661

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Industry: Government > Regional Government > North America Government > United States Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback