AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

On the Statistical Efficiency of $\ell_{1,p}$ Multi-Task Learning of Gaussian Graphical Models

Honorio, Jean, Jaakkola, Tommi, Samaras, Dimitris

arXiv.org Machine LearningOct-24-2015

We analyze the sufficient number of samples for the correct recovery of the support union and edge signs. We also analyze the necessary number of samples for any conceivable method by providing information-theoretic lower bounds. We compare the statistical efficiency of multi-task learning versus that of single-task learning. For experiments, we use a block coordinate descent method that is provably convergent and generates a sequence of positive definite solutions. We provide experimental validation on synthetic data as well as on two publicly available real-world data sets, including functional magnetic resonance imaging and gene expression data.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Machine Learning

1207.4255

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Dermatology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI

Ravishankar, Saiprasad, Bresler, Yoram

arXiv.org Machine LearningOct-22-2015

Natural signals and images are well-known to be approximately sparse in transform domains such as Wavelets and DCT. This property has been heavily exploited in various applications in image processing and medical imaging. Compressed sensing exploits the sparsity of images or image patches in a transform domain or synthesis dictionary to reconstruct images from undersampled measurements. In this work, we focus on blind compressed sensing, where the underlying sparsifying transform is a priori unknown, and propose a framework to simultaneously reconstruct the underlying image as well as the sparsifying transform from highly undersampled measurements. The proposed block coordinate descent type algorithms involve highly efficient optimal updates. Importantly, we prove that although the proposed blind compressed sensing formulations are highly nonconvex, our algorithms are globally convergent (i.e., they converge from any initialization) to the set of critical points of the objectives defining the formulations. These critical points are guaranteed to be at least partial global and partial local minimizers. The exact point(s) of convergence may depend on initialization. We illustrate the usefulness of the proposed framework for magnetic resonance image reconstruction from highly undersampled k-space measurements. As compared to previous methods involving the synthesis dictionary model, our approach is much faster, while also providing promising reconstruction quality.

artificial intelligence, data quality, machine learning, (20 more...)

arXiv.org Machine Learning

1501.02923

Country: North America > United States > Illinois (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Data Science > Data Quality > Data Transformation (0.45)

Add feedback

Generalized conditional gradient: analysis of convergence and applications

Rakotomamonjy, Alain, Flamary, Rémi, Courty, Nicolas

arXiv.org Machine LearningOct-22-2015

The objectives of this technical report is to provide additional results on the generalized conditional gradient methods introduced by Bredies et al. [BLM05]. Indeed , when the objective function is smooth, we provide a novel certificate of optimality and we show that the algorithm has a linear convergence rate. Applications of this algorithm are also discussed.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1510.06567

Country: Europe > France (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Similarity Learning for High-Dimensional Sparse Data

Liu, Kuan, Bellet, Aurélien, Sha, Fei

arXiv.org Machine LearningOct-21-2015

In many applications, such as text processing, computer vision or biology, data is represented as very highdimensional but sparse vectors. The ability to compute meaningful similarity scores between these objects is crucial to many tasks, such as classification, clustering or ranking. However, handcrafting a relevant similarity measure for such data is challenging because it is usually the case that only a small, often unknown subset of features is actually relevant to the task at hand. For instance, in drug discovery, chemical compounds can be represented as sparse features describing their 3D properties, and only a few of them play an role in determining whether the compound will bind to a target receptor (Guyon et al., 2004). In text classification, where each document is represented as a sparse bag of words, only a small subset of the words is generally sufficient to discriminate among documents of different topics. A principled way to obtain a similarity measure tailored to the problem of interest is to learn it from data. This line of research, known as similarity and distance metric learning, has been successfully applied to many application domains (see Kulis, 2012; Bellet et al., 2013, for recent surveys). The basic idea is to learn the parameters of a similarity (or distance) function such that it satisfies proximity-based constraints, requiring for instance that some data instance x be more similar to y than to z according to the learned function.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

1411.2374

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry:

Government > Military (0.93)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Add feedback

Regularization vs. Relaxation: A conic optimization perspective of statistical variable selection

Dong, Hongbo, Chen, Kun, Linderoth, Jeff

arXiv.org Machine LearningOct-20-2015

Variable selection is a fundamental task in statistical data analysis. Sparsity-inducing regularization methods are a popular class of methods that simultaneously perform variable selection and model estimation. The central problem is a quadratic optimization problem with an l0-norm penalty. Exactly enforcing the l0-norm penalty is computationally intractable for larger scale problems, so dif- ferent sparsity-inducing penalty functions that approximate the l0-norm have been introduced. In this paper, we show that viewing the problem from a convex relaxation perspective offers new insights. In particular, we show that a popular sparsity-inducing concave penalty function known as the Minimax Concave Penalty (MCP), and the reverse Huber penalty derived in a recent work by Pilanci, Wainwright and Ghaoui, can both be derived as special cases of a lifted convex relaxation called the perspective relaxation. The optimal perspective relaxation is a related minimax problem that balances the overall convexity and tightness of approximation to the l0 norm. We show it can be solved by a semidefinite relaxation. Moreover, a probabilistic interpretation of the semidefinite relaxation reveals connections with the boolean quadric polytope in combinatorial optimization. Finally by reformulating the l0-norm pe- nalized problem as a two-level problem, with the inner level being a Max-Cut problem, our proposed semidefinite relaxation can be realized by replacing the inner level problem with its semidefinite relaxation studied by Goemans and Williamson. This interpretation suggests using the Goemans-Williamson rounding procedure to find approximate solutions to the l0-norm penalized problem. Numerical experiments demonstrate the tightness of our proposed semidefinite relaxation, and the effectiveness of finding approximate solutions by Goemans-Williamson rounding.

artificial intelligence, optimization problem, relaxation, (16 more...)

arXiv.org Machine Learning

1510.06083

Country:

North America > United States > Wisconsin (0.28)
North America > United States > Connecticut (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

Wu, Huasen, Srikant, R., Liu, Xin, Jiang, Chong

arXiv.org Machine LearningOct-19-2015

We study contextual bandits with budget and time constraints, referred to as constrained contextual bandits.The time and budget constraints significantly complicate the exploration and exploitation tradeoff because they introduce complex coupling among contexts over time.Such coupling effects make it difficult to obtain oracle solutions that assume known statistics of bandits. To gain insight, we first study unit-cost systems with known context distribution. When the expected rewards are known, we develop an approximation of the oracle, referred to Adaptive-Linear-Programming (ALP), which achieves near-optimality and only requires the ordering of expected rewards. With these highly desirable features, we then combine ALP with the upper-confidence-bound (UCB) method in the general case where the expected rewards are unknown {\it a priori}. We show that the proposed UCB-ALP algorithm achieves logarithmic regret except for certain boundary cases. Further, we design algorithms and obtain similar regret analysis results for more general systems with unknown context distribution and heterogeneous costs. To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits. Moreover, this work also sheds light on the study of computationally efficient algorithms for general constrained contextual bandits.

algorithm, optimization problem, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

1504.06937

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.63)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Piecewise-Linear Approximation for Feature Subset Selection in a Sequential Logit Model

Sato, Toshiki, Takano, Yuichi, Miyashiro, Ryuhei

arXiv.org Machine LearningOct-19-2015

This paper concerns a method of selecting a subset of features for a sequential logit model. Tanaka and Nakagawa (2014) proposed a mixed integer quadratic optimization formulation for solving the problem based on a quadratic approximation of the logistic loss function. However, since there is a significant gap between the logistic loss function and its quadratic approximation, their formulation may fail to find a good subset of features. To overcome this drawback, we apply a piecewise-linear approximation to the logistic loss function. Accordingly, we frame the feature subset selection problem of minimizing an information criterion as a mixed integer linear optimization problem. The computational results demonstrate that our piecewise-linear approximation approach found a better subset of features than the quadratic approximation approach.

artificial intelligence, logistic loss function, machine learning, (13 more...)

arXiv.org Machine Learning

1510.05417

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment (0.50)
Education (0.46)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback

An optimal randomized incremental gradient method

Lan, Guanghui, Zhou, Yi

arXiv.org Machine LearningOct-18-2015

In this paper, we consider a class of finite-sum convex optimization problems whose objective function is given by the summation of $m$ ($\ge 1$) smooth components together with some other relatively simple terms. We first introduce a deterministic primal-dual gradient (PDG) method that can achieve the optimal black-box iteration complexity for solving these composite optimization problems using a primal-dual termination criterion. Our major contribution is to develop a randomized primal-dual gradient (RPDG) method, which needs to compute the gradient of only one randomly selected smooth component at each iteration, but can possibly achieve better complexity than PDG in terms of the total number of gradient evaluations. More specifically, we show that the total number of gradient evaluations performed by RPDG can be ${\cal O} (\sqrt{m})$ times smaller, both in expectation and with high probability, than those performed by deterministic optimal first-order methods under favorable situations. We also show that the complexity of the RPDG method is not improvable by developing a new lower complexity bound for a general class of randomized methods for solving large-scale finite-sum convex optimization problems. Moreover, through the development of PDG and RPDG, we introduce a novel game-theoretic interpretation for these optimal methods for convex optimization.

artificial intelligence, gradient method, optimization problem, (17 more...)

arXiv.org Machine Learning

1507.02

Country: North America > United States > Florida > Alachua County > Gainesville (0.14)

Genre: Research Report (0.64)

Industry: Transportation (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Accelerating Optimization via Adaptive Prediction

Mohri, Mehryar, Yang, Scott

arXiv.org Machine LearningOct-13-2015

We present a powerful general framework for designing data-dependent optimization algorithms, building upon and unifying recent techniques in adaptive regularization, optimistic gradient predictions, and problem-dependent randomization. We first present a series of new regret guarantees that hold at any time and under very minimal assumptions, and then show how different relaxations recover existing algorithms, both basic as well as more recent sophisticated ones. Finally, we show how combining adaptivity, optimism, and problem-dependent randomization can guide the design of algorithms that benefit from more favorable guarantees than recent state-of-the-art methods.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1509.0576

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback

Functional Frank-Wolfe Boosting for General Loss Functions

Wang, Chu, Wang, Yingfei, E, Weinan, Schapire, Robert

arXiv.org Machine LearningOct-8-2015

Boosting is a generic learning method for classification and regression. Yet, as the number of base hypotheses becomes larger, boosting can lead to a deterioration of test performance. Overfitting is an important and ubiquitous phenomenon, especially in regression settings. To avoid overfitting, we consider using $l_1$ regularization. We propose a novel Frank-Wolfe type boosting algorithm (FWBoost) applied to general loss functions. By using exponential loss, the FWBoost algorithm can be rewritten as a variant of AdaBoost for binary classification. FWBoost algorithms have exactly the same form as existing boosting methods, in terms of making calls to a base learning algorithm with different weights update. This direct connection between boosting and Frank-Wolfe yields a new algorithm that is as practical as existing boosting methods but with new guarantees and rates of convergence. Experimental results show that the test performance of FWBoost is not degraded with larger rounds in boosting, which is consistent with the theoretical analysis.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1510.02558

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback