Optimization
Efficient Methods for Overlapping Group Lasso
Yuan, Lei, Liu, Jun, Ye, Jieping
The group Lasso is an extension of the Lasso for feature selection on (predefined) non-overlapping groups of features. The non-overlapping group structure limits its applicability in practice. There have been several recent attempts to study a more general formulation, where groups of features are given, potentially with overlaps between the groups. The resulting optimization is, however, much more challenging to solve due to the group overlaps. In this paper, we consider the efficient optimization of the overlapping group Lasso penalized problem. We reveal several key properties of the proximal operator associated with the overlapping group Lasso, and compute the proximal operator by solving the smooth and convex dual problem, which allows the use of the gradient descent type of algorithms for the optimization. We have performed empirical evaluations using both synthetic and the breast cancer gene expression data set, which consists of 8,141 genes organized into (overlapping) gene sets. Experimental results show that the proposed algorithm is more efficient than existing state-of-the-art algorithms.
Thinning Measurement Models and Questionnaire Design
Inferring key unobservable features of individuals is an important task in the applied sciences. In particular, an important source of data in fields such as marketing, social sciences and medicine is questionnaires: answers in such questionnaires are noisy measures of target unobserved features. While comprehensive surveys help to better estimate the latent variables of interest, aiming at a high number of questions comes at a price: refusal to participate in surveys can go up, as well as the rate of missing data; quality of answers can decline; costs associated with applying such questionnaires can also increase. In this paper, we cast the problem of refining existing models for questionnaire data as follows: solve a constrained optimization problem of preserving the maximum amount of information found in a latent variable model using only a subset of existing questions. The goal is to find an optimal subset of a given size. For that, we first define an information theoretical measure for quantifying the quality of a reduced questionnaire. Three different approximate inference methods are introduced to solve this problem. Comparisons against a simple but powerful heuristic are presented.
Multiple Instance Learning on Structured Data
Zhang, Dan, Liu, Yan, Si, Luo, Zhang, Jian, Lawrence, Richard D.
Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper explores the research problem as multiple instance learning on structured data (MILSD) and formulates a novel framework that considers additional structure information. In particular, an effective and efficient optimization algorithm has been proposed to solve the original non-convex optimization problem by using a combination of Concave-Convex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has the nice convergence property, with specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods.
Penalty Decomposition Methods for Rank Minimization
In this paper we consider general rank minimization problems with rank appearing ineither objective function or constraint. We first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems haveclosed form solutions. Using this result, we then propose penalty decomposition methodsfor general rank minimization problems. The convergence results of the PD methods have been shown in the longer version of the paper [19]. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.
Analytical and Learning-Based Spectrum Sensing Time Optimization in Cognitive Radio Systems
Shokri-Ghadikolaei, Hossein, Abdi, Younes, Nasiri-Kenari, Masoumeh
Powerful spectrum sensing schemes enable cognitive radios (CRs) to find transmission opportunities in spectral resources allocated exclusively to the primary users. In this paper, maximizing the average throughput of a secondary user by optimizing its spectrum sensing time is formulated assuming that a prior knowledge of the presence and absence probabilities of the primary users is available. The energy consumed for finding a transmission opportunity is evaluated and a discussion on the impact of the number of the primary users on the secondary user throughput and consumed energy is presented. In order to avoid the challenges associated with the analytical method, as a second solution, a systematic neural network-based sensing time optimization approach is also proposed in this paper. The proposed adaptive scheme is able to find the optimum value of the channel sensing time without any prior knowledge or assumption about the wireless environment. The structure, performance, and cooperation of the artificial neural networks used in the proposed method are disclosed in detail and a set of illustrative simulation results is presented to validate the analytical results as well as the performance of the proposed learning-based optimization scheme.
Convex Optimization without Projection Steps
For the general problem of minimizing a convex function over a compact convex domain, we will investigate a simple iterative approximation algorithm based on the method by Frank & Wolfe 1956, that does not need projection steps in order to stay inside the optimization domain. Instead of a projection step, the linearized problem defined by a current subgradient is solved, which gives a step direction that will naturally stay in the domain. Our framework generalizes the sparse greedy algorithm of Frank & Wolfe and its primal-dual analysis by Clarkson 2010 (and the low-rank SDP approach by Hazan 2008) to arbitrary convex domains. We give a convergence proof guaranteeing {\epsilon}-small duality gap after O(1/{\epsilon}) iterations. The method allows us to understand the sparsity of approximate solutions for any l1-regularized convex optimization problem (and for optimization over the simplex), expressed as a function of the approximation quality. We obtain matching upper and lower bounds of {\Theta}(1/{\epsilon}) for the sparsity for l1-problems. The same bounds apply to low-rank semidefinite optimization with bounded trace, showing that rank O(1/{\epsilon}) is best possible here as well. As another application, we obtain sparse matrices of O(1/{\epsilon}) non-zero entries as {\epsilon}-approximate solutions when optimizing any convex function over a class of diagonally dominant symmetric matrices. We show that our proposed first-order method also applies to nuclear norm and max-norm matrix optimization problems. For nuclear norm regularized optimization, such as matrix completion and low-rank recovery, we demonstrate the practical efficiency and scalability of our algorithm for large matrix problems, as e.g. the Netflix dataset. For general convex optimization over bounded matrix max-norm, our algorithm is the first with a convergence guarantee, to the best of our knowledge.
Stochastic Enforced Hill-Climbing
Wu, J., Kalyanam, R., Givan, R.
Enforced hill-climbing is an effective deterministic hill-climbing technique that deals with local optima using breadth-first search (a process called ``basin flooding''). We propose and evaluate a stochastic generalization of enforced hill-climbing for online use in goal-oriented probabilistic planning problems. We assume a provided heuristic function estimating expected cost to the goal with flaws such as local optima and plateaus that thwart straightforward greedy action choice. While breadth-first search is effective in exploring basins around local optima in deterministic problems, for stochastic problems we dynamically build and solve a heuristic-based Markov decision process (MDP) model of the basin in order to find a good escape policy exiting the local optimum. We note that building this model involves integrating the heuristic into the MDP problem because the local goal is to improve the heuristic. We evaluate our proposal in twenty-four recent probabilistic planning-competition benchmark domains and twelve probabilistically interesting problems from recent literature. For evaluation, we show that stochastic enforced hill-climbing (SEH) produces better policies than greedy heuristic following for value/cost functions derived in two very different ways: one type derived by using deterministic heuristics on a deterministic relaxation and a second type derived by automatic learning of Bellman-error features from domain-specific experience. Using the first type of heuristic, SEH is shown to generally outperform all planners from the first three international probabilistic planning competitions.
Scenario trees and policy selection for multistage stochastic programming using machine learning
Defourny, Boris, Ernst, Damien, Wehenkel, Louis
Stochastic optimization using scenario trees has proven to be a powerful algorithmic strategy, but has suffered from the rapid growth in the size of scenario trees as the number of stages grows (Birge and Louveaux, 1997; Shapiro et al., 2009). A number of authors have undertaken research to limit the size of the scenario tree, but problem size still grows exponentially with the number of stages (Frauendorfer, 1996; Dupacova et al., 2000; Høyland and Wallace, 2001; Pennanen, 2009; Heitsch and Römisch, 2009). As a result, most authors either severely limit the number of decision stages or sharply limit the number of scenarios per stage (Birge, 1997; Wallace and Ziemba, 2005; Dempster et al., 2008; Kallrath et al., 2009). Such approximations make it possible to optimize first-stage decisions with a stochastic look-ahead, but without tight guarantees on the value of the computed decisions for the true multistage problem (as a matter of fact, bounding techniques also tend to break down on problems with many stages). Some authors have proposed to assess the quality of scenario-tree based methods by outof-sample validation (Kouwenberg, 2001; Chiralaksanakul, 2003; Hilli and Pennanen, 2008). The validation scheme consists of solving the multistage program posed on a scenario tree spanning the planning horizon T, implementing the decision relative to time step 1, sampling the realization of the stochastic process at time 1, updating the conditional distributions of the stochastic process from time 2 to time T, rebuilding a scenario tree spanning time periods 2 to T, solving the new multistage program over the remaining horizon (with previously 1 implemented decisions fixed to their value), and continuing this process until the last decision at time T has been found.
Structured Sparsity via Alternating Direction Methods
We consider a class of sparse learning problems in high dimensional feature space regularized by a structured sparsity-inducing norm which incorporates prior knowledge of the group structure of the features. Such problems often pose a considerable challenge to optimization algorithms due to the non-smoothness and non-separability of the regularization term. In this paper, we focus on two commonly adopted sparsity-inducing regularization terms, the overlapping Group Lasso penalty $l_1/l_2$-norm and the $l_1/l_\infty$-norm. We propose a unified framework based on the augmented Lagrangian method, under which problems with both types of regularization and their variants can be efficiently solved. As the core building-block of this framework, we develop new algorithms using an alternating partial-linearization/splitting technique, and we prove that the accelerated versions of these algorithms require $O(\frac{1}{\sqrt{\epsilon}})$ iterations to obtain an $\epsilon$-optimal solution. To demonstrate the efficiency and relevance of our algorithms, we test them on a collection of data sets and apply them to two real-world problems to compare the relative merits of the two norms.
Truncated Power Method for Sparse Eigenvalue Problems
The sparsity is controlled by the values of k and can be viewed as a design parameter. In machine learning applications, e.g., principal component analysis, this problem is motivated from the following perturbation formulation of matrix A: A Ā E, (1.2) where A is the empirical covariance matrix, Ā is the true covariance matrix, and E is a random perturbation due to having only a finite number of empirical samples. If we assume that the largest eigenvector x of Ā is sparse, then a natural question is to recover x from the noisy observation A when the error E is "small". In this context, the problem (1.1) is also referred to as sparse principal component analysis (sparse PCA). 1 In general, problem (1.1) is non-convex. In fact, it is also NPhard because it can be reduced to the subset selection problem for ordinary least squares regression (Moghaddam et al., 2006), which is known to be NP hard.