Optimization
Efficient Algorithm for Extremely Large Multi-task Regression with Massive Structured Sparsity
We develop a highly scalable optimization method called "hierarchical group-thresholding" for solving a multi-task regression model with complex structured sparsity constraints on both input and output spaces. Despite the recent emergence of several efficient optimization algorithms for tackling complex sparsity-inducing regularizers, true scalability in practical high-dimensional problems where a huge amount (e.g., millions) of sparsity patterns need to be enforced remains an open challenge, because all existing algorithms must deal with ALL such patterns exhaustively in every iteration, which is computationally prohibitive. Our proposed algorithm addresses the scalability problem by screening out multiple groups of coefficients simultaneously and systematically. We employ a hierarchical tree representation of group constraints to accelerate the process of removing irrelevant constraints by taking advantage of the inclusion relationships between group sparsities, thereby avoiding dealing with all constraints in every optimization step, and necessitating optimization operation only on a small number of outstanding coefficients. In our experiments, we demonstrate the efficiency of our method on simulation datasets, and in an application of detecting genetic variants associated with gene expression traits.
How to sample if you must: on optimal functional sampling
We examine a fundamental problem that models various active sampling setups, such as network tomography. We analyze sampling of a multivariate normal distribution with an unknown expectation that needs to be estimated: in our setup it is possible to sample the distribution from a given set of linear functionals, and the difficulty addressed is how to optimally select the combinations to achieve low estimation error. Although this problem is in the heart of the field of optimal design, no efficient solutions for the case with many functionals exist. We present some bounds and an efficient sub-optimal solution for this problem for more structured sets such as binary functionals that are induced by graph walks.
Toward an Integrated Framework for Automated Development and Optimization of Online Advertising Campaigns
Thomaidou, Stamatina, Vazirgiannis, Michalis, Liakopoulos, Kyriakos
Creating and monitoring competitive and cost-effective pay-per-click advertisement campaigns through the web-search channel is a resource demanding task in terms of expertise and effort. Assisting or even automating the work of an advertising specialist will have an unrivaled commercial value. In this paper we propose a methodology, an architecture, and a fully functional framework for semi- and fully- automated creation, monitoring, and optimization of cost-efficient pay-per-click campaigns with budget constraints. The campaign creation module generates automatically keywords based on the content of the web page to be advertised extended with corresponding ad-texts. These keywords are used to create automatically the campaigns fully equipped with the appropriate values set. The campaigns are uploaded to the auctioneer platform and start running. The optimization module focuses on the learning process from existing campaign statistics and also from applied strategies of previous periods in order to invest optimally in the next period. The objective is to maximize the performance (i.e. clicks, actions) under the current budget constraint. The fully functional prototype is experimentally evaluated on real world Google AdWords campaigns and presents a promising behavior with regards to campaign performance statistics as it outperforms systematically the competing manually maintained campaigns.
Fast Planar Correlation Clustering for Image Segmentation
Yarkony, Julian, Ihler, Alexander T., Fowlkes, Charless C.
We describe a new optimization scheme for finding high-quality correlation clusterings in planar graphs that uses weighted perfect matching as a subroutine. Our method provides lower-bounds on the energy of the optimal correlation clustering that are typically fast to compute and tight in practice. We demonstrate our algorithm on the problem of image segmentation where this approach outperforms existing global optimization techniques in minimizing the objective and is competitive with the state of the art in producing high-quality segmentations.
Ergodic Mirror Descent
Duchi, John C., Agarwal, Alekh, Johansson, Mikael, Jordan, Michael I.
We generalize stochastic subgradient descent methods to situations in which we do not receive independent samples from the distribution over which we optimize, but instead receive samples that are coupled over time. We show that as long as the source of randomness is suitably ergodic---it converges quickly enough to a stationary distribution---the method enjoys strong convergence guarantees, both in expectation and with high probability. This result has implications for stochastic optimization in high-dimensional spaces, peer-to-peer distributed optimization schemes, decision problems with dependent data, and stochastic optimization problems over combinatorial spaces.
Tractable Triangles and Cross-Free Convexity in Discrete Optimisation
The minimisation problem of a sum of unary and pairwise functions of discrete variables is a general NP-hard problem with wide applications such as computing MAP configurations in Markov Random Fields (MRF), minimising Gibbs energy, or solving binary Valued Constraint Satisfaction Problems (VCSPs). We study the computational complexity of classes of discrete optimisation problems given by allowing only certain types of costs in every triangle of variable-value assignments to three distinct variables. We show that for several computational problems, the only non- trivial tractable classes are the well known maximum matching problem and the recently discovered joint-winner property. Our results, apart from giving complete classifications in the studied cases, provide guidance in the search for hybrid tractable classes; that is, classes of problems that are not captured by restrictions on the functions (such as submodularity) or the structure of the problem graph (such as bounded treewidth). Furthermore, we introduce a class of problems with convex cardinality functions on cross-free sets of assignments. We prove that while imposing only one of the two conditions renders the problem NP-hard, the conjunction of the two gives rise to a novel tractable class satisfying the cross-free convexity property, which generalises the joint-winner property to problems of unbounded arity.
Fast global convergence of gradient methods for high-dimensional statistical recovery
Agarwal, Alekh, Negahban, Sahand N., Wainwright, Martin J.
Many statistical $M$-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods for solving such problems, working within a high-dimensional framework that allows the data dimension $\pdim$ to grow with (and possibly exceed) the sample size $\numobs$. This high-dimensional structure precludes the usual global assumptions---namely, strong convexity and smoothness conditions---that underlie much of classical optimization analysis. We define appropriately restricted versions of these conditions, and show that they are satisfied with high probability for various statistical models. Under these conditions, our theory guarantees that projected gradient descent has a globally geometric rate of convergence up to the \emph{statistical precision} of the model, meaning the typical distance between the true unknown parameter $\theta^*$ and an optimal solution $\hat{\theta}$. This result is substantially sharper than previous convergence results, which yielded sublinear convergence, or linear convergence only up to the noise level. Our analysis applies to a wide range of $M$-estimators and statistical models, including sparse linear regression using Lasso ($\ell_1$-regularized regression); group Lasso for block sparsity; log-linear models with regularization; low-rank matrix recovery using nuclear norm regularization; and matrix decomposition. Overall, our analysis reveals interesting connections between statistical precision and computational efficiency in high-dimensional estimation.
A note on the lack of symmetry in the graphical lasso
Rolfs, Benjamin T., Rajaratnam, Bala
The graphical lasso (glasso) is a widely-used fast algorithm for estimating sparse inverse covariance matrices. The glasso solves an L1 penalized maximum likelihood problem and is available as an R library on CRAN. The output from the glasso, a regularized covariance matrix estimate a sparse inverse covariance matrix estimate, not only identify a graphical model but can also serve as intermediate inputs into multivariate procedures such as PCA, LDA, MANOVA, and others. The glasso indeed produces a covariance matrix estimate which solves the L1 penalized optimization problem in a dual sense; however, the method for producing the inverse covariance matrix estimator after this optimization is inexact and may produce asymmetric estimates. This problem is exacerbated when the amount of L1 regularization that is applied is small, which in turn is more likely to occur if the true underlying inverse covariance matrix is not sparse. The lack of symmetry can potentially have consequences. First, it implies that the covariance and inverse covariance estimates are not numerical inverses of one another, and second, asymmetry can possibly lead to negative or complex eigenvalues,rendering many multivariate procedures which may depend on the inverse covariance estimator unusable. We demonstrate this problem, explain its causes, and propose possible remedies.
Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
Maes, Francis, Ernst, Damien, Wehenkel, Louis
The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of target problems. Prior knowledge is rarely used in current solutions due to the lack of a systematic approach to incorporate it into the E/E strategy. To address a specific class of E/E problems, we propose to proceed in three steps: (i) model prior knowledge in the form of a probability distribution over the target class of E/E problems; (ii) choose a large hypothesis space of candidate E/E strategies; and (iii), solve an optimization problem to find a candidate E/E strategy of maximal average performance over a sample of problems drawn from the prior distribution. We illustrate this meta-learning approach with two different hypothesis spaces: one where E/E strategies are numerically parameterized and another where E/E strategies are represented as small symbolic formulas. We propose appropriate optimization algorithms for both cases. Our experiments, with two-armed Bernoulli bandit problems and various playing budgets, show that the meta-learnt E/E strategies outperform generic strategies of the literature (UCB1, UCB1-Tuned, UCB-v, KL-UCB and epsilon greedy); they also evaluate the robustness of the learnt E/E strategies, by tests carried out on arms whose rewards follow a truncated Gaussian distribution.
Last-Mile Restoration for Multiple Interdependent Infrastructures
Coffrin, Carleton (Brown University) | Hentenryck, Pascal Van (NICTA) | Bent, Russell (Los Alamos National Laboratory)
This paper considers the restoration of multiple interdependent infrastructures after a man-made or natural disaster. Modern infrastructures feature complex cyclic interdependencies and require a holistic restoration process. This paper presents the first scalable approach for the last-mile restoration of the joint electrical power and gas infrastructures. It builds on an earlier three-stage decomposition for restoring the power network that decouples the restoration ordering and the routing aspects. The key contributions of the paper are (1) mixed-integer programming models for finding a minimal restoration set and a restoration ordering and (2) a randomized adaptive decomposition to obtain high-quality solutions within the required time constraints. The approach is validated on a large selection of benchmarks based on the United States infrastructures and state-of-the-art weather and fragility simulation tools. The results show significant improvements over current field practices.