Optimization
An (MI)LP-based Primal Heuristic for 3-Architecture Connected Facility Location in Urban Access Network Design
D'Andreagiovanni, Fabio, Mett, Fabian, Pulaj, Jonad
We investigate the 3-architecture Connected Facility Location Problem arising in the design of urban telecommunication access networks integrating wired and wireless technologies. We propose an original optimization model for the problem that includes additional variables and constraints to take into account wireless signal coverage represented through signal-to-interference ratios. Since the problem can prove very challenging even for modern state-of-the art optimization solvers, we propose to solve it by an original primal heuristic that combines a probabilistic fixing procedure, guided by peculiar Linear Programming relaxations, with an exact MIP heuristic, based on a very large neighborhood search. Computational experiments on a set of realistic instances show that our heuristic can find solutions associated with much lower optimality gaps than a state-of-the-art solver. Keywords: Telecommunications, FTTX Access Networks, Connected Facility Location, Mixed Integer Linear Programming, Tight Linear Relaxations, MIP Heuristics.
SOFAR: large-scale association network learning
Uematsu, Yoshimasa, Fan, Yingying, Chen, Kun, Lv, Jinchi, Lin, Wei
Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via the sparse singular value decomposition with orthogonality constrained optimization to learn the underlying association networks, with broad applications to both unsupervised and supervised learning tasks such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and spare vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure characterizing the theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with convergence property. Both computational and theoretical advantages of our procedure are demonstrated with several simulation and real data examples.
Improving the Efficiency of Dynamic Programming on Tree Decompositions via Machine Learning
Abseher, Michael, Musliu, Nysret, Woltran, Stefan
Dynamic Programming (DP) over tree decompositions is a well-established method to solve problems - that are in general NP-hard - efficiently for instances of small treewidth. Experience shows that (i) heuristically computing a tree decomposition has negligible runtime compared to the DP step; and (ii) DP algorithms exhibit a high variance in runtime when using different tree decompositions; in fact, given an instance of the problem at hand, even decompositions of the same width might yield extremely diverging runtimes. We thus propose here a novel and general method that is based on selection of the best decomposition from an available pool of heuristically generated ones. For this purpose, we require machine learning techniques that provide automated selection based on features of the decomposition rather than on the actual problem instance. Thus, one main contribution of this work is to propose novel features for tree decompositions. Moreover, we report on extensive experiments in different problem domains which show a significant speedup when choosing the tree decomposition according to this concept over simply using an arbitrary one of the same width.
Learning from Comparisons and Choices
Negahban, Sahand, Oh, Sewoong, Thekumparampil, Kiran K., Xu, Jiaming
When tracking user-specific online activities, each user's preference is revealed in the form of choices and comparisons. For example, a user's purchase history tracks her choices, i.e. which item was chosen among a subset of offerings. A user's comparisons are observed either explicitly as in movie ratings or implicitly as in viewing times of news articles. Given such individualized ordinal data, we address the problem of collaboratively learning representations of the users and the items. The learned features can be used to predict a user's preference of an unseen item to be used in recommendation systems. This also allows one to compute similarities among users and items to be used for categorization and search. Motivated by the empirical successes of the MultiNomial Logit (MNL) model in marketing and transportation, and also more recent successes in word embedding and crowdsourced image embedding, we pose this problem as learning the MNL model parameters that best explains the data. We propose a convex optimization for learning the MNL model, and show that it is minimax optimal up to a logarithmic factor by comparing its performance to a fundamental lower bound. This characterizes the minimax sample complexity of the problem, and proves that the proposed estimator cannot be improved upon other than by a logarithmic factor. Further, the analysis identifies how the accuracy depends on the topology of sampling via the spectrum of the sampling graph. This provides a guideline for designing surveys when one can choose which items are to be compared. This is accompanies by numerical simulations on synthetic and real datasets confirming our theoretical predictions.
Finding Approximate Local Minima Faster than Gradient Descent
Agarwal, Naman, Allen-Zhu, Zeyuan, Bullins, Brian, Hazan, Elad, Ma, Tengyu
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of training examples. The time complexity of our algorithm to find an approximate local minimum is even faster than that of gradient descent to find a critical point. Our algorithm applies to a general class of optimization problems including training a neural network and other non-convex objectives arising in machine learning.
Stationary signal processing on graphs
Perraudin, Nathanaël, Vandergheynst, Pierre
Graphs are a central tool in machine learning and information processing as they allow to conveniently capture the structure of complex datasets. In this context, it is of high importance to develop flexible models of signals defined over graphs or networks. In this paper, we generalize the traditional concept of wide sense stationarity to signals defined over the vertices of arbitrary weighted undirected graphs. We show that stationarity is expressed through the graph localization operator reminiscent of translation. We prove that stationary graph signals are characterized by a well-defined Power Spectral Density that can be efficiently estimated even for large graphs. We leverage this new concept to derive Wiener-type estimation procedures of noisy and partially observed signals and illustrate the performance of this new model for denoising and regression.
Estimating Quality in Multi-Objective Bandits Optimization
Durand, Audrey, Gagné, Christian
Many real-world applications are characterized by a number of conflicting performance measures. As optimizing in a multi-objective setting leads to a set of non-dominated solutions, a preference function is required for selecting the solution with the appropriate trade-off between the objectives. The question is: how good do estimations of these objectives have to be in order for the solution maximizing the preference function to remain unchanged? In this paper, we introduce the concept of preference radius to characterize the robustness of the preference function and provide guidelines for controlling the quality of estimations in the multi-objective setting. More specifically, we provide a general formulation of multi-objective optimization under the bandits setting. We show how the preference radius relates to the optimal gap and we use this concept to provide a theoretical analysis of the Thompson sampling algorithm from multivariate normal priors. We finally present experiments to support the theoretical results and highlight the fact that one cannot simply scalarize multi-objective problems into single-objective problems.
Fast Kronecker product kernel methods via generalized vec trick
Airola, Antti, Pahikkala, Tapio
Kronecker product kernel provides the standard approach in the kernel methods literature for learning from graph data, where edges are labeled and both start and end vertices have their own feature representations. The methods allow generalization to such new edges, whose start and end vertices do not appear in the training data, a setting known as zero-shot or zero-data learning. Such a setting occurs in numerous applications, including drug-target interaction prediction, collaborative filtering and information retrieval. Efficient training algorithms based on the so-called vec trick, that makes use of the special structure of the Kronecker product, are known for the case where the training data is a complete bipartite graph. In this work we generalize these results to non-complete training graphs. This allows us to derive a general framework for training Kronecker product kernel methods, as specific examples we implement Kronecker ridge regression and support vector machine algorithms. Experimental results demonstrate that the proposed approach leads to accurate models, while allowing order of magnitude improvements in training and prediction time.
On the choice of the low-dimensional domain for global optimization via random embeddings
Binois, Mickaël, Ginsbourger, David, Roustant, Olivier
The challenge of taking many variables into account in optimization problems may be overcome under the hypothesis of low effective dimensionality. Then, the search of solutions can be reduced to the random embedding of a low dimensional space into the original one, resulting in a more manageable optimization problem. Specifically, in the case of time consuming black-box functions and when the budget of evaluations is severely limited, global optimization with random embeddings appears as a sound alternative to random search. Yet, in the case of box constraints on the native variables, defining suitable bounds on a low dimensional domain appears to be complex. Indeed, a small search domain does not guarantee to find a solution even under restrictive hypotheses about the function, while a larger one may slow down convergence dramatically. Here we tackle the issue of low-dimensional domain selection based on a detailed study of the properties of the random embedding, giving insight on the aforementioned difficulties. In particular, we describe a minimal low-dimensional set in correspondence with the embedded search space. We additionally show that an alternative equivalent embedding procedure yields simultaneously a simpler definition of the low-dimensional minimal set and better properties in practice. Finally, the performance and robustness gains of the proposed enhancements for Bayesian optimization are illustrated on three examples.
Doubly Stochastic Primal-Dual Coordinate Method for Bilinear Saddle-Point Problem
Yu, Adams Wei, Lin, Qihang, Yang, Tianbao
We propose a doubly stochastic primal-dual coordinate optimization algorithm for empirical risk minimization, which can be formulated as a bilinear saddle-point problem. In each iteration, our method randomly samples a block of coordinates of the primal and dual solutions to update. The linear convergence of our method could be established in terms of 1) the distance from the current iterate to the optimal solution and 2) the primal-dual objective gap. We show that the proposed method has a lower overall complexity than existing coordinate methods when either the data matrix has a factorized structure or the proximal mapping on each block is computationally expensive, e.g., involving an eigenvalue decomposition. The efficiency of the proposed method is confirmed by empirical studies on several real applications, such as the multi-task large margin nearest neighbor problem.