Optimization
Optimal Web-Scale Tiering as a Flow Problem
Leung, Gilbert, Quadrianto, Novi, Tsioutsiouliklis, Kostas, Smola, Alex J.
We present a fast online solver for large scale maximum-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. Our algorithm solves an integer linear program in an online fashion. It exploits total unimodularity of the constraint matrix and a Lagrangian relaxation to solve the problem as a convex online game. The algorithm generates approximate solutions of max-flow problems by performing stochastic gradient descent on a set of flows. We apply the algorithm to optimize tier arrangement of over 80 Million web pages on a layered set of caches to serve an incoming query stream optimally. We provide an empirical demonstration of the effectiveness of our method on real query-pages data.
Implicit Differentiation by Perturbation
This paper proposes a simple and efficient finite difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function, defined on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with a loss function over approximate marginals. Convenient choices of loss functions make it practical to fit graphical models with hidden variables, high treewidth and/or model misspecification.
Policy gradients in linearly-solvable MDPs
We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.
Practical Large-Scale Optimization for Max-norm Regularization
Lee, Jason D., Recht, Ben, Srebro, Nathan, Tropp, Joel, Salakhutdinov, Ruslan R.
The max-norm was proposed as a convex matrix regularizer by Srebro et al (2004) and was shown to be empirically superior to the trace-norm for collaborative filtering problems. Although the max-norm can be computed in polynomial time, there are currently no practical algorithms for solving large-scale optimization problems that incorporate the max-norm. The present work uses a factorization technique of Burer and Monteiro (2003) to devise scalable first-order algorithms for convex programs involving the max-norm. These algorithms are applied to solve huge collaborative filtering, graph cut, and clustering problems. Empirically, the new methods outperform mature techniques from all three areas.
Efficient Minimization of Decomposable Submodular Functions
Stobbe, Peter, Krause, Andreas
Many combinatorial problems arising in machine learning can be reduced to the problem of minimizing a submodular function. Submodular functions are a natural discrete analog of convex functions, and can be minimized in strongly polynomial time. Unfortunately, state-of-the-art algorithms for general submodular minimization are intractable for practical problems. In this paper, we introduce a novel subclass of submodular minimization problems that we call decomposable. Decomposable submodular functions are those that can be represented as sums of concave functions applied to linear functions. We develop an algorithm, SLG, that can efficiently minimize decomposable submodular functions with tens of thousands of variables. Our algorithm exploits recent results in smoothed convex minimization. We apply SLG to synthetic benchmarks and a joint classification-and-segmentation task, and show that it outperforms the state-of-the-art general purpose submodular minimization algorithms by several orders of magnitude.
A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
Mosci, Sofia, Villa, Silvia, Verri, Alessandro, Rosasco, Lorenzo
We deal with the problem of variable selection when variables must be selected group-wise, with possibly overlapping groups defined a priori. In particular we propose a new optimization procedure for solving the regularized algorithm presented in Jacob et al. 09, where the group lasso penalty is generalized to overlapping groups of variables. While in Jacob et al. 09 the proposed implementation requires explicit replication of the variables belonging to more than one group, our iterative procedure is based on a combination of proximal methods in the primal space and constrained Newton method in a reduced dual space, corresponding to the active groups. This procedure provides a scalable alternative with no need for data duplication, and allows to deal with high dimensional problems without pre-processing to reduce the dimensionality of the data. The computational advantages of our scheme with respect to state-of-the-art algorithms using data duplication are shown empirically with numerical simulations.
Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Yang, Min, Xu, Linli, White, Martha, Schuurmans, Dale, Yu, Yao-liang
Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of ``loss clipping'' can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.
A unified model of short-range and long-range motion perception
Wu, Shuang, He, Xuming, Lu, Hongjing, Yuille, Alan L.
The human vision system is able to effortlessly perceive both short-range and long-range motion patterns in complex dynamic scenes. Previous work has assumed that two different mechanisms are involved in processing these two types of motion. In this paper, we propose a hierarchical model as a unified framework for modeling both short-range and long-range motion perception. Our model consists of two key components: a data likelihood that proposes multiple motion hypotheses using nonlinear matching, and a hierarchical prior that imposes slowness and spatial smoothness constraints on the motion field at multiple scales. We tested our model on two types of stimuli, random dot kinematograms and multiple-aperture stimuli, both commonly used in human vision research. We demonstrate that the hierarchical model adequately accounts for human performance in psychophysical experiments.
Worst-case bounds on the quality of max-product fixed-points
Vinyals, Meritxell, Cerquides, Jes\', us, Farinelli, Alessandro, Rodríguez-aguilar, Juan A.
We study worst-case bounds on the quality of any fixed point assignment of the max-product algorithm for Markov Random Fields (MRF). We start proving a bound independent of the MRF structure and parameters. Afterwards, we show how this bound can be improved for MRFs with particular structures such as bipartite graphs or grids. Our results provide interesting insight into the behavior of max-product. For example, we prove that max-product provides very good results (at least 90% of the optimal) on MRFs with large variable-disjoint cycles (MRFs in which all cycles are variable-disjoint, namely that they do not share any edge and in which each cycle contains at least 20 variables).
Sparse Inverse Covariance Selection via Alternating Linearization Methods
Scheinberg, Katya, Ma, Shiqian, Goldfarb, Donald
Gaussian graphical models are of great interest in statistical learning. Because the conditional independencies between different nodes correspond to zero entries in the inverse covariance matrix of the Gaussian distribution, one can learn the structure of the graph by estimating a sparse inverse covariance matrix from sample data, by solving a convex maximum likelihood problem with an $\ell_1$-regularization term. In this paper, we propose a first-order method based on an alternating linearization technique that exploits the problem's special structure; in particular, the subproblems solved in each iteration have closed-form solutions. Moreover, our algorithm obtains an $\epsilon$-optimal solution in $O(1/\epsilon)$ iterations. Numerical experiments on both synthetic and real data from gene association networks show that a practical version of this algorithm outperforms other competitive algorithms.