Goto

Collaborating Authors

 Optimization


Sensor Selection by Linear Programming

arXiv.org Machine Learning

We learn sensor trees from training data to minimize sensor acquisition costs during test time. Our system adaptively selects sensors at each stage if necessary to make a confident classification. We pose the problem as empirical risk minimization over the choice of trees and node decision rules. We decompose the problem, which is known to be intractable, into combinatorial (tree structures) and continuous parts (node decision rules) and propose to solve them separately. Using training data we greedily solve for the combinatorial tree structures and for the continuous part, which is a non-convex multilinear objective function, we derive convex surrogate loss functions that are piecewise linear. The resulting problem can be cast as a linear program and has the advantage of guaranteed convergence, global optimality, repeatability and computational efficiency. We show that our proposed approach outperforms the state-of-art on a number of benchmark datasets.


Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization

arXiv.org Machine Learning

We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variable. An extrapolation step on the primal variable is performed to obtain accelerated convergence rate. We also develop a mini-batch version of the SPDC method which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has a better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.


Theoretical and Experimental Analyses of Tensor-Based Regression and Classification

arXiv.org Machine Learning

We theoretically and experimentally investigate tensor-based regression and classification. Our focus is regularization with various tensor norms, including the overlapped trace norm, the latent trace norm, and the scaled latent trace norm. We first give dual optimization methods using the alternating direction method of multipliers, which is computationally efficient when the number of training samples is moderate. We then theoretically derive an excess risk bound for each tensor norm and clarify their behavior. Finally, we perform extensive experiments using simulated and real data and demonstrate the superiority of tensor-based learning methods over vector- and matrix-based learning methods.


On Transitive Consistency for Linear Invertible Transformations between Euclidean Coordinate Systems

arXiv.org Machine Learning

Transitive consistency is an intrinsic property for collections of linear invertible transformations between Euclidean coordinate frames. In practice, when the transformations are estimated from data, this property is lacking. This work addresses the problem of synchronizing transformations that are not transitively consistent. Once the transformations have been synchronized, they satisfy the transitive consistency condition - a transformation from frame $A$ to frame $C$ is equal to the composite transformation of first transforming A to B and then transforming B to C. The coordinate frames correspond to nodes in a graph and the transformations correspond to edges in the same graph. Two direct or centralized synchronization methods are presented for different graph topologies; the first one for quasi-strongly connected graphs, and the second one for connected graphs. As an extension of the second method, an iterative Gauss-Newton method is presented, which is later adapted to the case of affine and Euclidean transformations. Two distributed synchronization methods are also presented for orthogonal matrices, which can be seen as distributed versions of the two direct or centralized methods; they are similar in nature to standard consensus protocols used for distributed averaging. When the transformations are orthogonal matrices, a bound on the optimality gap can be computed. Simulations show that the gap is almost right, even for noise large in magnitude. This work also contributes on a theoretical level by providing linear algebraic relationships for transitively consistent transformations. One of the benefits of the proposed methods is their simplicity - basic linear algebraic methods are used, e.g., the Singular Value Decomposition (SVD). For a wide range of parameter settings, the methods are numerically validated.


Mechanisms for Multi-unit Combinatorial Auctions with a Few Distinct Goods

Journal of Artificial Intelligence Research

We design and analyze deterministic truthful approximation mechanisms for multi-unit Combinatorial Auctions involving only a constant number of distinct goods, each in arbitrary limited supply. Prospective buyers (bidders) have preferences over multisets of items, i.e., for more than one unit per distinct good. Our objective is to determine allocations of multisets that maximize the Social Welfare. Our main results are for multi-minded and submodular bidders. In the first setting each bidder has a positive value for being allocated one multiset from a prespecified demand set of alternatives. In the second setting each bidder is associated to a submodular valuation function that defines his value for the multiset he is allocated. For multi-minded bidders, we design a truthful FPTAS that fully optimizes the Social Welfare, while violating the supply constraints on goods within factor (1+e), for any fixed e>0 (i.e., the approximation applies to the constraints and not to the Social Welfare). This result is best possible, in that full optimization is impossible without violating the supply constraints. For submodular bidders, we obtain a PTAS that approximates the optimum Social Welfare within factor (1+e), for any fixed e>0, without violating the supply constraints. This result is best possible as well. Our allocation algorithms are Maximal-in-Range and yield truthful mechanisms, when paired with Vickrey-Clarke-Groves payments.


Model Guided Sampling Optimization for Low-dimensional Problems

arXiv.org Machine Learning

Optimization of very expensive black-box functions requires utilization of maximum information gathered by the process of optimization. Model Guided Sampling Optimization (MGSO) forms a more robust alternative to Jones' Gaussian-process-based EGO algorithm. Instead of EGO's maximizing expected improvement, the MGSO uses sampling the probability of improvement which is shown to be helpful against trapping in local minima. Further, the MGSO can reach close-to-optimum solutions faster than standard optimization algorithms on low dimensional or smooth problems.


Fast and Flexible ADMM Algorithms for Trend Filtering

arXiv.org Machine Learning

This paper presents a fast and robust algorithm for trend filtering, a recently developed nonparametric regression tool. It has been shown that, for estimating functions whose derivatives are of bounded variation, trend filtering achieves the minimax optimal error rate, while other popular methods like smoothing splines and kernels do not. Standing in the way of a more widespread practical adoption, however, is a lack of scalable and numerically stable algorithms for fitting trend filtering estimates. This paper presents a highly efficient, specialized ADMM routine for trend filtering. Our algorithm is competitive with the specialized interior point methods that are currently in use, and yet is far more numerically robust. Furthermore, the proposed ADMM implementation is very simple, and importantly, it is flexible enough to extend to many interesting related problems, such as sparse trend filtering and isotonic trend filtering. Software for our method is freely available, in both the C and R languages.


A Smoothed Dual Approach for Variational Wasserstein Problems

arXiv.org Machine Learning

Variational problems that involve Wasserstein distances have been recently proposed to summarize and learn from probability measures. Despite being conceptually simple, such problems are computationally challenging because they involve minimizing over quantities (Wasserstein distances) that are themselves hard to compute. We show that the dual formulation of Wasserstein variational problems introduced recently by Carlier et al. (2014) can be regularized using an entropic smoothing, which leads to smooth, differentiable, convex optimization problems that are simpler to implement and numerically more stable. We illustrate the versatility of this approach by applying it to the computation of Wasserstein barycenters and gradient flows of spacial regularization functionals.


On Monotonicity of the Optimal Transmission Policy in Cross-layer Adaptive m-QAM Modulation

arXiv.org Machine Learning

This paper considers a cross-layer adaptive modulation system that is modeled as a Markov decision process (MDP). We study how to utilize the monotonicity of the optimal transmission policy to relieve the computational complexity of dynamic programming (DP). In this system, a scheduler controls the bit rate of the m-quadrature amplitude modulation (m-QAM) in order to minimize the long-term losses incurred by the queue overflow in the data link layer and the transmission power consumption in the physical layer. The work is done in two steps. Firstly, we observe the L-natural-convexity and submodularity of DP to prove that the optimal policy is always nondecreasing in queue occupancy/state and derive the sufficient condition for it to be nondecreasing in both queue and channel states. We also show that, due to the L-natural-convexity of DP, the variation of the optimal policy in queue state is restricted by a bounded marginal effect: The increment of the optimal policy between adjacent queue states is no greater than one. Secondly, we use the monotonicity results to present two low complexity algorithms: monotonic policy iteration (MPI) based on L-natural-convexity and discrete simultaneous perturbation stochastic approximation (DSPSA). We run experiments to show that the time complexity of MPI based on L-natural-convexity is much lower than that of DP and the conventional MPI that is based on submodularity and DSPSA is able to adaptively track the optimal policy when the system parameters change.


A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution

arXiv.org Machine Learning

Most existing word embedding methods can be categorized into Neural Embedding Models and Matrix Factorization (MF)-based methods. However some models are opaque to probabilistic interpretation, and MF-based methods, typically solved using Singular Value Decomposition (SVD), may incur loss of corpus information. In addition, it is desirable to incorporate global latent factors, such as topics, sentiments or writing styles, into the word embedding model. Since generative models provide a principled way to incorporate latent factors, we propose a generative word embedding model, which is easy to interpret, and can serve as a basis of more sophisticated latent factor models. The model inference reduces to a low rank weighted positive semidefinite approximation problem. Its optimization is approached by eigendecomposition on a submatrix, followed by online blockwise regression, which is scalable and avoids the information loss in SVD. In experiments on 7 common benchmark datasets, our vectors are competitive to word2vec, and better than other MF-based methods.