Goto

Collaborating Authors

 Optimization


Multiview Learning of Weighted Majority Vote by Bregman Divergence Minimization

arXiv.org Machine Learning

We tackle the issue of classifier combinations when observations have multiple views. Our method jointly learns view-specific weighted majority vote classifiers (i.e. for each view) over a set of base voters, and a second weighted majority vote classifier over the set of these view-specific weighted majority vote classifiers. We show that the empirical risk minimization of the final majority vote given a multiview training set can be cast as the minimization of Bregman divergences. This allows us to derive a parallel-update optimization algorithm for learning our multiview model. We empirically study our algorithm with a particular focus on the impact of the training set size on the multiview learning results. The experiments show that our approach is able to overcome the lack of labeled information.


Maximizing acquisition functions for Bayesian optimization

arXiv.org Machine Learning

Bayesian optimization is a sample-efficient approach to global optimization that relies on theoretically motivated value heuristics (acquisition functions) to guide the search process. Fully maximizing acquisition functions produces the Bayes' decision rule, but this ideal is difficult to achieve since these functions are frequently non-trivial to optimize. This statement is especially true when evaluating queries in parallel, where acquisition functions are routinely non-convex, high-dimensional, and intractable. We present two modern approaches for maximizing acquisition functions that exploit key properties thereof, namely the differentiability of Monte Carlo integration and the submodularity of parallel querying.


Towards More Efficient Stochastic Decentralized Learning: Faster Convergence and Sparse Communication

arXiv.org Machine Learning

Recently, the decentralized optimization problem is attracting growing attention. Most existing methods are deterministic with high per-iteration cost and have a convergence rate quadratically depending on the problem condition number. Besides, the dense communication is necessary to ensure the convergence even if the dataset is sparse. In this paper, we generalize the decentralized optimization problem to a monotone operator root finding problem, and propose a stochastic algorithm named DSBA that (i) converges geometrically with a rate linearly depending on the problem condition number, and (ii) can be implemented using sparse communication only. Additionally, DSBA handles learning problems like AUC-maximization which cannot be tackled efficiently in the decentralized setting. Experiments on convex minimization and AUC-maximization validate the efficiency of our method.


Primal-Dual Wasserstein GAN

arXiv.org Machine Learning

We introduce Primal-Dual Wasserstein GAN, a new learning algorithm for building latent variable models of the data distribution based on the primal and the dual formulations of the optimal transport (OT) problem. We utilize the primal formulation to learn a flexible inference mechanism and to create an optimal approximate coupling between the data distribution and the generative model. In order to learn the generative model, we use the dual formulation and train the decoder adversarially through a critic network that is regularized by the approximate coupling obtained from the primal. Unlike previous methods that violate various properties of the optimal critic, we regularize the norm and the direction of the gradients of the critic function. Our model shares many of the desirable properties of auto-encoding models in terms of mode coverage and latent structure, while avoiding their undesirable averaging properties, e.g. their inability to capture sharp visual features when modeling real images. We compare our algorithm with several other generative modeling techniques that utilize Wasserstein distances on Frechet Inception Distance (FID) and Inception Scores (IS).


Addressing the Item Cold-start Problem by Attribute-driven Active Learning

arXiv.org Machine Learning

In recommender systems, cold-start issues are situations where no previous events, e.g. ratings, are known for certain users or items. In this paper, we focus on the item cold-start problem. Both content information (e.g. item attributes) and initial user ratings are valuable for seizing users' preferences on a new item. However, previous methods for the item cold-start problem either 1) incorporate content information into collaborative filtering to perform hybrid recommendation, or 2) actively select users to rate the new item without considering content information and then do collaborative filtering. In this paper, we propose a novel recommendation scheme for the item cold-start problem by leverage both active learning and items' attribute information. Specifically, we design useful user selection criteria based on items' attributes and users' rating history, and combine the criteria in an optimization framework for selecting users. By exploiting the feedback ratings, users' previous ratings and items' attributes, we then generate accurate rating predictions for the other unselected users. Experimental results on two real-world datasets show the superiority of our proposed method over traditional methods.


Optimal Algorithms for Continuous Non-monotone Submodular and DR-Submodular Maximization

arXiv.org Machine Learning

In this paper we study the fundamental problems of maximizing a continuous non-monotone submodular function over the hypercube, both with and without coordinate-wise concavity. This family of optimization problems has several applications in machine learning, economics, and communication systems. Our main result is the first $\frac{1}{2}$-approximation algorithm for continuous submodular function maximization; this approximation factor of $\frac{1}{2}$ is the best possible for algorithms that only query the objective function at polynomially many points. For the special case of DR-submodular maximization, i.e. when the submodular functions is also coordinate wise concave along all coordinates, we provide a different $\frac{1}{2}$-approximation algorithm that runs in quasilinear time. Both of these results improve upon prior work [Bian et al, 2017, Soma and Yoshida, 2017]. Our first algorithm uses novel ideas such as reducing the guaranteed approximation problem to analyzing a zero-sum game for each coordinate, and incorporates the geometry of this zero-sum game to fix the value at this coordinate. Our second algorithm exploits coordinate-wise concavity to identify a monotone equilibrium condition sufficient for getting the required approximation guarantee, and hunts for the equilibrium point using binary search. We further run experiments to verify the performance of our proposed algorithms in related machine learning applications.


Simple and practical algorithms for $\ell_p$-norm low-rank approximation

arXiv.org Machine Learning

We propose practical algorithms for entrywise $\ell_p$-norm low-rank approximation, for $p = 1$ or $p = \infty$. The proposed framework, which is non-convex and gradient-based, is easy to implement and typically attains better approximations, faster, than state of the art. From a theoretical standpoint, we show that the proposed scheme can attain $(1 + \varepsilon)$-OPT approximations. Our algorithms are not hyperparameter-free: they achieve the desiderata only assuming algorithm's hyperparameters are known a priori---or are at least approximable. I.e., our theory indicates what problem quantities need to be known, in order to get a good solution within polynomial time, and does not contradict to recent inapproximabilty results, as in [46].


Interior Point Methods with Adversarial Networks

arXiv.org Machine Learning

We present a new methodology, called IPMAN, that combines interior point methods and generative adversarial networks to solve constrained optimization problems with feasible sets that are non-convex or not explicitly defined. Our methodology produces {\epsilon}-optimal solutions and demonstrates that, when there are multiple global optima, it learns a distribution over the optimal set. We apply our approach to synthetic examples to demonstrate its effectiveness and to a problem in radiation therapy treatment optimization with a non-convex feasible set.


Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Cooling system plays a key role in modern data center. Developing an optimal control policy for data center cooling system is a challenging task. The prevailing approaches often rely on approximated system models that are built upon the knowledge of mechanical cooling, electrical and thermal management, which is difficult to design and may lead to sub-optimal or unstable performances. In this paper we propose to utilize the large amount of monitoring data in data center to optimize the control policy. To do so, we cast the cooling control policy design into an energy cost minimization problem with temperature constraints, and tab it into the emerging deep reinforcement learning (DRL) framework. Specifically, we propose an end-to-end neural control algorithm that is based on the actor-critic framework and the deep deterministic policy gradient (DDPG) technique. To improve the robustness of the control algorithm, we test various DRL related optimization techniques, such as recurrent decision making, discounted return, different neural network architectures, and different stochastic gradient descent algorithms, and adding additional constraints on the output of the policy network. We evaluate the proposed algorithms on the EnergyPlus simulation platform and on a real data trace collected from the National Super Computing Centre (NSCC) of Singapore. Our results show that the proposed end-to-end cooling control algorithm can achieve about 10% cooling cost saving on the simulation platform compared with a canonical two stage optimization algorithm; and it can achieve about 13.6% cooling energy saving on the NSCC data trace. Furthermore, it shows high accuracy in predicting the temperature of the racks (with mean absolute error 0.1 degree) and can control the temperature of the data center zone close to the predefined threshold with variation lower to 0.2 degree.


Designing the Game to Play: Optimizing Payoff Structure in Security Games

arXiv.org Artificial Intelligence

Effective game-theoretic modeling of defender-attacker behavior is becoming increasingly important. In many domains, the defender functions not only as a player but also the designer of the game's payoff structure. We study Stackelberg Security Games where the defender, in addition to allocating defensive resources to protect targets from the attacker, can strategically manipulate the attacker's payoff under budget constraints in weighted L^p-norm form regarding the amount of change. Focusing on problems with weighted L^1-norm form constraint, we present (i) a mixed integer linear program-based algorithm with approximation guarantee; (ii) a branch-and-bound based algorithm with improved efficiency achieved by effective pruning; (iii) a polynomial time approximation scheme for a special but practical class of problems. In addition, we show that problems under budget constraints in L^0-norm form and weighted L^\infty-norm form can be solved in polynomial time. We provide an extensive experimental evaluation of our proposed algorithms.