Optimization
A Difference-of-Convex Programming Approach With Parallel Branch-and-Bound For Sentence Compression Via A Hybrid Extractive Model
Niu, Yi-Shuai, You, Yu, Xu, Wenxu, Ding, Wentao, Hu, Junpeng
Sentence compression is an important problem in natural language processing with wide applications in text summarization, search engine and human-AI interaction system etc. In this paper, we design a hybrid extractive sentence compression model combining a probability language model and a parse tree language model for compressing sentences by guaranteeing the syntax correctness of the compression results. Our compression model is formulated as an integer linear programming problem, which can be rewritten as a Difference-of-Convex (DC) programming problem based on the exact penalty technique. We use a well known efficient DC algorithm -- DCA to handle the penalized problem for local optimal solutions. Then a hybrid global optimization algorithm combining DCA with a parallel branch-and-bound framework, namely PDCABB, is used for finding global optimal solutions. Numerical results demonstrate that our sentence compression model can provide excellent compression results evaluated by F-score, and indicate that PDCABB is a promising algorithm for solving our sentence compression model.
Solving Billion-Scale Knapsack Problems
Zhang, Xingwen, Qi, Feng, Hua, Zhigang, Yang, Shuang
Knapsack problems (KPs) are common in industry, but solving KPs is known to be NP-hard and has been tractable only at a relatively small scale. This paper examines KPs in a slightly generalized form and shows that they can be solved nearly optimally at scale via distributed algorithms. The proposed approach can be implemented fairly easily with off-the-shelf distributed computing frameworks (e.g. MPI, Hadoop, Spark). As an example, our implementation leads to one of the most efficient KP solvers known to date -- capable to solve KPs at an unprecedented scale (e.g., KPs with 1 billion decision variables and 1 billion constraints can be solved within 1 hour). The system has been deployed to production and called on a daily basis, yielding significant business impacts at Ant Financial.
Choice Set Optimization Under Discrete Choice Models of Group Decisions
Tomlinson, Kiran, Benson, Austin R.
The way that people make choices or exhibit preferences can be strongly affected by the set of available alternatives, often called the choice set. Furthermore, there are usually heterogeneous preferences, either at an individual level within small groups or within sub-populations of large groups. Given the availability of choice data, there are now many models that capture this behavior in order to make effective predictions. However, there is little work in understanding how directly changing the choice set can be used to influence a group's preferences or decisions. Here, we use discrete choice modeling to develop an optimization framework of such interventions for several problems of group influence, including maximizing agreement or disagreement and promoting a particular choice. We show that these problems are NP-hard in general but imposing restrictions reveals a fundamental boundary: promoting an item is easier than maximizing agreement or disagreement. After, we design approximation algorithms for the hard problems and show that they work extremely well for real-world choice data.
Evolving Loss Functions With Multivariate Taylor Polynomial Parameterizations
Gonzalez, Santiago, Miikkulainen, Risto
Loss function optimization for neural networks has recently emerged as a new direction for metalearning, with Genetic Loss Optimization (GLO) providing a general approach for the discovery and optimization of such functions. GLO represents loss functions as trees that are evolved and further optimized using evolutionary strategies. However, searching in this space is difficult because most candidates are not valid loss functions. In this paper, a new technique, Multivariate Taylor expansion-based genetic loss-function optimization (TaylorGLO), is introduced to solve this problem. It represents functions using a novel parameterization based on Taylor expansions, making the search more effective. TaylorGLO is able to find new loss functions that outperform those found by GLO in many fewer generations, demonstrating that loss function optimization is a productive avenue for metalearning.
Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization
Letham, Benjamin, Calandra, Roberto, Rai, Akshara, Bakshy, Eytan
Bayesian optimization (BO) is a popular approach to optimize expensive-to-evaluate black-box functions. A significant challenge in BO is to scale to high-dimensional parameter spaces while retaining sample efficiency. A solution considered in existing literature is to embed the high-dimensional space in a lower-dimensional manifold, often via a random linear embedding. In this paper, we identify several crucial issues and misconceptions about the use of linear embeddings for BO. We study the properties of linear embeddings from the literature and show that some of the design choices in current approaches adversely impact their performance. We show empirically that properly addressing these issues significantly improves the efficacy of linear embeddings for BO on a range of problems, including learning a gait policy for robot locomotion.
On implicit regularization: Morse functions and applications to matrix factorization
In this paper, we revisit implicit regularization from the ground up using notions from dynamical systems and invariant subspaces of Morse functions. The key contributions are a new criterion for implicit regularization---a leading contender to explain the generalization power of deep models such as neural networks---and a general blueprint to study it. We apply these techniques to settle a conjecture on implicit regularization in matrix factorization.
SA vs SAA for population Wasserstein barycenter calculation
In Machine Learning and Optimization community there are two main approaches for convex risk minimization problem. The first approach is Stochastic Averaging (SA) (online) and the second one is Stochastic Average Approximation (SAA) (Monte Carlo, Empirical Risk Minimization, offline) with proper regularization in non-strongly convex case. At the moment, it is known that both approaches are on average equivalent (up to a logarithmic factor) in terms of oracle complexity (required number of stochastic gradient evaluations). What is the situation with total complexity? The answer depends on specific problem. However, starting from work [Nemirovski et al. (2009)] it was generally accepted that SA is better than SAA. Nevertheless, in case of large-scale problems SA may ran out of memory problems since storing all data on one machine and organizing online access to it can be impossible without communications with other machines. SAA in contradistinction to SA allows parallel/distributed calculations. In this paper we show that SAA may outperform SA in the problem of calculating an estimation for population ({\mu}-entropy regularized) Wasserstein barycenter even for non-parallel (non-decenralized) set up.
Preventing Imitation Learning with Adversarial Policy Ensembles
Zhan, Albert, Tiomkin, Stas, Abbeel, Pieter
Imitation learning can reproduce policies by observing experts, which poses a problem regarding policy privacy. Policies, such as human, or policies on deployed robots, can all be cloned without consent from the owners. How can we protect against external observers cloning our proprietary policies? To answer this question we introduce a new reinforcement learning framework, where we train an ensemble of near-optimal policies, whose demonstrations are guaranteed to be useless for an external observer. We formulate this idea by a constrained optimization problem, where the objective is to improve proprietary policies, and at the same time deteriorate the virtual policy of an eventual external observer. We design a tractable algorithm to solve this new optimization problem by modifying the standard policy gradient algorithm. Our formulation can be interpreted in lenses of confidentiality and adversarial behaviour, which enables a broader perspective of this work. We demonstrate the existence of "non-clonable" ensembles, providing a solution to the above optimization problem, which is calculated by our modified policy gradient algorithm. To our knowledge, this is the first work regarding the protection of policies in Reinforcement Learning.
Faster Projection-free Online Learning
In many online learning problems the computational bottleneck for gradient-based methods is the projection operation. For this reason, in many problems the most efficient algorithms are based on the Frank-Wolfe method, which replaces projections by linear optimization. In the general case, however, online projection-free methods require more iterations than projection-based methods: the best known regret bound scales as $T^{3/4}$. Despite significant work on various variants of the Frank-Wolfe method, this bound has remained unchanged for a decade. In this paper we give an efficient projection-free algorithm that guarantees $T^{2/3}$ regret for general online convex optimization with smooth cost functions and one linear optimization computation per iteration. As opposed to previous Frank-Wolfe approaches, our algorithm is derived using the Follow-the-Perturbed-Leader method and is analyzed using an online primal-dual framework.
Multi-Participant Multi-Class Vertical Federated Learning
Federated learning (FL) is a privacy-preserving paradigm for training collective machine learning models with locally stored data from multiple participants. V ertical federated learning (VFL) deals with the case where participants sharing the same sample ID space but having different feature spaces, while label information is owned by one participant. Current studies of VFL only support two participants, and mostly focus on binary-class logistic regression problems. In this paper, we propose the Multi-participant Multi-class V ertical Federated Learning (MMVFL) framework for multi-class VFL problems involving multiple parties. Extending the idea of multi-view learning (MVL), MMVFL enables label sharing from its owner to other VFL participants in a privacy-preserving manner. To demonstrate the effectiveness of MMVFL, a feature selection scheme is incorporated into MMVFL to compare its performance against supervised feature selection and MVL-based approaches. Experiment results on real-world datasets show that MMVFL can effectively share label information among multiple VFL participants and match multi-class classification performance of existing approaches.