AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

A Difference-of-Convex Programming Approach With Parallel Branch-and-Bound For Sentence Compression Via A Hybrid Extractive Model

Niu, Yi-Shuai, You, Yu, Xu, Wenxu, Ding, Wentao, Hu, Junpeng

arXiv.org Artificial IntelligenceFeb-2-2020

Sentence compression is an important problem in natural language processing with wide applications in text summarization, search engine and human-AI interaction system etc. In this paper, we design a hybrid extractive sentence compression model combining a probability language model and a parse tree language model for compressing sentences by guaranteeing the syntax correctness of the compression results. Our compression model is formulated as an integer linear programming problem, which can be rewritten as a Difference-of-Convex (DC) programming problem based on the exact penalty technique. We use a well known efficient DC algorithm -- DCA to handle the penalized problem for local optimal solutions. Then a hybrid global optimization algorithm combining DCA with a parallel branch-and-bound framework, namely PDCABB, is used for finding global optimal solutions. Numerical results demonstrate that our sentence compression model can provide excellent compression results evaluated by F-score, and indicate that PDCABB is a promising algorithm for solving our sentence compression model.

artificial intelligence, natural language, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2002.01352

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > Vietnam (0.04)
Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
Europe > France > Normandy > Seine-Maritime > Rouen (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

Solving Billion-Scale Knapsack Problems

Zhang, Xingwen, Qi, Feng, Hua, Zhigang, Yang, Shuang

arXiv.org Artificial IntelligenceFeb-2-2020

Knapsack problems (KPs) are common in industry, but solving KPs is known to be NP-hard and has been tractable only at a relatively small scale. This paper examines KPs in a slightly generalized form and shows that they can be solved nearly optimally at scale via distributed algorithms. The proposed approach can be implemented fairly easily with off-the-shelf distributed computing frameworks (e.g. MPI, Hadoop, Spark). As an example, our implementation leads to one of the most efficient KP solvers known to date -- capable to solve KPs at an unprecedented scale (e.g., KPs with 1 billion decision variables and 1 billion constraints can be solved within 1 hour). The system has been deployed to production and called on a daily basis, yielding significant business impacts at Ant Financial.

algorithm, constraint, local constraint, (14 more...)

arXiv.org Artificial Intelligence

2002.00352

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Taiwan > Taiwan Province > Taipei (0.05)
(7 more...)

Genre: Research Report (1.00)

Industry:

Banking & Finance (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Choice Set Optimization Under Discrete Choice Models of Group Decisions

Tomlinson, Kiran, Benson, Austin R.

arXiv.org Machine LearningFeb-2-2020

The way that people make choices or exhibit preferences can be strongly affected by the set of available alternatives, often called the choice set. Furthermore, there are usually heterogeneous preferences, either at an individual level within small groups or within sub-populations of large groups. Given the availability of choice data, there are now many models that capture this behavior in order to make effective predictions. However, there is little work in understanding how directly changing the choice set can be used to influence a group's preferences or decisions. Here, we use discrete choice modeling to develop an optimization framework of such interventions for several problems of group influence, including maximizing agreement or disagreement and promoting a particular choice. We show that these problems are NP-hard in general but imposing restrictions reveals a fundamental boundary: promoting an item is easier than maximizing agreement or disagreement. After, we design approximation algorithms for the hard problems and show that they work extremely well for real-world choice data.

algorithm 1, disagreement, promotion, (14 more...)

arXiv.org Machine Learning

2002.00421

Country:

Asia > Russia (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Russia (0.04)

Genre: Research Report (0.82)

Industry:

Government (0.93)
Information Technology (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Evolving Loss Functions With Multivariate Taylor Polynomial Parameterizations

Gonzalez, Santiago, Miikkulainen, Risto

arXiv.org Machine LearningJan-31-2020

Loss function optimization for neural networks has recently emerged as a new direction for metalearning, with Genetic Loss Optimization (GLO) providing a general approach for the discovery and optimization of such functions. GLO represents loss functions as trees that are evolved and further optimized using evolutionary strategies. However, searching in this space is difficult because most candidates are not valid loss functions. In this paper, a new technique, Multivariate Taylor expansion-based genetic loss-function optimization (TaylorGLO), is introduced to solve this problem. It represents functions using a novel parameterization based on Taylor expansions, making the search more effective. TaylorGLO is able to find new loss functions that outperform those found by GLO in many fewer generations, demonstrating that loss function optimization is a productive avenue for metalearning.

cross-entropy loss, loss function, taylorglo, (16 more...)

arXiv.org Machine Learning

2002.00059

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(6 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization

Letham, Benjamin, Calandra, Roberto, Rai, Akshara, Bakshy, Eytan

arXiv.org Machine LearningJan-31-2020

Bayesian optimization (BO) is a popular approach to optimize expensive-to-evaluate black-box functions. A significant challenge in BO is to scale to high-dimensional parameter spaces while retaining sample efficiency. A solution considered in existing literature is to embed the high-dimensional space in a lower-dimensional manifold, often via a random linear embedding. In this paper, we identify several crucial issues and misconceptions about the use of linear embeddings for BO. We study the properties of linear embeddings from the literature and show that some of the design choices in current approaches adversely impact their performance. We show empirically that properly addressing these issues significantly improves the efficacy of linear embeddings for BO on a range of problems, including learning a gait policy for robot locomotion.

kernel, optimization, probability, (14 more...)

arXiv.org Machine Learning

2001.11659

Country: North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.66)

Add feedback

On implicit regularization: Morse functions and applications to matrix factorization

Belabbas, Mohamed Ali

arXiv.org Machine LearningJan-31-2020

In this paper, we revisit implicit regularization from the ground up using notions from dynamical systems and invariant subspaces of Morse functions. The key contributions are a new criterion for implicit regularization---a leading contender to explain the generalization power of deep models such as neural networks---and a general blueprint to study it. We apply these techniques to settle a conjecture on implicit regularization in matrix factorization.

implicit regularization, matrix, regularization problem, (14 more...)

arXiv.org Machine Learning

2001.04264

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

SA vs SAA for population Wasserstein barycenter calculation

Dvinskikh, Darina

arXiv.org Machine LearningJan-30-2020

In Machine Learning and Optimization community there are two main approaches for convex risk minimization problem. The first approach is Stochastic Averaging (SA) (online) and the second one is Stochastic Average Approximation (SAA) (Monte Carlo, Empirical Risk Minimization, offline) with proper regularization in non-strongly convex case. At the moment, it is known that both approaches are on average equivalent (up to a logarithmic factor) in terms of oracle complexity (required number of stochastic gradient evaluations). What is the situation with total complexity? The answer depends on specific problem. However, starting from work [Nemirovski et al. (2009)] it was generally accepted that SA is better than SAA. Nevertheless, in case of large-scale problems SA may ran out of memory problems since storing all data on one machine and organizing online access to it can be impossible without communications with other machines. SAA in contradistinction to SA allows parallel/distributed calculations. In this paper we show that SAA may outperform SA in the problem of calculating an estimation for population ({\mu}-entropy regularized) Wasserstein barycenter even for non-parallel (non-decenralized) set up.

artificial intelligence, complexity, optimization problem, (18 more...)

arXiv.org Machine Learning

2001.07697

Country:

Europe > Russia (0.14)
Asia > Middle East > Jordan (0.14)
Oceania > Australia (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Preventing Imitation Learning with Adversarial Policy Ensembles

Zhan, Albert, Tiomkin, Stas, Abbeel, Pieter

arXiv.org Machine LearningJan-30-2020

Imitation learning can reproduce policies by observing experts, which poses a problem regarding policy privacy. Policies, such as human, or policies on deployed robots, can all be cloned without consent from the owners. How can we protect against external observers cloning our proprietary policies? To answer this question we introduce a new reinforcement learning framework, where we train an ensemble of near-optimal policies, whose demonstrations are guaranteed to be useless for an external observer. We formulate this idea by a constrained optimization problem, where the objective is to improve proprietary policies, and at the same time deteriorate the virtual policy of an eventual external observer. We design a tractable algorithm to solve this new optimization problem by modifying the standard policy gradient algorithm. Our formulation can be interpreted in lenses of confidentiality and adversarial behaviour, which enables a broader perspective of this work. We demonstrate the existence of "non-clonable" ensembles, providing a solution to the above optimization problem, which is calculated by our modified policy gradient algorithm. To our knowledge, this is the first work regarding the protection of policies in Reinforcement Learning.

arxiv, corr, observer, (14 more...)

arXiv.org Machine Learning

2002.01059

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
North America > Canada (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Faster Projection-free Online Learning

Hazan, Elad, Minasyan, Edgar

arXiv.org Machine LearningJan-30-2020

In many online learning problems the computational bottleneck for gradient-based methods is the projection operation. For this reason, in many problems the most efficient algorithms are based on the Frank-Wolfe method, which replaces projections by linear optimization. In the general case, however, online projection-free methods require more iterations than projection-based methods: the best known regret bound scales as $T^{3/4}$. Despite significant work on various variants of the Frank-Wolfe method, this bound has remained unchanged for a decade. In this paper we give an efficient projection-free algorithm that guarantees $T^{2/3}$ regret for general online convex optimization with smooth cost functions and one linear optimization computation per iteration. As opposed to previous Frank-Wolfe approaches, our algorithm is derived using the Follow-the-Perturbed-Leader method and is analyzed using an online primal-dual framework.

algorithm, loss function, optimization, (16 more...)

arXiv.org Machine Learning

2001.11568

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Multi-Participant Multi-Class Vertical Federated Learning

Feng, Siwei, Yu, Han

arXiv.org Machine LearningJan-29-2020

Federated learning (FL) is a privacy-preserving paradigm for training collective machine learning models with locally stored data from multiple participants. V ertical federated learning (VFL) deals with the case where participants sharing the same sample ID space but having different feature spaces, while label information is owned by one participant. Current studies of VFL only support two participants, and mostly focus on binary-class logistic regression problems. In this paper, we propose the Multi-participant Multi-class V ertical Federated Learning (MMVFL) framework for multi-class VFL problems involving multiple parties. Extending the idea of multi-view learning (MVL), MMVFL enables label sharing from its owner to other VFL participants in a privacy-preserving manner. To demonstrate the effectiveness of MMVFL, a feature selection scheme is incorporated into MMVFL to compare its performance against supervised feature selection and MVL-based approaches. Experiment results on real-world datasets show that MMVFL can effectively share label information among multiple VFL participants and match multi-class classification performance of existing approaches.

label information, mmvfl, participant, (10 more...)

arXiv.org Machine Learning

2001.11154

Country:

Asia > Singapore (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback