AITopics | Optimization

Optimization

A Smoother Way to Train Structured Prediction Models

Pillutla, Krishna, Roulet, Vincent, Kakade, Sham M., Harchaoui, Zaid

arXiv.org Machine Learning, Feb-8-2019

We present a framework to train a structured prediction model by performing smoothing on the inference algorithm it builds upon. Smoothing overcomes the non-smoothness inherent to the maximum margin structured prediction objective, and paves the way for the use of fast primal gradient-based optimization algorithms. We illustrate the proposed framework by developing a novel primal incremental optimization algorithm for the structural support vector machine. The proposed algorithm blends an extrapolation scheme for acceleration and an adaptive smoothing scheme and builds upon the stochastic variance-reduced gradient algorithm. We establish its worst-case global complexity bound and study several practical variants, including extensions to deep structured prediction. We present experimental results on two real-world problems, namely named entity recognition and visual object localization. The experimental results show that the proposed framework allows us to build upon efficient inference algorithms to develop large-scale optimization algorithms for structured prediction which can achieve competitive performance on the two real-world problems.
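A minimal sketch of the smoothing idea in the abstract above, assuming entropy (log-sum-exp) smoothing of a max over a small, explicitly enumerated set of candidate scores. The function names `smooth_max` and `smooth_argmax`, the parameter `mu`, and the example scores are illustrative assumptions, not the authors' implementation, and the smoothing used in the paper may differ.

```python
import math


def smooth_max(scores, mu):
    """Entropy-smoothed max: mu * log(sum(exp(s / mu))).

    For mu > 0 this is differentiable in the scores and lies between
    max(scores) and max(scores) + mu * log(len(scores)).
    """
    m = max(scores)  # shift by the max for numerical stability
    return m + mu * math.log(sum(math.exp((s - m) / mu) for s in scores))


def smooth_argmax(scores, mu):
    """Gradient of smooth_max w.r.t. the scores: a softmax distribution."""
    m = max(scores)
    weights = [math.exp((s - m) / mu) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores of three candidate structured outputs.
scores = [1.0, 2.5, 2.4]
for mu in (1.0, 0.1, 0.01):
    print(mu, smooth_max(scores, mu), smooth_argmax(scores, mu))
```

Smaller values of `mu` track the true max more closely but make the gradient change more abruptly, which is the trade-off an adaptive smoothing schedule has to manage.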

algorithm, optimization algorithm, oracle call, (11 more...)

arXiv.org Machine Learning

1902.03228

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Mode Collapse and Regularity of Optimal Transportation Maps

Lei, Na, Guo, Yang, An, Dongsheng, Qi, Xin, Luo, Zhongxuan, Yau, Shing-Tung, Gu, Xianfeng

arXiv.org Machine Learning, Feb-7-2019

This work builds the connection between the regularity theory of optimal transportation maps, the Monge-Ampère equation, and GANs, which gives a theoretical understanding of the major drawbacks of GANs: convergence difficulty and mode collapse. According to the regularity theory of the Monge-Ampère equation, if the support of the target measure is disconnected or just non-convex, the optimal transportation mapping is discontinuous. General DNNs can only approximate continuous mappings. This intrinsic conflict leads to the convergence difficulty and mode collapse in GANs. Using an autoencoder combined with a discrete optimal transportation map (the AE-OT framework) on the CelebA data set, we test our hypothesis that the supports of real data distributions are in general non-convex and that the discontinuity is therefore unavoidable. The testing result is positive. Furthermore, we propose to approximate the continuous Brenier potential directly based on discrete Brenier theory to tackle mode collapse. Compared with existing methods, this method is more accurate and effective.
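The discontinuity claim can be illustrated in one dimension, where the optimal map for the squared-distance cost between two equal-size point sets is the monotone (sorted) matching. The sketch below is a toy illustration under that assumption; the helper `monotone_transport` and the sample points are hypothetical and unrelated to the AE-OT code.

```python
def monotone_transport(source, target):
    """Match source points to target points by rank (1D optimal matching
    for convex costs such as squared distance)."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same size")
    order = sorted(range(len(source)), key=lambda i: source[i])
    sorted_target = sorted(target)
    mapped = [0.0] * len(source)
    for rank, i in enumerate(order):
        mapped[i] = sorted_target[rank]
    return mapped


n = 10
# Source: evenly spaced points on [0, 1] (connected support).
source = [(i + 0.5) / n for i in range(n)]
# Target: two separated clusters (disconnected support).
target = [-2.0 + 0.1 * k for k in range(5)] + [2.0 + 0.1 * k for k in range(5)]

mapped = monotone_transport(source, target)
jumps = [mapped[i + 1] - mapped[i] for i in range(n - 1)]
largest_jump = max(jumps)
print("largest jump:", largest_jump, "between source points",
      source[jumps.index(largest_jump)], "and",
      source[jumps.index(largest_jump) + 1])
```

Neighbouring source points near 0.5 are sent to opposite clusters, so the map has a jump that no continuous function can reproduce exactly.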

mode collapse, optimal transportation map, transportation map, (13 more...)

arXiv.org Machine Learning

1902.02934

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Cost-Effective Incentive Allocation via Structured Counterfactual Inference

Lopez, Romain, Li, Chenchen, Yan, Xiang, Xiong, Junwu, Jordan, Michael I., Qi, Yuan, Song, Le

arXiv.org Machine Learning, Feb-7-2019

We address a practical problem ubiquitous in modern industry, in which a mediator tries to learn a policy for allocating strategic financial incentives for customers in a marketing campaign and observes only bandit feedback. In contrast to traditional policy optimization frameworks, we rely on a specific assumption for the reward structure and we incorporate budget constraints. We develop a new two-step method for solving this constrained counterfactual policy optimization problem. First, we cast the reward estimation problem as a domain adaptation problem with supplementary structure. Subsequently, the estimators are used for optimizing the policy with constraints. We establish theoretical error bounds for our estimation procedure and we empirically show that the approach leads to significant improvement on both synthetic and real datasets.

bandit feedback, cost-effective incentive allocation, international conference, (13 more...)

arXiv.org Machine Learning

1902.02495

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.46)

Industry:

Education (0.46)
Marketing (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Compatible Natural Gradient Policy Search

Pajarinen, Joni, Thai, Hong Linh, Akrour, Riad, Peters, Jan, Neumann, Gerhard

arXiv.org Machine Learning, Feb-7-2019

Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL-divergence to bound the region of trust resulting in a natural gradient policy update. We show that the natural gradient and trust region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy according to a wrong schedule leading to premature convergence. To control entropy reduction we introduce a new policy search method called compatible policy search (COPOS) which bounds entropy loss. The experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete partially observable tasks.

approximation, gradient, natural gradient, (15 more...)

arXiv.org Machine Learning

1902.02823

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Lincolnshire > Lincoln (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

Hazimeh, Hussein, Mazumder, Rahul

arXiv.org Machine Learning, Feb-6-2019

In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can often be enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex optimization based algorithms face difficulties in handling problems where the number of main features is $p \sim 10^3$ (with the total number of features $\sim p^2$). In this paper, we study a convex relaxation which enforces strong hierarchy and develop a scalable algorithm for solving it. Our proposed algorithm employs a proximal gradient method along with a novel active-set strategy, specialized screening rules, and decomposition rules towards verifying optimality conditions. Our framework can handle problems having dense design matrices, with $p = 50,000$ ($\sim 10^9$ interactions), instances that are much larger than the current state of the art. Experiments on real and synthetic data suggest that our toolkit hierScale outperforms the state of the art in terms of prediction and variable selection and can achieve over a 1000x speed-up.
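The strong hierarchy constraint and the interaction count quoted in the abstract can be made concrete with a short sketch. The dictionary-based coefficient layout and the helper names `satisfies_strong_hierarchy` and `num_pairwise_interactions` are assumptions chosen for illustration; they are not part of hierScale.

```python
def satisfies_strong_hierarchy(main, interactions, tol=0.0):
    """Return True if every non-zero interaction (i, j) has both of its
    main-effect coefficients non-zero.

    main:         dict mapping feature name -> coefficient
    interactions: dict mapping (feature_i, feature_j) -> coefficient
    """
    for (i, j), coef in interactions.items():
        if abs(coef) <= tol:
            continue  # zero interactions impose no requirement
        if abs(main.get(i, 0.0)) <= tol or abs(main.get(j, 0.0)) <= tol:
            return False
    return True


def num_pairwise_interactions(p):
    """Number of distinct pairs among p main features: p * (p - 1) / 2."""
    return p * (p - 1) // 2


main = {"age": 0.8, "dose": -1.2, "weight": 0.0}
ok = {("age", "dose"): 0.3, ("age", "weight"): 0.0}
bad = {("age", "weight"): 0.5}  # 'weight' has a zero main effect

print(satisfies_strong_hierarchy(main, ok))   # True
print(satisfies_strong_hierarchy(main, bad))  # False
print(num_pairwise_interactions(50_000))      # 1249975000, about 10^9
```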

algorithm, optimal solution, optimality condition, (14 more...)

arXiv.org Machine Learning

1902.01542

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Convergence Analysis of Nonlinearly Constrained ADMM in Deep Learning

Zeng, Jinshan, Lin, Shao-Bo, Yao, Yuan

arXiv.org Machine Learning, Feb-6-2019

Efficient training of deep neural networks (DNNs) is a challenge due to the associated highly nonconvex optimization. The alternating direction method of multipliers (ADMM) has attracted rising attention in deep learning for its potential of distributed computing. However, it remains an open problem to establish the convergence of ADMM in DNN training due to the nonlinear constraints involved. In this paper, we provide an answer to this problem by establishing the convergence of some nonlinearly constrained ADMM for DNNs with smooth activations. To be specific, we establish the global convergence to a Karush-Kuhn-Tucker (KKT) point at an $\mathcal{O}(1/k)$ rate. To achieve this goal, the key development lies in a new local linear approximation technique which enables us to overcome the hurdle of nonlinear constraints in ADMM for DNNs.

admm, convergence, wj 1, (12 more...)

arXiv.org Machine Learning

1902.0206

Country:

Asia > China > Hong Kong > Kowloon (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Jiangxi Province > Nanchang (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Linear Inequality Constraints for Neural Network Activations

Frerix, Thomas, Nießner, Matthias, Cremers, Daniel

arXiv.org Machine Learning, Feb-6-2019

We propose a method to impose linear inequality constraints on neural network activations. The proposed method allows a data-driven training approach to be combined with modeling prior knowledge about the task. Our algorithm computes a suitable parameterization of the feasible set at initialization and uses standard variants of stochastic gradient descent to find solutions to the constrained network. Thus, the modeling constraints are always satisfied during training. Crucially, our approach avoids solving a sub-optimization problem at each training step and avoids manually trading off data and constraint fidelity with additional hyperparameters. We consider constrained generative modeling as an important application domain and experimentally demonstrate the proposed method by constraining a variational autoencoder.
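One way to see how a parameterization of the feasible set can keep constraints satisfied throughout training is the sketch below. It assumes a bounded feasible set $\{x : Ax \le b\}$ whose vertices are already known and writes every output as a softmax-weighted convex combination of those vertices. The triangle, the helper names, and the use of softmax are illustrative assumptions; the abstract does not specify the parameterization used in the paper.

```python
import math


def softmax(z):
    """Map unconstrained reals to non-negative weights that sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def point_in_polytope(params, vertices):
    """Convex combination of the vertices, weighted by softmax(params).

    Any real-valued params give a point inside the polytope, so an
    unconstrained optimizer can update params freely.
    """
    w = softmax(params)
    dim = len(vertices[0])
    return [sum(w[k] * vertices[k][d] for k in range(len(vertices)))
            for d in range(dim)]


def is_feasible(x, A, b, tol=1e-9):
    """Check A x <= b row by row."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= bk + tol
               for row, bk in zip(A, b))


# Hypothetical feasible set: the triangle x >= 0, y >= 0, x + y <= 1.
A = [[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
b = [0.0, 0.0, 1.0]
vertices = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

for params in ([0.0, 0.0, 0.0], [5.0, -3.0, 2.0], [-40.0, 40.0, 0.0]):
    x = point_in_polytope(params, vertices)
    print(params, x, is_feasible(x, A, b))
```

Because feasibility holds for every value of `params`, no projection step or penalty weight is needed inside the training loop in this toy setting.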

algorithm, constraint, linear inequality constraint, (9 more...)

arXiv.org Machine Learning

1902.01785

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > France > Brittany > Finistère > Brest (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Robust Regression via Online Feature Selection under Adversarial Data Corruption

Zhang, Xuchao, Lei, Shuo, Zhao, Liang, Boedihardjo, Arnold P., Lu, Chang-Tien

arXiv.org Machine Learning, Feb-5-2019

The presence of data corruption in user-generated streaming data, such as social media, motivates a new fundamental problem: learning reliable regression coefficients when the features are not all accessible at one time. Until now, several important challenges still cannot be handled concurrently: 1) corrupted data estimation when only partial features are accessible; 2) online feature selection when data contains adversarial corruption; and 3) scaling to a massive dataset. This paper proposes a novel RObust regression algorithm via Online Feature Selection (RoOFS) that concurrently addresses all the above challenges. Specifically, the algorithm iteratively updates the regression coefficients and the uncorrupted set via a robust online feature substitution method. We also prove that our algorithm has a restricted error bound compared to the optimal solution. Extensive experiments on both synthetic and real-world datasets demonstrate that our method outperforms existing methods in recovering both the selected features and the regression coefficients, with very competitive efficiency.

algorithm, corruption, feature selection, (12 more...)

arXiv.org Machine Learning

1902.01729

Country:

North America > United States > Virginia > Falls Church (0.04)
North America > United States > Virginia > Fairfax County > Fairfax (0.04)
North America > United States > Virginia > Alexandria County > Alexandria (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Conditioning by adaptive sampling for robust design

Brookes, David H., Park, Hahnbeom, Listgarten, Jennifer

arXiv.org Machine Learning, Feb-5-2019

We present a new method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest. For example, in protein design, one may wish to find the protein sequence that maximizes fluorescence. We assume access to one or more, potentially black box, stochastic "oracle" predictive functions, each of which maps from the input design space (e.g., protein sequences) to a distribution over a property of interest (e.g., protein fluorescence). At first glance, this problem can be framed as one of optimizing the oracle(s) with respect to the input. However, many state-of-the-art predictive models, such as neural networks, are known to suffer from pathologies, especially for data far from the training distribution. Thus we need to modulate the optimization of the oracle inputs with prior knowledge about what makes "realistic" inputs (e.g., proteins that stably fold). Herein, we propose a new method to solve this problem, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches. Formally, our method achieves its success by using model-based adaptive sampling to estimate the conditional distribution of the input sequences given the desired properties.

conditioning, oracle, sequence, (15 more...)

arXiv.org Machine Learning

1901.1006

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Modeling & Simulation (0.86)
(2 more...)

How It Feels to Learn Data Science in 2019 – Towards Data Science

#artificialintelligence, Feb-4-2019, 09:58:26 GMT

So I just have to buy a Tableau license and I'm now a data scientist? Okay, let's just take that sales pitch with a grain of salt. I may be clueless, but I know there is more to data science than making pretty visualizations. I can even do that in Excel. You got to admit it is slick marketing though. Charting data is the fun stage, and they leave out the painful and time-consuming parts of working with data: cleaning, wrangling, transforming, and loading it. God help you if you need to write a specialized algorithm with your own domain logic when using closed tools. Yes, and that is why I suspect there is value in learning to code. Maybe you can learn Alteryx.

artificial intelligence, machine learning, natural language, (14 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
(2 more...)
