AITopics | Optimization

Optimization

Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator

Preiss, James A., Arnold, Sébastien M. R., Wei, Chen-Yu, Kloft, Marius

arXiv.org Machine Learning, Oct-2-2019

We study the variance of the REINFORCE policy gradient estimator in environments with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise. These simple environments allow us to derive bounds on the estimator variance in terms of the environment and noise parameters. We compare the predictions of our bounds to the empirical variance in simulation experiments.
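The setting in this abstract can be illustrated with a minimal sketch that measures the empirical variance of a REINFORCE estimate on a scalar linear-quadratic system. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar dynamics `x' = a*x + b*u + w`, the linear Gaussian policy `u = k*x + eps`, the default parameter values, and the helper names `reinforce_gradient` and `empirical_variance`.

```python
import random
import statistics

def reinforce_gradient(k, a=0.9, b=1.0, q=1.0, r=1.0, sigma=0.5,
                       noise_std=0.1, horizon=10, x0=1.0, rng=random):
    """One REINFORCE sample of d(expected total cost)/dk for a scalar system."""
    x = x0
    total_cost = 0.0
    score = 0.0  # accumulated d/dk log pi(u_t | x_t)
    for _ in range(horizon):
        eps = rng.gauss(0.0, sigma)           # Gaussian exploration noise
        u = k * x + eps                       # linear Gaussian policy
        score += eps * x / sigma ** 2         # score function of the policy
        total_cost += q * x * x + r * u * u   # quadratic stage cost
        x = a * x + b * u + rng.gauss(0.0, noise_std)  # linear dynamics
    return score * total_cost

def empirical_variance(k, n_samples, seed=0, **env):
    """Sample mean and sample variance of the estimator over n_samples rollouts."""
    rng = random.Random(seed)
    samples = [reinforce_gradient(k, rng=rng, **env) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.variance(samples)

if __name__ == "__main__":
    for horizon in (5, 10, 20):
        mean, var = empirical_variance(-0.5, 2000, seed=1, horizon=horizon)
        print(f"horizon={horizon:2d}  mean={mean:.3f}  variance={var:.3f}")
```

Sweeping `horizon`, `sigma`, or `noise_std` in this toy setup gives a rough feel for how the estimator variance depends on environment and noise parameters, which is the kind of dependence the abstract says the bounds describe.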

algorithm, experiment, variance, (11 more...)

arXiv.org Machine Learning

1910.01249

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Forecasting Chaotic Systems with Very Low Connectivity Reservoir Computers

Griffith, Aaron, Pomerance, Andrew, Gauthier, Daniel J.

arXiv.org Machine Learning, Oct-1-2019

We explore the hyperparameter space of reservoir computers used for forecasting of the chaotic Lorenz '63 attractor with Bayesian optimization. We use a new measure of reservoir performance, designed to emphasize learning the global climate of the forecasted system rather than short-term prediction. We find that optimizing over this measure more quickly excludes reservoirs that fail to reproduce the climate. The results of optimization are surprising: the optimized parameters often specify a reservoir network with very low connectivity. Inspired by this observation, we explore reservoir designs with even simpler structure, and find well-performing reservoirs that have zero spectral radius and no recurrence. These simple reservoirs provide counterexamples to widely used heuristics in the field, and may be useful for hardware implementations of reservoir computers.

neural network, reservoir, upstream oil & gas, (19 more...)

arXiv.org Machine Learning

1910.00659

Country:

North America > United States > Ohio (0.14)
North America > United States > Virginia (0.14)
Europe > Germany (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

Wang, Jianyu, Tantia, Vinayak, Ballas, Nicolas, Rabbat, Michael

arXiv.org Machine Learning, Oct-1-2019

Abstract. Distributed optimization is essential for training large models on large datasets. Multiple approaches have been proposed to reduce the communication overhead in distributed training, such as synchronizing only after performing multiple local SGD steps, and decentralized methods (e.g., using gossip algorithms) to decouple communications among workers. Although these methods run faster than AllReduce-based methods, which use blocking communication before every update, the resulting models may be less accurate after the same number of updates. Inspired by the BMUF method of Chen & Huo (2016), we propose a slow momentum (SlowMo) framework, where workers periodically synchronize and perform a momentum update, after multiple iterations of a base optimization algorithm. Experiments on image classification and machine translation tasks demonstrate that SlowMo consistently yields improvements in optimization and generalization performance relative to the base optimizer, even when the additional overhead is amortized over many updates so that the SlowMo runtime is on par with that of the base optimizer. We provide theoretical convergence guarantees showing that SlowMo converges to a stationary point of smooth non-convex losses. Since BMUF is a particular instance of the SlowMo framework, our results also correspond to the first theoretical convergence guarantees for BMUF.

1 Introduction. Distributed optimization (Chen et al., 2016; Goyal et al., 2017) is essential for training large models on large datasets (Radford et al., 2019; Liu et al., 2019; Mahajan et al., 2018b). Currently, the most widely-used approaches have workers compute small mini-batch gradients locally, in parallel, and then aggregate these using a blocking communication primitive, AllReduce, before taking an optimizer step.

Communication overhead is a major issue limiting the scaling of this approach, since AllReduce must complete before every step and blocking communications are sensitive to stragglers (Dutta et al., 2018; Ferdinand et al., 2019). Multiple complementary approaches have recently been investigated to reduce or hide communication overhead. Decentralized training (Jiang et al., 2017; Lian et al., 2017; 2018; Assran et al., 2019) reduces idling due to blocking and stragglers by employing approximate gradient aggregation (e.g., via gossip or distributed averaging). Approaches such as Local SGD reduce the frequency of communication by having workers perform multiple updates between each round of communication (McDonald et al., 2010; McMahan et al., 2017; Zhou & Cong, 2018; Stich, 2019; Yu et al., 2019b). It is also possible to combine decentralized algorithms with Local SGD (Wang & Joshi, ...

Footnote: Work performed while doing an internship at Facebook AI Research.
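The idea of periodically synchronizing and then applying a momentum update can be sketched on a toy problem. This is only an illustration under stated assumptions: each simulated worker minimizes a hypothetical scalar objective `0.5 * (x - c)**2`, the base optimizer is plain gradient descent, synchronization is an exact average, and the particular form of the outer momentum update is assumed here, since the excerpt above does not spell it out. The function name `slowmo_sketch` and all parameter names are hypothetical.

```python
def slowmo_sketch(targets, x0=0.0, base_lr=0.1, slow_lr=1.0, beta=0.5,
                  local_steps=5, outer_iters=50):
    """Toy slow-momentum loop over simulated workers (one worker per target)."""
    x = x0    # synchronized parameter shared by all workers
    u = 0.0   # slow momentum buffer
    for _ in range(outer_iters):
        local_params = []
        for c in targets:
            xi = x
            for _ in range(local_steps):
                xi -= base_lr * (xi - c)   # local step on 0.5 * (xi - c)**2
            local_params.append(xi)
        x_avg = sum(local_params) / len(local_params)   # periodic synchronization
        # Assumed outer update: treat the averaged displacement as a
        # pseudo-gradient and apply heavy-ball style momentum to it.
        u = beta * u + (x - x_avg) / base_lr
        x = x - slow_lr * base_lr * u
    return x

if __name__ == "__main__":
    print(slowmo_sketch([1.0, 2.0, 3.0, 6.0]))
```

In this sketch, setting `beta=0` and `slow_lr=1` makes the outer update return exactly the averaged parameters, so the loop reduces to local updates with periodic averaging; nonzero `beta` adds the slow momentum on top.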

algorithm, base algorithm, iteration, (14 more...)

arXiv.org Machine Learning

1910.00643

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

An Efficient and Margin-Approaching Zero-Confidence Adversarial Attack

Zhang, Yang, Chang, Shiyu, Yu, Mo, Qian, Kaizhi

arXiv.org Machine Learning, Oct-1-2019

There are two major paradigms of white-box adversarial attacks that attempt to impose input perturbations. The first paradigm, called the fix-perturbation attack, crafts adversarial samples within a given perturbation level. The second paradigm, called the zero-confidence attack, finds the smallest perturbation needed to cause mis-classification, also known as the margin of an input feature. While the former paradigm is well-resolved, the latter is not. Existing zero-confidence attacks either introduce significant approximation errors, or are too time-consuming. We therefore propose MARGINATTACK, a zero-confidence attack framework that is able to compute the margin with improved accuracy and efficiency. Our experiments show that MARGINATTACK is able to compute a smaller margin than the state-of-the-art zero-confidence attacks, and matches the state-of-the-art fix-perturbation attacks. In addition, it runs significantly faster than the Carlini-Wagner attack, currently the most accurate zero-confidence attack algorithm.
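To make the notion of a margin concrete, here is a minimal sketch for the one case where it has a closed form: a binary linear classifier `sign(w.x + b)` under the Euclidean norm. This is not the MARGINATTACK algorithm, which targets general classifiers; the function name `linear_margin` and the example numbers are assumptions chosen for illustration.

```python
import math

def linear_margin(w, b, x):
    """Distance from x to the decision boundary w.x + b = 0, and the
    smallest L2 perturbation that moves x onto that boundary."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    delta = [-score * wi / norm_sq for wi in w]   # step along -sign(score) * w
    return abs(score) / math.sqrt(norm_sq), delta

if __name__ == "__main__":
    margin, delta = linear_margin([3.0, 4.0], -5.0, [3.0, 4.0])
    print(margin, delta)
```

For nonlinear classifiers no such formula exists in general, which is why iterative attacks are used to approximate the margin.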

margin, constraint, attack, (14 more...)

arXiv.org Machine Learning

1910.00511

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)
Asia (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Protein Design by Provable Algorithms

Communications of the ACM, Sep-30-2019, 22:33:03 GMT

Proteins are a class of large molecules that are involved in the vast majority of biological functions, from cell replication to photosynthesis to cognition. The chemical structure of proteins is very systematic [5]: they consist of a chain of atoms known as the backbone, which consists of three-atom (nitrogen-carbon-carbon) repeats known as residues, each of which features a sidechain of atoms emanating from the first carbon. In general, there are 20 different options for sidechains, and a residue with a particular type of sidechain is known as an amino acid (so there are also 20 different amino acid types). For billions of years, the process of evolution has optimized the sequence of amino acids that make up naturally occurring proteins to suit the needs of the organisms that make them. So we ask: Can we use computation to design non-naturally occurring proteins that suit our biomedical and industrial needs? This question is a combinatorial optimization problem, because the output of a protein design computation is a sequence of amino acids. Due to the vast diversity of naturally occurring proteins, it is possible, and very useful, to begin a protein design computation with a naturally occurring protein and then to modify it to achieve the desired function. In this article, we focus on protein design algorithms that perform this optimization using detailed modeling of the 3D structure of the protein [5, 8]. Thus, they will begin with a starting structure, a 3D structure of a (typically naturally occurring) protein we wish to modify. To illustrate this concept, imagine we wish to perform a simple example modification to a protein to make it more stable, so it can still function at higher temperatures.

algorithm, conformation, protein, (14 more...)

Communications of the ACM

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Oklahoma > Payne County > Cushing (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.94)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)

Black-box Adversarial Attacks with Bayesian Optimization

Shukla, Satya Narayan, Sahu, Anit Kumar, Willmott, Devin, Kolter, J. Zico

arXiv.org Machine Learning, Sep-30-2019

October 1, 2019

Abstract. We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization (BO) to specifically cater to scenarios involving low query budgets to develop query-efficient adversarial attacks. We alleviate the issues surrounding BO with regard to optimizing high-dimensional deep learning models by effective dimension upsampling techniques. Our proposed approach achieves performance comparable to the state-of-the-art black-box adversarial attacks, albeit with a much lower average query count. In particular, in low query budget regimes, our proposed method reduces the query count by up to 80% with respect to the state-of-the-art methods.

1 Introduction. Neural networks are now well-known to be vulnerable to adversarial examples: additive perturbations that, when applied to the input, change the network's output classification [9]. Work investigating this lack of robustness to adversarial examples often takes the form of a back-and-forth between newly proposed adversarial attacks, methods for quickly and efficiently crafting adversarial examples, and corresponding defenses that modify the classifier at either training or test time to improve robustness. The most successful adversarial attacks use gradient-based optimization methods [9, 17], which require complete knowledge of the architecture and parameters of the target network; this assumption is referred to as the white-box attack setting.

bayesian optimization, optimization, perturbation, (12 more...)

arXiv.org Machine Learning

1909.13857

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(5 more...)

Genre: Research Report (0.84)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML

Liu, Sijia, Lu, Songtao, Chen, Xiangyi, Feng, Yao, Xu, Kaidi, Al-Dujaili, Abdullah, Hong, Minyi, O'Reilly, Una-May

arXiv.org Machine Learning, Sep-30-2019

In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values. We present a principled optimization framework, integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former only requires a small number of function queries and the latter needs just a one-step descent/ascent update. We show that the proposed framework, referred to as ZO-Min-Max, has a sub-linear convergence rate under mild conditions and scales gracefully with problem size. From an application side, we explore a promising connection between black-box min-max optimization and black-box evasion and poisoning attacks in adversarial machine learning (ML). Our empirical evaluations on these use cases demonstrate the effectiveness of our approach and its scalability to dimensions that prohibit using recent black-box solvers.
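The combination described above, a query-only gradient estimate plugged into alternating projected descent-ascent, can be sketched as follows. This is a generic illustration and not the paper's ZO-Min-Max algorithm: the random-direction finite-difference estimator, the box constraint `[-1, 1]`, the step sizes, and the names `zo_gradient`, `project_box`, and `zo_min_max` are all assumptions.

```python
import math
import random

def zo_gradient(f, x, mu=1e-3, num_queries=10, rng=random):
    """Estimate grad f(x) from function values along random unit directions."""
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(num_queries):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]               # uniform direction on the sphere
        fd = (f([xi + mu * ui for xi, ui in zip(x, u)]) - fx) / mu
        for i in range(d):
            grad[i] += d * fd * u[i] / num_queries
    return grad

def project_box(v, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d."""
    return [min(max(vi, lo), hi) for vi in v]

def zo_min_max(f, x, y, steps=2000, lr_x=0.05, lr_y=0.05, seed=0):
    """Alternate one projected descent step in x and one ascent step in y."""
    rng = random.Random(seed)
    for _ in range(steps):
        gx = zo_gradient(lambda z: f(z, y), x, rng=rng)
        x = project_box([xi - lr_x * gi for xi, gi in zip(x, gx)], -1.0, 1.0)
        gy = zo_gradient(lambda z: f(x, z), y, rng=rng)
        y = project_box([yi + lr_y * gi for yi, gi in zip(y, gy)], -1.0, 1.0)
    return x, y

if __name__ == "__main__":
    # Toy saddle objective with its saddle point at the origin.
    saddle = lambda x, y: sum(0.5 * a * a + a * b - 0.5 * b * b
                              for a, b in zip(x, y))
    print(zo_min_max(saddle, [0.8, -0.6], [0.5, 0.9]))
```

Each iteration costs `2 * (num_queries + 1)` function evaluations and never touches a gradient, which is the property that makes this style of method usable when only loss values can be queried.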

arxiv preprint arxiv, optimization, zo-min-max, (13 more...)

arXiv.org Machine Learning

1909.13806

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Minnesota (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Optimal Algorithms for Submodular Maximization with Distributed Constraints

Robey, Alexander, Adibi, Arman, Schlotfeldt, Brent, Pappas, George J., Hassani, Hamed

arXiv.org Machine Learning, Sep-30-2019

Abstract. We consider a class of discrete optimization problems that aim to maximize a submodular objective function subject to a distributed partition matroid constraint. More precisely, we consider a networked scenario in which multiple agents choose actions from local strategy sets with the goal of maximizing a submodular objective function defined over the set of all possible actions. Given this distributed setting, we develop Constraint-Distributed Continuous Greedy (CDCG), a message passing algorithm that converges to the tight $(1 - 1/e)$ approximation factor of the optimum global solution using only local computation and communication. It is known that a sequential greedy algorithm can only achieve a $1/2$ multiplicative approximation of the optimal solution for this class of problems in the distributed setting. Our framework relies on lifting the discrete problem to a continuous domain and developing a consensus algorithm that achieves the tight $(1 - 1/e)$ approximation guarantee of the global discrete solution once a proper rounding scheme is applied. We also offer empirical results from a multi-agent area coverage problem to show that the proposed method significantly outperforms the state-of-the-art sequential greedy method.

I. Introduction. Recently, the need has arisen to design algorithms that distribute decision making among a collection of agents or computing devices. This need has been motivated by problems from statistics, machine learning and robotics. These problems include:

- (Density estimation) What is the best way to estimate a nonparametric density function from a distributed dataset?

Inherent to all of these applications is an underlying optimization problem that can be expressed as

maximize $f(S)$ (1a)
subject to $S \subseteq Y,\ S \in I$ (1b)

where $f$ is a submodular function (i.e., it has a diminishing-returns property), $Y$ is a finite ground set of all decision variables, and $I$ is a family of allowable subsets of $Y$.
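The sequential greedy baseline mentioned in the abstract can be sketched on a tiny coverage instance. This illustrates only the baseline, not CDCG; the coverage objective, the two-agent instance, and the names `coverage`, `sequential_greedy`, and `brute_force_optimum` are assumptions chosen so that the factor-of-two gap is visible.

```python
from itertools import product

def coverage(selected):
    """Submodular objective: number of distinct cells covered."""
    covered = set()
    for cells in selected:
        covered |= cells
    return len(covered)

def sequential_greedy(strategy_sets):
    """Agents choose in order; each picks the action with the best marginal
    gain given earlier choices (one action per agent, i.e. a partition matroid)."""
    chosen = []
    for options in strategy_sets:
        best = max(options, key=lambda cells: coverage(chosen + [cells]))
        chosen.append(best)
    return chosen

def brute_force_optimum(strategy_sets):
    """Exhaustive search over all joint actions; only viable for tiny instances."""
    return max(coverage(list(choice)) for choice in product(*strategy_sets))

if __name__ == "__main__":
    # Agent 1 may cover cell 1 or cell 2; agent 2 can only cover cell 1.
    agents = [[frozenset({1}), frozenset({2})], [frozenset({1})]]
    print(coverage(sequential_greedy(agents)), brute_force_optimum(agents))
```

On this instance the first agent sees a tie and takes cell 1, after which the second agent can add nothing, so greedy covers one cell while the best joint choice covers two.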

agent, algorithm, approximation, (15 more...)

arXiv.org Machine Learning

1909.13676

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Q-Search Trees: An Information-Theoretic Approach Towards Hierarchical Abstractions for Agents with Computational Limitations

Larsson, Daniel T., Maity, Dipankar, Tsiotras, Panagiotis

arXiv.org Artificial Intelligence, Sep-30-2019

In this paper, we develop a framework to obtain graph abstractions for decision-making by an agent where the abstractions emerge as a function of the agent's limited computational resources. We discuss the connection of the proposed approach with information-theoretic signal compression, and formulate a novel optimization problem to obtain tree-based abstractions as a function of the agent's computational resources. The structural properties of the new problem are discussed in detail, and two algorithmic approaches are proposed to obtain solutions to this optimization problem. We discuss the quality of, and prove relationships between, solutions obtained by the two proposed algorithms. The framework is demonstrated to generate a hierarchy of abstractions for a non-trivial environment.

algorithm, q-tree search algorithm, representation, (16 more...)

arXiv.org Artificial Intelligence

1910.00063

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(5 more...)

Genre:

Research Report (0.50)
Overview (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Accelerating the Computation of UCB and Related Indices for Reinforcement Learning

Cowan, Wesley, Katehakis, Michael N., Pirutinsky, Daniel

arXiv.org Artificial Intelligence, Sep-28-2019

In this paper we derive an efficient method for computing the indices associated with an asymptotically optimal upper confidence bound algorithm (MDP-UCB) of Burnetas and Katehakis (1997) that only requires solving a system of two non-linear equations with two unknowns, irrespective of the cardinality of the state space of the Markovian decision process (MDP). In addition, we develop a similar acceleration for computing the indices for the MDP-Deterministic Minimum Empirical Divergence (MDP-DMED) algorithm developed in Cowan et al. (2019), based on ideas from Honda and Takemura (2011), that involves solving a single equation of one variable. We provide experimental results demonstrating the computational time savings and regret performance of these algorithms. In these comparisons we also consider the Optimistic Linear Programming (OLP) algorithm (Tewari and Bartlett, 2008) and a method based on posterior sampling (MDP-PS).

algorithm, katehakis, optimization problem, (14 more...)

arXiv.org Artificial Intelligence

1909.13158

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.74)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)
