Goto

Collaborating Authors

 Optimization


Towards a Theory of Systems Engineering Processes: A Principal-Agent Model of a One-Shot, Shallow Process

arXiv.org Artificial Intelligence

Systems engineering processes coordinate the effort of different individuals to generate a product satisfying certain requirements. As the involved engineers are self-interested agents, the goals at different levels of the systems engineering hierarchy may deviate from the system-level goals which may cause budget and schedule overruns. Therefore, there is a need of a systems engineering theory that accounts for the human behavior in systems design. To this end, the objective of this paper is to develop and analyze a principal-agent model of a one-shot (single iteration), shallow (one level of hierarchy) systems engineering process. We assume that the systems engineer maximizes the expected utility of the system, while the subsystem engineers seek to maximize their expected utilities. Furthermore, the systems engineer is unable to monitor the effort of the subsystem engineer and may not have a complete information about their types or the complexity of the design task. However, the systems engineer can incentivize the subsystem engineers by proposing specific contracts. To obtain an optimal incentive, we pose and solve numerically a bi-level optimization problem. Through extensive simulations, we study the optimal incentives arising from different system-level value functions under various combinations of effort costs, problem-solving skills, and task complexities.


Optuna: An Automatic Hyperparameter Optimization Framework Open Data Science Conference

#artificialintelligence

Preferred Networks has released a beta version of an open-source, automatic hyperparameter optimization framework called Optuna. In this blog, we will introduce the motivation behind the development of Optuna as well as its features. A hyperparameter is a parameter to control how a machine learning algorithm behaves. In deep learning, the learning rate, batch size, and number of training iterations are hyperparameters. Hyperparameters also include the numbers of neural network layers and channels.


IPO: Interior-point Policy Optimization under Constraints

arXiv.org Machine Learning

In this paper, we study reinforcement learning (RL) algorithms to solve real-world decision problems with the objective of maximizing the long-term reward as well as satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement with performance guarantees and can handle general types of cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms, in terms of reward maximization and constraint satisfaction. Introduction Recent advances have demonstrated significant potentials of deep reinforcement learning (RL) in solving complex sequential decision and control problems, e.g., the Atari game (Mnih et al. 2015), robotics (Andrychowicz et al. 2018), Go (Silver et al. 2016), etc. In such RL problems, the objective is to maximize the discounted cumulative reward. In many other problems, in addition to maximizing the reward, a policy needs to satisfy certain constraints.


Adaptive gradient descent without descent

arXiv.org Machine Learning

Yura Malitsky Konstantin Mishchenko โ€  Abstract We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. By following these rules, you get a method adaptive to the local geometry, with convergence guarantees depending only on smoothness in a neighborhood of a solution. Given that the problem is convex, our method will converge even if the global smoothness constant is infinity. As an illustration, it can minimize arbitrary continuously twice-differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including matrix factorization and training of ResNet-18. 1 Introduction Since the early days of optimization it was evident that there is a need for algorithms that are as independent from the user as possible. First-order methods have proven to be versatile and efficient in a wide range of applications, but one drawback has been present all that time: the stepsize. Despite some certain success stories, line search procedures and adaptive online methods have not removed the need to manually tune the optimization parameters. Even in smooth convex optimization, which is often believed to be much simpler than the nonconvex counterpart, robust rules for stepsize selection have been elusive. The purpose of this work is to remedy this deficiency. The problem formulation that we consider is the basic unconstrained optimization problem min x R d f (x), (1) where f: R d R is a differentiable function. Throughout the paper we assume that (1) has a solution and we denote its optimal value by f . The simplest and most known approach to this problem is the gradient descent method (GD), whose origin can be traced back to Cauchy [7,20]. Although it is probably the oldest optimization method, it continues to play a central role in modern algorithmic theory and applications. Its definition can be written in a mere one line, x k 1 x k ฮป f (x k), k 0, (2) where x 0 R d is arbitrary and ฮป 0 .


Graph Construction from Data using Non Negative Kernel regression (NNK Graphs)

arXiv.org Machine Learning

Data driven graph constructions are often used in various applications, including several machine learning tasks, where the goal is to make predictions and discover patterns. However, learning an optimal graph from data is still a challenging task. Weighted $K$-nearest neighbor and $\epsilon$-neighborhood methods are among the most common graph construction methods, due to their computational simplicity but the choice of parameters such as $K$ and $\epsilon$ associated with these methods is often ad hoc and lacks a clear interpretation. We formulate graph construction as the problem of finding a sparse signal approximation in kernel space, identifying key similarities between methods in signal approximation and existing graph learning methods. We propose non-negative kernel regression~(NNK), an improved approach for graph construction with interesting geometric and theoretical properties. We show experimentally the efficiency of NNK graphs, its robustness to choice of sparsity $K$ and better performance over state of the art graph methods in semi supervised learning tasks on real world data.


A Stochastic Extra-Step Quasi-Newton Method for Nonsmooth Nonconvex Optimization

arXiv.org Machine Learning

In this paper, a novel stochastic extra-step quasi-Newton method is developed to solve a class of nonsmooth nonconvex composite optimization problems. We assume that the gradient of the smooth part of the objective function can only be approximated by stochastic oracles. The proposed method combines general stochastic higher order steps derived from an underlying proximal type fixed-point equation with additional stochastic proximal gradient steps to guarantee convergence. Based on suitable bounds on the step sizes, we establish global convergence to stationary points in expectation and an extension of the approach using variance reduction techniques is discussed. Motivated by large-scale and big data applications, we investigate a stochastic coordinate-type quasi-Newton scheme that allows to generate cheap and tractable stochastic higher order directions. Finally, the proposed algorithm is tested on large-scale logistic regression and deep learning problems and it is shown that it compares favorably with other state-of-the-art methods.


Bayesian Optimization Allowing for Common Random Numbers

arXiv.org Machine Learning

Bayesian optimization is a powerful tool for expensive stochastic black-box optimization problems such as simulation-based optimization or machine learning hyperparameter tuning. Many stochastic objective functions implicitly require a random number seed as input. By explicitly reusing a seed a user can exploit common random numbers, comparing two or more inputs under the same randomly generated scenario, such as a common customer stream in a job shop problem, or the same random partition of training data into training and validation set for a machine learning algorithm. With the aim of finding an input with the best average performance over infinitely many seeds, we propose a novel Gaussian process model that jointly models both the output for each seed and the average. We then introduce the Knowledge gradient for Common Random Numbers that iteratively determines a combination of input and random seed to evaluate the objective and automatically trades off reusing old seeds and querying new seeds, thus overcoming the need to evaluate inputs in batches or measuring differences of pairs as suggested in previous methods. We investigate the Knowledge Gradient for Common Random Numbers both theoretically and empirically, finding it achieves significant performance improvements with only moderate added computational cost.


ALGAMES: A Fast Solver for Constrained Dynamic Games

arXiv.org Artificial Intelligence

Dynamic games are an effective paradigm for dealing with the control of multiple interacting actors. Current algorithms for solving these problems either rely on Hamilton-Jacobi-Isaacs (HJI) methods, dynamic programming (DP), differential dynamic programming (DDP), or an iterative best response approach (IBR). The first two approaches have strong theoretical guarantees; however they becomes intractable in high-dimensional real-world applications. The third approach is grounded in the success of iLQR. It is scalable, but it cannot handle constraints. Finally, the iterative best response algorithm is a heuristic approach with unknown convergence properties, and it can suffer from stability and tractability issues. This paper introduces ALGAMES (Augmented Lagrangian GAME-theoretic Solver), a solver that handles trajectory optimization problems with multiple actors and general nonlinear state and input constraints. We evaluate our solver in the context of autonomous driving on scenarios involving numerous vehicles such as ramp merging, overtaking, and lane changing. We present simulation and timing results demonstrating the speed and the ability of the solver to produce efficient, safe, and natural autonomous behaviors.


Regularization Matters in Policy Optimization

arXiv.org Artificial Intelligence

Deep Reinforcement Learning (Deep RL) has been receiving increasingly more attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques in training neural networks (e.g., $L_2$ regularization, dropout) have been largely ignored in RL methods, possibly because agents are typically trained and evaluated in the same environment. In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. Interestingly, we find conventional regularization techniques on the policy networks can often bring large improvement on the task performance, and the improvement is typically more significant when the task is more difficult. We also compare with the widely used entropy regularization and find $L_2$ regularization is generally better. Our findings are further confirmed to be robust against the choice of training hyperparameters. We also study the effects of regularizing different components and find that only regularizing the policy network is typically enough. We hope our study provides guidance for future practices in regularizing policy optimization algorithms.


Fast Exact Matrix Completion: A Unifying Optimization Framework

arXiv.org Machine Learning

We consider the problem of matrix completion of rank $k$ on an $n\times m$ matrix. We show that both the general case and the case with side information can be formulated as a combinatorical problem of selecting $k$ vectors from $p$ column features. We demonstrate that it is equivalent to a separable optimization problem that is amenable to stochastic gradient descent. We design fastImpute, based on projected stochastic gradient descent, to enable efficient scaling of the algorithm of sizes of $10^5 \times 10^5$. We report experiments on both synthetic and real-world datasets that show fastImpute is competitive in both the accuracy of the matrix recovered and the time needed across all cases. Furthermore, when a high number of entries are missing, fastImpute is over $75\%$ lower in MAPE and $10$x faster than current state-of-the-art matrix completion methods in both the case with side information and without.