Optimization
Stochastic Implicit Natural Gradient for Black-box Optimization
Black-box optimization is primarily important for many compute-intensive applications, including reinforcement learning (RL), robot control, etc. This paper presents a novel theoretical framework for black-box optimization, in which our method performs stochastic update within a trust region defined with KL-divergence. We show that this update is equivalent to a natural gradient step w.r.t. natural parameters of an exponential-family distribution. Theoretically, we prove the convergence rate of our framework for convex functions. Our theoretical results also hold for non-differentiable black-box functions. Empirically, our method achieves superior performance compared with the state-of-the-art method CMA-ES on separable benchmark test problems.
One Sample Stochastic Frank-Wolfe
Zhang, Mingrui, Shen, Zebang, Mokhtari, Aryan, Hassani, Hamed, Karbasi, Amin
One of the beauties of the projected gradient descent method lies in its rather simple mechanism and yet stable behavior with inexact, stochastic gradients, which has led to its wide-spread use in many machine learning applications. However, once we replace the projection operator with a simpler linear program, as is done in the Frank-Wolfe method, both simplicity and stability take a serious hit. The aim of this paper is to bring them back without sacrificing the efficiency. In this paper, we propose the first one-sample stochastic Frank-Wolfe algorithm, called 1-SFW, that avoids the need to carefully tune the batch size, step size, learning rate, and other complicated hyper parameters. In particular, 1-SFW achieves the optimal convergence rate of $\mathcal{O}(1/\epsilon^2)$ for reaching an $\epsilon$-suboptimal solution in the stochastic convex setting, and a $(1-1/e)-\epsilon$ approximate solution for a stochastic monotone DR-submodular maximization problem. Moreover, in a general non-convex setting, 1-SFW finds an $\epsilon$-first-order stationary point after at most $\mathcal{O}(1/\epsilon^3)$ iterations, achieving the current best known convergence rate. All of this is possible by designing a novel unbiased momentum estimator that governs the stability of the optimization process while using a single sample at each iteration.
Robust Dynamic Assortment Optimization in the Presence of Outlier Customers
Chen, Xi, Krishnamurthy, Akshay, Wang, Yining
We consider the dynamic assortment optimization problem under the multinomial logit model (MNL) with unknown utility parameters. The main question investigated in this paper is model mis-specification under the $\varepsilon$-contamination model, which is a fundamental model in robust statistics and machine learning. In particular, throughout a selling horizon of length $T$, we assume that customers make purchases according to a well specified underlying multinomial logit choice model in a ($1-\varepsilon$)-fraction of the time periods, and make arbitrary purchasing decisions instead in the remaining $\varepsilon$-fraction of the time periods. In this model, we develop a new robust online assortment optimization policy via an active elimination strategy. We establish both upper and lower bounds on the regret, and show that our policy is optimal up to logarithmic factor in T when the assortment capacity is constant. Furthermore, we develop a fully adaptive policy that does not require any prior knowledge of the contamination parameter $\varepsilon$. Our simulation study shows that our policy outperforms the existing policies based on upper confidence bounds (UCB) and Thompson sampling.
Optimal Training of Fair Predictive Models
Nabi, Razieh, Malinsky, Daniel, Shpitser, Ilya
Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance, without requiring parametric models for high-dimensional feature vectors.
Kernels over Sets of Finite Sets using RKHS Embeddings, with Application to Bayesian (Combinatorial) Optimization
Buathong, Poompol, Ginsbourger, David, Krityakierne, Tipaluck
We focus on kernel methods for set-valued inputs and their application to Bayesian set optimization, notably combinatorial optimization. We introduce a class of (strictly) positive definite kernels that relies on Reproducing Kernel Hilbert Space embeddings, and successfully generalizes "double sum" set kernels recently considered in Bayesian set optimization, which turn out to be unsuitable for combinatorial optimization. The proposed class of kernels, for which we provide theoretical guarantees, essentially consists in applying an outer kernel on top of the canonical distance induced by a double sum kernel. Proofs of theoretical results about considered kernels are complemented by a few practicalities regarding hyperparameter fitting. We furthermore demonstrate the applicability of our approach in prediction and optimization tasks, relying both on toy examples and on two test cases from mechanical engineering and hydrogeology, respectively. Experimental results illustrate the added value of the approach and open new perspectives in prediction and sequential design with set inputs.
Compatible features for Monotonic Policy Improvement
Tomczak, Marcin B., de Cote, Enrique Munoz, Macua, Sergio Valcarcel, Vrancx, Peter
Recent policy optimization approaches have achieved substantial empirical success by constructing surrogate optimization objectives. The Approximate Policy Iteration objective (Schulman et al., 2015a; Kakade and Langford, 2002) has become a standard optimization target for reinforcement learning problems. Using this objective in practice requires an estimator of the advantage function. Policy optimization methods such as those proposed in Schulman et al. (2015b) estimate the advantages using a parametric critic. In this work we establish conditions under which the parametric approximation of the critic does not introduce bias to the updates of surrogate objective. These results hold for a general class of parametric policies, including deep neural networks. We obtain a result analogous to the compatible features derived for the original Policy Gradient Theorem (Sutton et al., 1999). As a result, we also identify a previously unknown bias that current state-of-the-art policy optimization algorithms (Schulman et al., 2015a, 2017) have introduced by not employing these compatible features.
Policy Optimization Through Approximated Importance Sampling
Tomczak, Marcin B., Kim, Dongho, Vrancx, Peter, Kim, Kee-Eung
Recent policy optimization approaches (Schulman et al., 2015a, 2017) have achieved substantial empirical successes by constructing new proxy optimization objectives. These proxy objectives allow stable and low variance policy learning, but require small policy updates to ensure that the proxy objective remains an accurate approximation of the target policy value. In this paper we derive an alternative objective that obtains the value of the target policy by applying importance sampling. This objective can be directly estimated from samples, as it takes an expectation over trajectories generated by the current policy. However, the basic importance sampled objective is not suitable for policy optimization, as it incurs unacceptable variance. We therefore introduce an approximation that allows us to directly trade-off the bias of approximation with the variance in policy updates. We show that our approximation unifies the proxy optimization approaches with the importance sampling objective and allows us to interpolate between them. We then provide a theoretical analysis of the method that directly quantifies the error term due to the approximation. Finally, we obtain a practical algorithm by optimizing the introduced objective with proximal policy optimization techniques (Schulman etal., 2017). We empirically demonstrate that the result-ing algorithm yields superior performance on continuous control benchmarks
Automated Multidisciplinary Design and Control of Hopping Robots for Exploration of Extreme Environments on the Moon and Mars
Kalita, Himangshu, Thangavelautham, Jekan
The next frontier in solar system exploration will be missions targeting extreme and rugged environments such as caves, canyons, cliffs and crater rims of the Moon, Mars and icy moons. These environments are time capsules into early formation of the solar system and will provide vital clues of how our early solar system gave way to the current planets and moons. These sites will also provide vital clues to the past and present habitability of these environments. Current landers and rovers are unable to access these areas of high interest due to limitations in precision landing techniques, need for large and sophisticated science instruments and a mission assurance and operations culture where risks are minimized at all costs. Our past work has shown the advantages of using multiple spherical hopping robots called SphereX for exploring these extreme environments. Our previous work was based on performing exploration with a human-designed baseline design of a SphereX robot. However, the design of SphereX is a complex task that involves a large number of design variables and multiple engineering disciplines. In this work we propose to use Automated Multidisciplinary Design and Control Optimization (AMDCO) techniques to find near optimal design solutions in terms of mass, volume, power, and control for SphereX for different mission scenarios.
How to Implement Bayesian Optimization from Scratch in Python
Many methods exist for function optimization, such as randomly sampling the variable search space, called random search, or systematically evaluating samples in a grid across the search space, called grid search. More principled methods are able to learn from sampling the space so that future samples are directed toward the parts of the search space that are most likely to contain the extrema. A directed approach to global optimization that uses probability is called Bayesian Optimization. Take my free 7-day email crash course now (with sample code). Click to sign-up and also get a free PDF Ebook version of the course.
How It Feels to Learn Data Science in 2019
So I just have to buy a Tableau license and I'm now a data scientist? Okay, let's just take that sales pitch with a grain of salt. I may be clueless, but I know there is more to data science than making pretty visualizations. I can do that in Excel. You got to admit it is slick marketing though. Charting data is the fun stage, and they leave out the painful and time-consuming parts of working with data: cleaning, wrangling, transforming, and loading it. Yes, and that is why I suspect there is value in learning to code. Maybe you can learn Alteryx. There's another software called Alteryx that allows you to clean, wrangle, transform, and load data.