Optimization
Average-reward model-free reinforcement learning: a systematic review and literature mapping
Dewanto, Vektor, Dunn, George, Eshragh, Ali, Gallagher, Marcus, Roosta, Fred
Model-free reinforcement learning (RL) has been an active area of research and provides a fundamental framework for agent-based learning and decision-making in artificial intelligence. In this paper, we review a specific subset of this literature, namely work that utilizes optimization criteria based on average rewards, in the infinite horizon setting. Average reward RL has the advantage of being the most selective criterion in recurrent (ergodic) Markov decision processes. In comparison to widely-used discounted reward criterion, it also requires no discount factor, which is a critical hyperparameter, and properly aligns the optimization and performance metrics. Motivated by the solo survey by Mahadevan (1996a), we provide an updated review of work in this area and extend it to cover policy-iteration and function approximation methods (in addition to the value-iteration and tabular counterparts). We also identify and discuss opportunities for future work.
Tight Lower Complexity Bounds for Strongly Convex Finite-Sum Optimization
Finite-sum optimization plays an important role in the area of machine learning, and hence has triggered a surge of interest in recent years. To address this optimization problem, various randomized incremental gradient methods have been proposed with guaranteed upper and lower complexity bounds for their convergence. Nonetheless, these lower bounds rely on certain conditions: deterministic optimization algorithm, or fixed probability distribution for the selection of component functions. Meanwhile, some lower bounds even do not match the upper bounds of the best known methods in certain cases. To break these limitations, we derive tight lower complexity bounds of randomized incremental gradient methods, including SAG, SAGA, SVRG, and SARAH, for two typical cases of finite-sum optimization. Specifically, our results tightly match the upper complexity of Katyusha when each component function is strongly convex and smooth, and tightly match the upper complexity of SDCA without duality and of KatyushaX when the finite-sum function is strongly convex and the component functions are average smooth.
Revisiting Frank-Wolfe for Polytopes: Strict Complementarity and Sparsity
In recent years it was proved that simple modifications of the classical Frank-Wolfe algorithm (aka conditional gradient algorithm) for smooth convex minimization over convex and compact polytopes, converge with linear rate, assuming the objective function has the quadratic growth property. However, the rate of these methods depends explicitly on the dimension of the problem which cannot explain their empirical success for large scale problems. In this paper we first demonstrate that already for very simple problems and even when the optimal solution lies on a low-dimensional face of the polytope, such dependence on the dimension cannot be avoided in worst case. We then revisit the addition of a strict complementarity assumption already considered in Wolfe's classical book \cite{Wolfe1970}, and prove that under this condition, the Frank-Wolfe method with away-steps and line-search converges linearly with rate that depends explicitly only on the dimension of the optimal face. We motivate strict complementarity by proving that it implies sparsity-robustness of optimal solutions to noise.
Artificial Intelligence: Optimization Algorithms in Python
What would an "optimal world" look like to you? Would people get along better? Would we take better care of our environment? Many data scientists choose to optimize by using pre-built machine learning libraries. But we think that this kind of'plug-and-play' study hinders your learning. That's why this course gets you to build an optimization algorithm from the ground up.
Idle Vehicle Relocation Strategy through Deep Learning for Shared Autonomous Electric Vehicle System Optimization
Kim, Seongsin, Lee, Ungki, Lee, Ikjin, Kang, Namwoo
Corresponding authors Abstract In optimization of a shared autonomous electric vehicle (SAEV) system, idle vehicle relocation strategies are important to reduce operation costs and customers' wait time. However, for an on-demand service, continuous optimization for idle vehicle relocation is computationally expensive, and thus, not effective. This study proposes a deep learning-based algorithm that can instantly predict the optimal solution to idle vehicle relocation problems under various traffic conditions. The proposed relocation process comprises three steps. First, a deep learningbased passenger demand prediction model using taxi big data is built. Second, idle vehicle relocation problems are solved based on predicted demands, and optimal solution data are collected. Finally, a deep learning model using the optimal solution data is built to estimate the optimal strategy without solving relocation. In addition, the proposed idle vehicle relocation model is validated by applying it to optimize the SAEV system. We present an optimal service system including the design of SAEV vehicles and charging stations. Further, we demonstrate that the proposed strategy can drastically reduce operation costs and wait times for on-demand services. Keywords: Idle vehicle relocation, deep learning, shared autonomous electric vehicle (SAEV), demand prediction, system optimization 1. Introduction Shared autonomous electric vehicles (SAEVs) that combine car sharing services, autonomous driving technology, and electric vehicles (EVs) are expected to revolutionize transportation systems in the near future [1,2]. An SAEV autonomously goes to the location requested by a customer and rides that customer to a prescribed destination, thus providing a low-stress and safe transportation service [3,4], promoting transportation accessibility [5], and reducing mobility costs [6]. In addition, EVs help reduce fuel consumption and produce less environmental pollutants and greenhouse gas emissions [7-10].
Learning Robust Algorithms for Online Allocation Problems Using Adversarial Training
Zuzic, Goran, Wang, Di, Mehta, Aranyak, Sivakumar, D.
We address the challenge of finding algorithms for online allocation (i.e. bipartite matching) using a machine learning approach. In this paper, we focus on the AdWords problem, which is a classical online budgeted matching problem of both theoretical and practical significance. In contrast to existing work, our goal is to accomplish algorithm design {\em tabula rasa}, i.e., without any human-provided insights or expert-tuned training data beyond specifying the objective and constraints of the optimization problem. We construct a framework based on insights and ideas from game theory, adversarial training and GANs Key to our approach is to generate adversarial examples that expose the weakness of any given algorithm. A unique challenge in our context is to generate complete examples from scratch rather than perturbing given examples and we demonstrate this can be accomplished for the Adwords problem. We use this framework to co-train an algorithm network and an adversarial network against each other until they converge to an equilibrium. This approach finds algorithms and adversarial examples that are consistent with known optimal results. Secondly, we address the question of robustness of the algorithm, namely can we design algorithms that are both strong under practical distributions, as well as exhibit robust performance against adversarial instances. To accomplish this, we train algorithm networks using a mixture of adversarial and practical distributions like power-laws; the resulting networks exhibit a smooth trade-off between the two input regimes.
Projecting to Manifolds via Unsupervised Learning
Heaton, Howard, Fung, Samy Wu, Lin, Alex Tong, Osher, Stanley, Yin, Wotao
We present a new mechanism, called adversarial projection, that projects a given signal onto the intrinsically low dimensional manifold of true data. This operator can be used for solving inverse problems, which consists of recovering a signal from a collection of noisy measurements. Rather than attempt to encode prior knowledge via an analytic regularizer, we leverage available data to project signals directly onto the (possibly nonlinear) manifold of true data (i.e., regularize via an indicator function of the manifold). Our approach avoids the difficult task of forming a direct representation of the manifold. Instead, we directly learn the projection operator by solving a sequence of unsupervised learning problems, and we prove our method converges in probability to the desired projection. This operator can then be directly incorporated into optimization algorithms in the same manner as Plug-and-Play methods, but now with robust theoretical guarantees. Numerical examples are provided.
Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement Learning
Horwood, Julien, Noutahi, Emmanuel
The fundamental goal of generative drug design is to propose optimized molecules that meet predefined activity, selectivity, and pharmacokinetic criteria. Despite recent progress, we argue that existing generative methods are limited in their ability to favourably shift the distributions of molecular properties during optimization. We instead propose a novel Reinforcement Learning framework for molecular design in which an agent learns to directly optimize through a space of synthetically-accessible drug-like molecules. This becomes possible by defining transitions in our Markov Decision Process as chemical reactions, and allows us to leverage synthetic routes as an inductive bias. We validate our method by demonstrating that it outperforms existing state-of the art approaches in the optimization of pharmacologically-relevant objectives, while results on multi-objective optimization tasks suggest increased scalability to realistic pharmaceutical design problems.
Asynchronous \epsilon-Greedy Bayesian Optimisation
De Ath, George, Everson, Richard M., Fieldsend, Jonathan E.
Bayesian Optimisation (BO) is a popular surrogate model-based approach for optimising expensive black-box functions. In order to reduce optimisation wallclock time, parallel evaluation of the black-box function has been proposed. Asynchronous BO allows for a new evaluation to be started as soon as another finishes, thus maximising utilisation of evaluation workers. We present AEGiS (Asynchronous $\epsilon$-Greedy Global Search), an asynchronous BO method that, with probability $2\epsilon$, performs either Thompson sampling or random selection from the approximate Pareto set trading-off between exploitation (surrogate mean prediction) and exploration (surrogate posterior variance). The remaining $1-2\epsilon$ of moves exploit the surrogate's mean prediction. Results on fifteen synthetic benchmark problems, three meta-surrogate hyperparameter tuning problems and two robot pushing problems show that AEGiS generally outperforms existing methods for asynchronous BO. When a single worker is available performance is no worse than BO using expected improvement. We also verify the importance of each of the three components in an ablation study, as well as comparing Pareto set selection to selection from the entire feasible problem domain, finding that the former is vastly superior.
Joint Inference of Multiple Graphs from Matrix Polynomials
Navarro, Madeline, Wang, Yuhao, Marques, Antonio G., Uhler, Caroline, Segarra, Santiago
Inferring graph structure from observations on the nodes is an important and popular network science task. Departing from the more common inference of a single graph and motivated by social and biological networks, we study the problem of jointly inferring multiple graphs from the observation of signals at their nodes (graph signals), which are assumed to be stationary in the sought graphs. From a mathematical point of view, graph stationarity implies that the mapping between the covariance of the signals and the sparse matrix representing the underlying graph is given by a matrix polynomial. A prominent example is that of Markov random fields, where the inverse of the covariance yields the sparse matrix of interest. From a modeling perspective, stationary graph signals can be used to model linear network processes evolving on a set of (not necessarily known) networks. Leveraging that matrix polynomials commute, a convex optimization method along with sufficient conditions that guarantee the recovery of the true graphs are provided when perfect covariance information is available. Particularly important from an empirical viewpoint, we provide high-probability bounds on the recovery error as a function of the number of signals observed and other key problem parameters. Numerical experiments using synthetic and real-world data demonstrate the effectiveness of the proposed method with perfect covariance information as well as its robustness in the noisy regime.