Optimization
Resonant Machine Learning Based on Complex Growth Transform Dynamical Systems
Chatterjee, Oindrila, Chakrabartty, Shantanu
In this paper we propose an energy-efficient learning framework which exploits structural and functional similarities between a machine learning network and a general electrical network satisfying Tellegen's theorem. The proposed formulation ensures that the network's active-power is dissipated only during the process of learning, whereas the network's reactive-power is maintained at zero at all times. As a result, in steady-state, the learned parameters are stored and self-sustained by electrical resonance determined by the network's nodal inductances and capacitances. Based on this approach, this paper introduces three novel concepts: (a) A learning framework where the network's active-power dissipation is used as a regularization for a learning objective function that is subject to a zero total reactive-power constraint; (b) A dynamical system based on complex-domain, continuous-time growth transforms which optimizes the learning objective function and drives the network towards electrical resonance under steady-state operation; and (c) An annealing procedure that controls the trade-off between active-power dissipation and the speed of convergence. As a representative example, we show how the proposed framework can be used for designing resonant support vector machines (SVMs), where the support-vectors correspond to an LC network with self-sustained oscillations. We also show that this resonant network dissipates less active-power compared to its non-resonant counterpart.
Reasoning-Driven Question-Answering for Natural Language Understanding
Natural language understanding (NLU) of text is a fundamental challenge in AI, and it has received significant attention throughout the history of NLP research. This primary goal has been studied under different tasks, such as Question Answering (QA) and Textual Entailment (TE). In this thesis, we investigate the NLU problem through the QA task and focus on the aspects that make it a challenge for the current state-of-the-art technology. This thesis is organized into three main parts: In the first part, we explore multiple formalisms to improve existing machine comprehension systems. We propose a formulation for abductive reasoning in natural language and show its effectiveness, especially in domains with limited training data. Additionally, to help reasoning systems cope with irrelevant or redundant information, we create a supervised approach to learn and detect the essential terms in questions. In the second part, we propose two new challenge datasets. In particular, we create two datasets of natural language questions where (i) the first one requires reasoning over multiple sentences; (ii) the second one requires temporal common sense reasoning. We hope that the two proposed datasets will motivate the field to address more complex problems. In the final part, we present the first formal framework for multi-step reasoning algorithms, in the presence of a few important properties of language use, such as incompleteness, ambiguity, etc. We apply this framework to prove fundamental limitations for reasoning algorithms. These theoretical results provide extra intuition into the existing empirical evidence in the field.
Constrained Multi-Objective Optimization for Automated Machine Learning
Gardner, Steven, Golovidov, Oleg, Griffin, Joshua, Koch, Patrick, Thompson, Wayne, Wujek, Brett, Xu, Yan
Automated machine learning has gained a lot of attention recently. Building and selecting the right machine learning models is often a multi-objective optimization problem. General purpose machine learning software that simultaneously supports multiple objectives and constraints is scant, though the potential benefits are great. In this work, we present a framework called Autotune that effectively handles multiple objectives and constraints that arise in machine learning problems. Autotune is built on a suite of derivative-free optimization methods, and utilizes multilevel parallelism in a distributed computing environment for automatically training, scoring, and selecting good models. Incorporation of multiple objectives and constraints in the model exploration and selection process provides the flexibility needed to satisfy tradeoffs necessary in practical machine learning applications. Experimental results from standard multi-objective optimization benchmark problems show that Autotune is very efficient in capturing Pareto fronts. These benchmark results also show how adding constraints can guide the search to more promising regions of the solution space, ultimately producing more desirable Pareto fronts. Results from two real-world case studies demonstrate the effectiveness of the constrained multi-objective optimization capability offered by Autotune. There has been increasing interest in automated machine learning (AutoML) for improving data scientists' productivity and reducing the cost of model building. A number of general or specialized AutoML systems have been developed [1]-[7], showing impressive results in creating good models with much less manual effort. Most of these systems only support a single objective, typically accuracy or error, to assess and compare models during the automation process.
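To make the notion of a constrained Pareto front concrete, here is a minimal sketch in plain Python. It is not Autotune's implementation: the `(error, latency)` pairs, the function names, and the latency limit are all hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidate models scored as (validation error, latency in ms).
models = [(0.10, 30.0), (0.12, 12.0), (0.15, 10.0), (0.13, 25.0), (0.20, 11.0)]

front = pareto_front(models)
# front == [(0.10, 30.0), (0.12, 12.0), (0.15, 10.0)]

# A constraint (hypothetical latency limit of 20 ms) is applied before ranking,
# which restricts the front to the feasible region.
feasible = [m for m in models if m[1] <= 20.0]
constrained_front = pareto_front(feasible)
# constrained_front == [(0.12, 12.0), (0.15, 10.0)]
```

Filtering by the constraint first is the simplest possible treatment; a real search would also use the constraint to steer which configurations are evaluated next.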
Distributionally Robust Optimization: A Review
Rahimian, Hamed, Mehrotra, Sanjay
The concepts of risk-aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has also witnessed rapid theoretical and applied growth by relying on these concepts. A modeling framework, called distributionally robust optimization (DRO), has recently received significant attention in both the operations research and statistical learning communities. This paper surveys the main concepts and contributions to DRO, and its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization.
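For orientation, a generic DRO problem is usually written as below. The symbols are assumptions chosen for illustration and are not taken from the abstract: $x$ is the decision, $\xi$ the uncertain parameter, $h$ the cost, and $\mathcal{P}$ the ambiguity set of candidate distributions.

```latex
\min_{x \in \mathcal{X}} \; \sup_{P \in \mathcal{P}} \; \mathbb{E}_{P}\left[ h(x, \xi) \right]
% Two limiting cases:
%   \mathcal{P} = \{P_0\}                               -> stochastic program  \min_x \mathbb{E}_{P_0}[h(x,\xi)]
%   \mathcal{P} = \text{all distributions on } \mathcal{U} -> robust optimization \min_x \sup_{\xi \in \mathcal{U}} h(x,\xi)
```

The two limiting cases show how this form sits between stochastic programming and classical robust optimization.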
Deep Kernel Learning for Clustering
Wu, Chieh, Khan, Zulqarnain, Chang, Yale, Ioannidis, Stratis, Dy, Jennifer
We propose a deep learning approach for discovering kernels tailored to identifying clusters over sample data. Our neural network produces sample embeddings that are motivated by--and are at least as expressive as--spectral clustering. Our training objective, based on the Hilbert-Schmidt Independence Criterion, can be optimized via gradient adaptations on the Stiefel manifold, leading to significant acceleration over spectral methods relying on eigendecompositions. Finally, our trained embedding can be directly applied to out-of-sample data. We show experimentally that our approach outperforms several state-of-the-art deep clustering methods, as well as traditional approaches such as $k$-means and spectral clustering over a broad array of real-life and synthetic datasets.
Autonomous Target Search with Multiple Coordinated UAVs
Piacentini, Chiara, Bernardini, Sara, Beck, J. Christopher
Search and tracking is the problem of locating a moving target and following it to its destination. In this work, we consider a scenario in which the target moves across a large geographical area by following a road network and the search is performed by a team of unmanned aerial vehicles (UAVs). We formulate search and tracking as a combinatorial optimization problem and prove that the objective function is submodular. We exploit this property to devise a greedy algorithm. Although this algorithm does not offer strong theoretical guarantees because of the presence of temporal constraints that limit the feasibility of the solutions, it presents remarkably good performance, especially when several UAVs are available for the mission. As the greedy algorithm suffers when resources are scarce, we investigate two alternative optimization techniques: Constraint Programming (CP) and AI planning. Both approaches struggle to cope with large problems, and so we strengthen them by leveraging the greedy algorithm. We use the greedy solution to warm start the CP model and to devise a domain-dependent heuristic for planning. Our extensive experimental evaluation studies the scalability of the different techniques and identifies the conditions under which one approach becomes preferable to the others.
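The greedy idea can be illustrated on a much simpler stand-in for the paper's objective: plain set coverage, which is a standard monotone submodular function. The sketch below ignores the temporal constraints and the target motion model, and the pattern names and road-segment IDs are hypothetical.

```python
from typing import Dict, List, Set

def greedy_max_coverage(candidates: Dict[str, Set[int]], budget: int) -> List[str]:
    """Pick up to `budget` candidates, each time taking the largest marginal gain."""
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(budget):
        best, best_gain = None, 0
        for name, cells in candidates.items():
            if name in chosen:
                continue
            gain = len(cells - covered)  # marginal gain: newly covered segments
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining candidate adds coverage
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen

# Hypothetical search patterns, each covering a set of road-segment IDs.
patterns = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {1, 6}}
plan = greedy_max_coverage(patterns, budget=2)
# plan == ["A", "C"]: after A is chosen, C adds two new segments, B and D only one.
```

For an unconstrained monotone submodular objective under a cardinality budget, this greedy rule carries the classical (1 - 1/e) approximation guarantee; as the abstract notes, temporal feasibility constraints weaken that guarantee.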
On the Variance of the Adaptive Learning Rate and Beyond
Liu, Liyuan, Jiang, Haoming, He, Pengcheng, Chen, Weizhu, Liu, Xiaodong, Gao, Jianfeng, Han, Jiawei
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam. Here, we study its mechanism in detail. Pursuing the theory behind warmup, we identify a problem of the adaptive learning rate (i.e., it has problematically large variance in the early stage), suggest warmup works as a variance reduction technique, and provide both empirical and theoretical evidence to verify our hypothesis. We further propose RAdam, a new variant of Adam, by introducing a term to rectify the variance of the adaptive learning rate. Extensive experimental results on image classification, language modeling, and neural machine translation verify our intuition and demonstrate the effectiveness and robustness of our proposed method. All implementations are available at: https://github.com/LiyuanLucasLiu/RAdam.
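The rectification term can be sketched as a small standalone function. The formula is not given in the abstract; the version below follows the form stated in the RAdam paper, and the function name and default `beta2` are choices made for this sketch. The linked repository is the reference implementation.

```python
import math
from typing import Optional

def radam_rectification(step: int, beta2: float = 0.999) -> Optional[float]:
    """Return the rectification factor r_t, or None while the variance is intractable.

    rho_inf is the maximum length of the approximated simple moving average,
    rho_t its value at the given (1-indexed) step.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        # Early steps: skip the adaptive learning rate and fall back to an
        # un-adapted (momentum-only) update.
        return None
    num = (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
    den = (rho_inf - 4.0) * (rho_inf - 2.0) * rho_t
    return math.sqrt(num / den)

# The factor starts small and approaches 1 as training proceeds,
# which behaves like an automatic warmup schedule.
factors = [radam_rectification(t) for t in (1, 10, 100, 10000)]
```

In a full optimizer the adaptive step would be multiplied by this factor whenever it is not `None`.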
Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods
Cheng, Ching-An, Yan, Xinyan, Boots, Byron
Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However, policy gradient methods are also notoriously sample inefficient. This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Monte Carlo methods. Previous research has endeavored to contend with this problem by studying control variates (CVs) that can reduce the variance of estimates without introducing bias, including the early use of baselines, state dependent CVs, and the more recent state-action dependent CVs. In this work, we analyze the properties and drawbacks of previous CV techniques and, surprisingly, we find that these works have overlooked the important fact that Monte Carlo gradient estimates are generated by trajectories of states and actions. We show that ignoring the correlation across the trajectories can result in suboptimal variance reduction, and we propose a simple fix: a class of "trajectory-wise" CVs that can further drive down the variance. We show that constructing trajectory-wise CVs can be done recursively and requires only learning state-action value functions like the previous CVs for policy gradient. We further prove that the proposed trajectory-wise CVs are optimal for variance reduction under reasonable assumptions.
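The basic control-variate mechanism can be shown on a scalar toy problem. This is not the trajectory-wise construction from the paper: it estimates E[exp(U)] for U uniform on [0, 1], using U itself (known mean 0.5) as the control variate. All names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Tuple

def control_variate_samples(n: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (plain samples of exp(U), control-variate-adjusted samples)."""
    rng = random.Random(seed)
    g = [rng.random() for _ in range(n)]   # control variate, E[g] = 0.5
    f = [math.exp(u) for u in g]           # quantity of interest
    mean_f, mean_g = statistics.fmean(f), statistics.fmean(g)
    cov = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g)) / (n - 1)
    # Variance-minimizing coefficient c = Cov(f, g) / Var(g), estimated from the
    # same samples (this adds a small bias; a held-out estimate would avoid it).
    c = cov / statistics.variance(g)
    adjusted = [a - c * (b - 0.5) for a, b in zip(f, g)]
    return f, adjusted

plain, adjusted = control_variate_samples(10_000)
var_plain = statistics.variance(plain)
var_adjusted = statistics.variance(adjusted)
estimate = statistics.fmean(adjusted)   # close to the true value e - 1
```

Because exp(U) and U are strongly correlated, subtracting the centered control variate removes most of the sample variance while leaving the expectation unchanged; baselines in policy gradient methods play the same role.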
Gradient Flows and Accelerated Proximal Splitting Methods
França, Guilherme, Robinson, Daniel P., Vidal, René
Proximal-based algorithms are well-suited to nonsmooth optimization problems with important applications in signal processing, control theory, statistics and machine learning. There are essentially four basic types of proximal algorithms based on fixed-point iteration currently known: forward-backward splitting, forward-backward-forward or Tseng splitting, Douglas-Rachford, and the very recent Davis-Yin three-operator splitting. In addition, the alternating direction method of multipliers (ADMM) is also closely related. In this paper, we show that all these different methods can be derived from the gradient flow by using splitting methods for ordinary differential equations. Furthermore, applying a similar discretization scheme to a particular second order differential equation results in accelerated variants of the respective algorithms, which can be of Nesterov or heavy ball type, although we treat both simultaneously. Many of the optimization algorithms we derive are new. For instance, we propose accelerated variants of Davis-Yin and two extensions of ADMM together with their accelerated variants. Interestingly, we show that (accelerated) ADMM corresponds to a rebalanced splitting, which is a recent technique designed to preserve steady states of the differential equation. Overall, our results strengthen the connections between optimization and continuous dynamical systems and offer a more unified perspective on accelerated methods.
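As a minimal illustration of the simplest of these methods, forward-backward splitting can be read as an explicit (forward) Euler step on the smooth part of the gradient flow followed by an implicit (backward) Euler step, i.e. a proximal step, on the nonsmooth part. The sketch below applies it to a hypothetical separable problem, minimizing sum_i 0.5 * a_i * (x_i - b_i)^2 + lam * |x_i|, chosen because its solution has a closed form to compare against. It is the unaccelerated method only.

```python
import math
from typing import List

def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def forward_backward(a: List[float], b: List[float], lam: float,
                     step: float, iters: int = 200) -> List[float]:
    """Iterate x <- prox_{step*lam*|.|}(x - step * grad f(x)), starting from zero."""
    x = [0.0] * len(b)
    for _ in range(iters):
        x = [
            # forward step on the smooth quadratic, backward step on the l1 term
            soft_threshold(xi - step * ai * (xi - bi), step * lam)
            for xi, ai, bi in zip(x, a, b)
        ]
    return x

a = [1.0, 2.0, 4.0]
b = [3.0, -0.2, 1.0]
lam = 1.0
# The step must stay below 2 / max(a) = 0.5 for convergence.
x_star = forward_backward(a, b, lam, step=0.2)
# Closed-form minimizer of this separable problem, used as a check.
closed_form = [soft_threshold(bi, lam / ai) for ai, bi in zip(a, b)]
```

Here `x_star` agrees with `closed_form` (approximately 2.0, 0.0, 0.75) to numerical precision; the second coordinate is set exactly to zero by the proximal step.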
TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems
Guo, Wenbo, Wang, Lun, Xing, Xinyu, Du, Min, Song, Dawn
A trojan backdoor is a hidden pattern typically implanted in a deep neural network. It is activated, forcing the infected model to behave abnormally, only when an input sample containing a particular trigger is fed to the model. As such, given a deep neural network model and clean input samples, it is very challenging to inspect the model and determine whether a trojan backdoor exists. Recently, researchers have designed and developed several pioneering solutions to address this acute problem, and they have demonstrated that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely addresses the problem. On the one hand, they mostly work under an unrealistic assumption (e.g., assuming availability of the contaminated training database). On the other hand, the proposed techniques can neither accurately detect the existence of trojan backdoors nor restore high-fidelity trojan backdoor images, especially when the triggers pertaining to the trojan vary in size, shape and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes a trojan detection task as a non-convex optimization problem, and the detection of a trojan backdoor as the task of resolving the optimization through an objective function. Unlike the existing technique that also models trojan detection as an optimization problem, TABOR designs a new objective function--under the guidance of explainable AI techniques as well as heuristics--that could guide optimization to identify a trojan backdoor in a more effective fashion. In addition, TABOR defines a new metric to measure the quality of an identified trojan backdoor. Using an anomaly detection method, we show the new metric could better help TABOR identify intentionally injected triggers in an infected model and filter out false alarms......