Optimization


Normalization Techniques in Training DNNs: Methodology, Analysis and Application

arXiv.org Machine Learning

Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks (DNNs), and they have been used successfully in a wide range of applications. This paper reviews and comments on the past, present and future of normalization methods in the context of DNN training. We provide a unified picture of the main motivation behind different approaches from the perspective of optimization, and present a taxonomy for understanding the similarities and differences between them. Specifically, we decompose the pipeline of the most representative methods that normalize activations into three components: normalization area partitioning, the normalization operation and normalization representation recovery. In doing so, we provide insight for designing new normalization techniques. Finally, we discuss current progress in understanding normalization methods and provide a comprehensive review of applications of normalization to particular tasks, in which it can effectively address key issues.
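The three-component decomposition can be made concrete with batch normalization as one instance. This is a minimal sketch, assuming plain Python lists in place of tensors and a learnable per-feature scale (gamma) and shift (beta); the helper names partition_by_feature, standardize and recover are hypothetical, chosen only to mirror the three components named in the abstract.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples in a mini-batch, columns = features


def partition_by_feature(x: Matrix) -> List[List[float]]:
    """Area partitioning: batch normalization groups each feature's values
    across the whole mini-batch, i.e. it works column by column."""
    return [list(col) for col in zip(*x)]


def standardize(values: List[float], eps: float = 1e-5) -> List[float]:
    """Normalization operation: subtract the mean, divide by the standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def recover(values: List[float], gamma: float, beta: float) -> List[float]:
    """Representation recovery: a learnable affine transform restores expressiveness."""
    return [gamma * v + beta for v in values]


def batch_norm(x: Matrix, gamma: List[float], beta: List[float]) -> Matrix:
    """Compose the three components (hypothetical helper names, not a library API)."""
    columns = partition_by_feature(x)
    out_cols = [recover(standardize(col), g, b)
                for col, g, b in zip(columns, gamma, beta)]
    # Transpose back to the sample-major layout of the input.
    return [list(row) for row in zip(*out_cols)]
```

Changing only the partitioning step, for example grouping each sample's features instead of each feature's batch values, turns the same pipeline into layer normalization, which is the kind of design space the taxonomy is meant to expose.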


Scientists use reinforcement learning to train quantum algorithm

#artificialintelligence

Recent advancements in quantum computing have driven the scientific community's quest to solve a certain class of complex problems for which quantum computers would be better suited than traditional supercomputers. To improve the efficiency with which quantum computers can solve these problems, scientists are investigating the use of artificial intelligence approaches. In a new study, scientists at the U.S. Department of Energy's (DOE) Argonne National Laboratory have developed a new algorithm based on reinforcement learning to find the optimal parameters for the Quantum Approximate Optimization Algorithm (QAOA), which allows a quantum computer to solve certain combinatorial problems such as those that arise in materials design, chemistry and wireless communications. "Combinatorial optimization problems are those for which the solution space gets exponentially larger as you expand the number of decision variables," said Argonne computer scientist Prasanna Balaprakash. "In one traditional example, you can find the shortest route for a salesman who needs to visit a few cities once by enumerating all possible routes, but given a couple thousand cities, the number of possible routes far exceeds the number of stars in the universe; even the fastest supercomputers cannot find the shortest route in a reasonable time."
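The enumeration approach in Balaprakash's salesman example can be sketched as a brute-force traveling salesman solver. This is an illustration only, not the Argonne reinforcement learning method or QAOA: it fixes the starting city and tries every ordering of the remaining ones, so the number of candidate routes is (n - 1)!, which is why enumeration breaks down long before a couple thousand cities. The city coordinates in the usage note are hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(order: Sequence[int], cities: List[Point]) -> float:
    """Length of a closed tour that visits cities in `order` and returns to the start."""
    total = 0.0
    for i in range(len(order)):
        a = cities[order[i]]
        b = cities[order[(i + 1) % len(order)]]  # wrap around to close the tour
        total += math.dist(a, b)
    return total


def brute_force_tsp(cities: List[Point]) -> Tuple[Optional[List[int]], float]:
    """Enumerate every route starting at city 0 (assumes at least one city)."""
    best_order: Optional[List[int]] = None
    best_len = math.inf
    for perm in itertools.permutations(range(1, len(cities))):
        order = [0, *perm]
        length = tour_length(order, cities)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len


def route_count(n: int) -> int:
    """Number of orderings the search examines for n cities with city 0 fixed."""
    return math.factorial(n - 1)
```

For the four corners of a unit square, brute_force_tsp([(0, 0), (0, 1), (1, 1), (1, 0)]) finds a tour of length 4.0 after checking only 6 orderings, but route_count grows factorially, so the same loop is hopeless at the scale the quote describes.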


Operations Research Learning

#artificialintelligence

There is a huge synergy between Operations Research (OR) and Machine Learning (ML). While some ML researchers use OR to further improve learning, some OR researchers use ML to incorporate learning into the optimization process, expecting significant gains in solving time, optimality gap and other metrics. In this article, I go through several stories in which machine learning is leveraged to tackle optimization problems, an approach I like to call operations research learning (ORL). These stories offer insight into how this synergy is built and transferred across problems, as well as into opportunities for future improvement.


Optimal Sepsis Patient Treatment using Human-in-the-loop Artificial Intelligence

#artificialintelligence

This study proposes a clinical prescriptive model with human-in-the-loop functionality that recommends optimal, individual-specific amounts of IV fluids for treating septic patients in ICUs. The proposed methodology combines constrained optimization and machine learning techniques to arrive at optimal solutions. A key novelty of the proposed clinical model is its use of a physician's input to derive optimal solutions. The efficacy of the method is demonstrated on a real-world medical dataset. We further validate the robustness of the approach, showing that our method benefits from the human-in-the-loop component but is also robust to poor input, which is a crucial consideration for new physicians.


Flight-connection Prediction for Airline Crew Scheduling to Construct Initial Clusters for OR Optimizer

arXiv.org Machine Learning

Airlines need to construct crew pairings to cover their flights. A pairing is a sequence of flights that starts and finishes at a base and satisfies complex collective-agreement constraints. For major airlines, which handle more than 10,000 flights per week, this is an important and difficult problem: savings of as little as 1% amount to many dozens of millions every year, so efficient solutions are required. The complexity of the problem lies in the large number of possible pairings and in selecting the set of pairings of minimal cost, a large integer programming problem that is impossible to solve with standard solvers (Elhallaoui et al., 2005; Kasirzadeh et al., 2017). In our review of related work, we discuss advanced optimization techniques that reduce the number of variables and constraints needed to solve it. The main drawback of these techniques, however, is that they require days to compute, while airlines are often given all the scheduling data only a few days before they must build the schedule. The objective of this paper is to use machine learning (ML) techniques to improve algorithmic efficiency and solve this problem within a more practical time horizon. Unfortunately, solving the problem with ML alone seems out of reach.
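Selecting the set of pairings of minimal cost is a set-partitioning problem: choose candidate pairings so that every flight is covered exactly once and the total cost is minimal. The toy sketch below is not the paper's method; it assumes the candidate pairings already satisfy the collective-agreement constraints and reduces each one to a set of flight IDs with a cost. All flight IDs, pairings and costs are hypothetical. It enumerates every subset of candidates, which makes the combinatorial blow-up explicit: m candidate pairings mean up to 2^m subsets.

```python
import itertools
from typing import Dict, FrozenSet, List, Optional, Tuple

# A pairing reduced to the set of flights it covers; flight order inside the
# pairing does not matter for the coverage check.
Pairing = FrozenSet[str]


def best_partition(flights: FrozenSet[str], pairings: Dict[Pairing, float]
                   ) -> Optional[Tuple[List[Pairing], float]]:
    """Pick pairings covering each flight exactly once at minimum total cost.

    Returns None when no exact cover exists.
    """
    candidates = list(pairings)
    best: Optional[Tuple[List[Pairing], float]] = None
    for k in range(1, len(candidates) + 1):
        for combo in itertools.combinations(candidates, k):
            covered = [f for p in combo for f in p]
            # Exact cover: no flight repeated, and every flight present.
            if len(covered) != len(set(covered)) or set(covered) != flights:
                continue
            cost = sum(pairings[p] for p in combo)
            if best is None or cost < best[1]:
                best = (list(combo), cost)
    return best
```

With flights F1 to F4 and candidates {F1,F2}: 5, {F3,F4}: 5, {F1,F2,F3,F4}: 8, {F1,F3}: 3 and {F2,F4}: 3, the cheapest exact cover is {F1,F3} plus {F2,F4} at cost 6. Real instances replace this enumeration with integer programming solvers, which is where the variable- and constraint-reduction techniques discussed in the paper come in.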


Pareto-Optimal Bit Allocation for Collaborative Intelligence

arXiv.org Artificial Intelligence

In recent studies, collaborative intelligence (CI) has emerged as a promising framework for deploying Artificial Intelligence (AI)-based services on mobile/edge devices. In CI, the AI model (a deep neural network) is split between the edge and the cloud, and intermediate features are sent from the edge sub-model to the cloud sub-model. In this paper, we study bit allocation for feature coding in multi-stream CI systems. We model task distortion as a function of rate using convex surfaces similar to those found in distortion-rate theory. Using such models, we provide closed-form bit allocation solutions for single-task systems and scalarized multi-task systems. Moreover, we provide an analytical characterization of the full Pareto set for 2-stream k-task systems, and bounds on the Pareto set for 3-stream 2-task systems. The analytical results are examined on a variety of DNN models from the literature to demonstrate their wide applicability.
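To show what a closed-form bit allocation looks like in the single-task case, the sketch below assumes the classical exponential distortion-rate model D_i(R_i) = var_i * 2^(-2 R_i) for each stream, which is not necessarily the convex model fitted in the paper. Minimizing the total distortion under a total rate budget with non-negative rates gives the familiar reverse water-filling answer: each active stream gets R/n plus half the log2-ratio of its variance to the geometric mean, and streams that would receive a negative rate are switched off and the rest re-solved. The function name allocate_bits is hypothetical.

```python
import math
from typing import List


def allocate_bits(variances: List[float], total_rate: float) -> List[float]:
    """Closed-form allocation for D_i(R_i) = var_i * 2**(-2 * R_i).

    Minimizes sum_i D_i subject to sum_i R_i = total_rate and R_i >= 0.
    Assumes positive variances and a non-negative total_rate.
    """
    active = list(range(len(variances)))
    rates = [0.0] * len(variances)
    while active:
        # Log2 of the geometric mean of the active streams' variances.
        log_gm = sum(math.log2(variances[i]) for i in active) / len(active)
        trial = {i: total_rate / len(active)
                 + 0.5 * (math.log2(variances[i]) - log_gm) for i in active}
        negative = [i for i, r in trial.items() if r < 0]
        if not negative:
            for i, r in trial.items():
                rates[i] = r
            return rates
        # Streams that would need a negative rate get zero bits; re-solve the rest.
        active = [i for i in active if i not in negative]
    return rates
```

For variances [4.0, 1.0] and a budget of 2 bits, the allocation is [1.5, 0.5], and both streams end at distortion 0.5; equalizing distortion across active streams is the hallmark of this solution. The paper's Pareto-set analysis addresses the harder multi-task trade-offs that a single scalar objective like this one hides.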


Exploring different optimization algorithms

#artificialintelligence

Machine learning is a field of study within the broad spectrum of artificial intelligence (AI) whose algorithms make predictions from data without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as recommendation engines, computer vision, spam filtering and much more. They perform extraordinarily well where it is difficult or infeasible to develop conventional algorithms for the needed tasks. While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data, over and over and faster and faster, is a recent development. Neural networks are among the most widely used machine learning techniques.


Why Can a Machine Beat Mario but not Pokemon?

#artificialintelligence

By now, you've probably heard of bots playing video games at superhuman levels. These bots can either be programmed explicitly, reacting to fixed inputs with fixed outputs, or learn and evolve, reacting in different ways to the same inputs in the hope of finding the optimal responses. These games are complex, and training these machines takes clever combinations of complicated algorithms, repeated simulations, and time. I want to focus on MarI/O and why we can't use a similar approach to beat a game of Pokemon. Let's compare the games using each of these factors. The way a machine learns is by optimizing some kind of objective function.


Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

arXiv.org Machine Learning

Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization, an algorithmic scheme that encourages exploration, and is closely related to soft policy iteration and trust region policy optimization. Despite their empirical success, the theoretical underpinnings of NPG methods remain limited even in the tabular setting. This paper develops non-asymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly, or even quadratically once it enters a local region around the optimal policy, when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-à-vis inexact policy evaluation. Our convergence results accommodate a wide range of learning rates and shed light on the role of entropy regularization in enabling fast convergence.
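In the tabular softmax setting, the entropy-regularized NPG update analyzed in this line of work can, to our understanding, be written multiplicatively: pi_new(a|s) is proportional to pi(a|s)^(1 - eta*tau/(1 - gamma)) * exp(eta * Q_tau(s, a) / (1 - gamma)), for learning rates 0 < eta <= (1 - gamma)/tau, where Q_tau is the entropy-regularized Q-function of the current policy. With eta = (1 - gamma)/tau the old policy drops out and the step reduces to soft policy iteration. The sketch below runs this update on a hypothetical 2-state, 2-action MDP, with exact policy evaluation approximated by iterating the regularized Bellman equation to a tight tolerance; the MDP, constants and function names are illustrative assumptions.

```python
import math
from typing import List

Table = List[List[float]]  # indexed [state][action]

# Hypothetical 2-state, 2-action MDP used only for illustration.
R = [[1.0, 0.0], [0.0, 2.0]]                 # R[s][a]: reward
P = [[[0.9, 0.1], [0.2, 0.8]],               # P[s][a][s']: transition probabilities
     [[0.7, 0.3], [0.05, 0.95]]]
GAMMA, TAU = 0.9, 0.1                        # discount factor, entropy weight
S, A = len(R), len(R[0])


def q_from_v(v: List[float]) -> Table:
    """Q_tau(s, a) = r(s, a) + gamma * E[V_tau(s')]."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(S))
             for a in range(A)] for s in range(S)]


def soft_q(pi: Table, tol: float = 1e-10) -> Table:
    """Entropy-regularized policy evaluation by fixed-point iteration."""
    v = [0.0] * S
    while True:
        q = q_from_v(v)
        # V_tau(s) = sum_a pi(a|s) * (Q_tau(s, a) - tau * log pi(a|s))
        new_v = [sum(pi[s][a] * (q[s][a] - TAU * math.log(pi[s][a])) for a in range(A))
                 for s in range(S)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return q_from_v(new_v)
        v = new_v


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(x - m) for x in logits]
    z = sum(w)
    return [x / z for x in w]


def npg_step(pi: Table, eta: float) -> Table:
    """One entropy-regularized NPG update under softmax parameterization."""
    q = soft_q(pi)
    keep = 1.0 - eta * TAU / (1.0 - GAMMA)  # weight kept on the old log-policy
    return [softmax([keep * math.log(pi[s][a]) + eta * q[s][a] / (1.0 - GAMMA)
                     for a in range(A)]) for s in range(S)]


def run_npg(iters: int = 200, eta: float = 0.9 * (1.0 - GAMMA) / TAU) -> Table:
    pi = [[1.0 / A] * A for _ in range(S)]  # start from the uniform policy
    for _ in range(iters):
        pi = npg_step(pi, eta)
    return pi
```

A converged policy should satisfy the soft-optimality condition pi(a|s) proportional to exp(Q_tau(s, a)/tau), so comparing run_npg() against softmax([x / TAU for x in soft_q(pi)[s]]) for each state is a quick sanity check. Because entropy keeps every action probability positive, the logarithms in the update stay finite.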


Out-of-Distribution Generalization with Maximal Invariant Predictor

arXiv.org Machine Learning

The out-of-distribution (OOD) generalization problem is the problem of finding a predictor whose performance in the worst-case environment is optimal. This paper makes two contributions to the OOD problem. We first use basic results from probability theory to prove the Maximal Invariant Predictor (MIP) condition, a theoretical result that can be used to identify the OOD-optimal solution. We then use the MIP to derive the Inter-environmental Gradient Alignment (IGA) algorithm, which can help find the OOD-optimal predictor. Previous studies of the theoretical aspects of the OOD problem rely on strong structural assumptions such as a causal DAG. However, for image datasets, for example, identifying the hidden structural relations is itself a difficult problem. Our theoretical results differ from those of many previous studies in that they can be applied when the underlying structure of the dataset is difficult to analyze. We present an extensive comparison of previous theoretical approaches to the OOD problem based on the assumptions they make. We also present an extension of Colored-MNIST that represents the pathological OOD situation more accurately than the original version, and demonstrate the superiority of IGA over previous methods on both the original and the extended versions of Colored-MNIST.
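The abstract names IGA without stating its objective. As a hedged sketch of the idea of inter-environmental gradient alignment, assume an objective that adds to the mean per-environment risk a penalty, weighted by lam, on the variance of the per-environment gradients; this is our reading, and the paper should be consulted for its exact form. The model is a one-parameter linear regressor y = w * x with squared loss, so the gradients can be written by hand; the environments and the function name gradient_alignment_objective are hypothetical.

```python
from typing import List, Tuple

Env = List[Tuple[float, float]]  # (x, y) pairs observed in one environment


def risk_and_grad(w: float, env: Env) -> Tuple[float, float]:
    """Mean squared error of y = w * x on one environment, and its derivative in w."""
    n = len(env)
    risk = sum((w * x - y) ** 2 for x, y in env) / n
    grad = sum(2 * (w * x - y) * x for x, y in env) / n
    return risk, grad


def gradient_alignment_objective(w: float, envs: List[Env], lam: float) -> float:
    """Mean risk across environments plus lam times the variance of their gradients.

    A zero penalty means every environment 'agrees' on the direction to move w.
    """
    pairs = [risk_and_grad(w, e) for e in envs]
    mean_risk = sum(r for r, _ in pairs) / len(pairs)
    grads = [g for _, g in pairs]
    mean_grad = sum(grads) / len(grads)
    grad_var = sum((g - mean_grad) ** 2 for g in grads) / len(grads)
    return mean_risk + lam * grad_var
```

With env_a = [(1.0, 2.0), (2.0, 4.0)] and env_b = [(1.0, 2.0), (3.0, 6.0)], both generated by y = 2x, the objective is 0 at w = 2; at w = 1 the gradients are -5 and -10, so the mean risk is 3.75 and, with lam = 1, the variance penalty of 6.25 brings the objective to 10. Minimizing such a penalty favors predictors whose optimality does not depend on which environment the data came from.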