Optimization


VitaOptimum Library - The Best Global Optimization Solver

#artificialintelligence

If we go back in history, we see that Linear Programming (LP) came first. It is fair to say that LP fostered the economic development of a great number of countries in the 20th century. Nevertheless, since many real-world problems are nonlinear, a new kind of solver was needed. Building on Quadratic Programming (QP), the methods of Nonlinear Programming (NLP) were developed. But even these methods do not always find the best solution.
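The last point is easy to see on a one-dimensional function with two valleys. The sketch below uses a hypothetical test function and is not VitaOptimum's algorithm: plain gradient descent, a local method, ends in whichever valley it starts in, while a crude multistart wrapper around the same local method recovers the deeper minimum.

```python
# Hypothetical multimodal test function with two local minima:
# a shallow one near x = 1.13 and the global one near x = -1.30.
def f(x: float) -> float:
    return x**4 - 3 * x**2 + x


def df(x: float) -> float:
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain local descent: converges to the minimum of the basin it starts in."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x


def multistart(starts: list) -> float:
    """Run the local method from several starts and keep the best end point."""
    return min((gradient_descent(s) for s in starts), key=f)


local = gradient_descent(2.0)               # ends in the shallow valley, x ≈ 1.13
best = multistart([-2.0, -0.5, 0.5, 2.0])   # reaches the deeper valley, x ≈ -1.30
print(f"local: x={local:.3f}, f={f(local):.3f}")
print(f"multistart: x={best:.3f}, f={f(best):.3f}")
```

Multistart is only the simplest global strategy; dedicated global solvers replace the fixed list of starts with a smarter search.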


A Multi-Scale Tensor Network Architecture for Classification and Regression

arXiv.org Machine Learning

Justin Reyes (Department of Physics, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA) and E. Miles Stoudenmire (Center for Computational Quantum Physics, Flatiron Institute, 162 5th Avenue, New York, NY 10010, USA). Dated: January 24, 2020.

We present an algorithm for supervised learning using tensor networks, employing a step of preprocessing the data by coarse-graining through a sequence of wavelet transformations. We represent these transformations as a set of tensor network layers identical to those in a multi-scale entanglement renormalization ansatz (MERA) tensor network, and perform supervised learning and regression tasks through a model based on a matrix product state (MPS) tensor network acting on the coarse-grained data. Because the entire model consists of tensor contractions (apart from the initial nonlinear feature map), we can adaptively fine-grain the optimized MPS model backwards through the layers with essentially no loss in performance. The MPS itself is trained using an adaptive algorithm based on the density matrix renormalization group (DMRG) algorithm. We test our methods by performing a classification task on audio data and a regression task on temperature time-series data, studying the dependence of training accuracy on the number of coarse-graining layers and showing how fine-graining through the network may be used to initialize models with access to finer-scale features.

I. INTRODUCTION

Computational techniques developed across the machine learning and physics fields have consistently generated promising methods and applications in both areas of study.
The application of well-established machine learning architectures and optimization techniques has enriched the physics community with advances such as modeling and recognizing topological quantum states [1-3], optimizing quantum error correction codes [4], or classifying quantum walks [5]. Conversely, techniques known as tensor networks, which model high-dimensional functions and are closely connected to physical principles, have begun to be explored more in applied mathematics and machine learning [6-16].
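To make the pipeline in the abstract concrete, here is a minimal pure-Python sketch of the forward pass only. A pairwise-averaging (Haar-style) step stands in for one wavelet coarse-graining layer; the paper's actual wavelet layers are MERA-like, so Haar averaging is an assumption chosen for simplicity. Each coarse-grained value then goes through the local feature map φ(v) = [cos(πv/2), sin(πv/2)], also assumed here, and a randomly initialized MPS whose last tensor carries a label index is contracted site by site to give one score per class. DMRG-style training and fine-graining are not shown, and all function names are hypothetical.

```python
import math
import random


def haar_coarse_grain(x):
    """One coarse-graining layer: average neighbouring pairs (length must be even)."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def feature_map(v):
    """Local feature map for a value in [0, 1]: a 2-component vector."""
    return [math.cos(math.pi * v / 2), math.sin(math.pi * v / 2)]


def random_mps(n_sites, bond_dim, n_labels, seed=0):
    """Tensors indexed A[j][left][s][right]; the last right index is the class label."""
    rng = random.Random(seed)
    mps = []
    for j in range(n_sites):
        left = 1 if j == 0 else bond_dim
        right = n_labels if j == n_sites - 1 else bond_dim
        mps.append([[[rng.gauss(0, 0.5) for _ in range(right)]
                     for _ in range(2)]
                    for _ in range(left)])
    return mps


def mps_scores(mps, x):
    """Contract the MPS with the product of local feature vectors, left to right."""
    env = [1.0]  # left boundary vector (bond dimension 1)
    for tensor, value in zip(mps, x):
        phi = feature_map(value)
        right = len(tensor[0][0])
        env = [sum(env[l] * tensor[l][s][r] * phi[s]
                   for l in range(len(env)) for s in range(2))
               for r in range(right)]
    return env  # one score per label


signal = [0.1, 0.3, 0.8, 0.6, 0.2, 0.0, 0.9, 1.0]
coarse = haar_coarse_grain(haar_coarse_grain(signal))  # 8 -> 4 -> 2 values
mps = random_mps(n_sites=len(coarse), bond_dim=3, n_labels=2)
print(coarse, mps_scores(mps, coarse))
```

Plain averaging (rather than the orthonormal 1/√2 Haar scaling) keeps the coarse-grained values in [0, 1], which is the range the feature map expects. Because the layer is linear, it could in principle be pushed back through the network, which is the property the abstract exploits for fine-graining.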


Local Policy Optimization for Trajectory-Centric Reinforcement Learning

arXiv.org Machine Learning

The goal of this paper is to present a method for simultaneous trajectory and local stabilizing policy optimization that generates local policies for trajectory-centric model-based reinforcement learning (MBRL). This is motivated by the fact that global policy optimization for nonlinear systems can be a very challenging problem, both algorithmically and numerically. However, many robotic manipulation tasks are trajectory-centric and thus do not require a global model or policy. Due to inaccuracies in the learned model estimates, an open-loop trajectory optimization process usually results in very poor performance when used on the real system. Motivated by these problems, we formulate trajectory optimization and local policy synthesis as a single optimization problem, which is then solved simultaneously as an instance of nonlinear programming. We provide some analysis results, as well as the performance achieved by the proposed technique, under some simplifying assumptions.
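To make "local stabilizing policy" concrete: a common construction is a time-varying LQR controller computed around a nominal trajectory. Note that this is the sequential approach (plan first, then stabilize), not the paper's joint NLP formulation; it is shown only to illustrate why feedback around the plan matters when the learned model is inaccurate. The scalar system, costs, model mismatch and nominal plan are all hypothetical.

```python
# Hypothetical scalar "learned" model x_{t+1} = A_MODEL*x_t + B*u_t.
A_MODEL, B = 1.2, 0.5
A_TRUE = 1.25            # the real system differs slightly from the learned model
Q, R, QF, T = 1.0, 0.1, 1.0, 20


def nominal_trajectory(x0):
    """Open-loop plan from the model: u_t = -1.4*x_t, so x_{t+1} = 0.5*x_t."""
    xs, us = [x0], []
    for _ in range(T):
        u = -1.4 * xs[-1]
        us.append(u)
        xs.append(A_MODEL * xs[-1] + B * u)
    return xs, us


def tvlqr_gains():
    """Backward Riccati recursion for gains K_t, with delta_u = -K_t * delta_x."""
    p, gains = QF, [0.0] * T
    for t in reversed(range(T)):
        k = (B * p * A_MODEL) / (R + B * B * p)          # uses P_{t+1}
        p = Q + A_MODEL * A_MODEL * p - A_MODEL * B * p * k  # becomes P_t
        gains[t] = k
    return gains


def rollout(x0, xs, us, gains=None):
    """Run the true system; with gains, apply the local feedback policy."""
    x = x0
    for t in range(T):
        u = us[t] if gains is None else us[t] - gains[t] * (x - xs[t])
        x = A_TRUE * x + B * u
    return x - xs[T]     # final deviation from the plan


xs, us = nominal_trajectory(1.0)
gains = tvlqr_gains()
open_loop = rollout(1.1, xs, us)
closed_loop = rollout(1.1, xs, us, gains)
print(f"final deviation: open loop {open_loop:.3f}, with local policy {closed_loop:.5f}")
```

The open-loop plan drifts away because the true dynamics are unstable and differ from the model, while the time-varying gains keep the real system close to the planned trajectory.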


Optimal binning: mathematical programming formulation

arXiv.org Machine Learning

January 23, 2020

Abstract

Optimal binning is the optimal discretization of a variable into bins given a discrete or continuous numeric target. We present a rigorous and extensible mathematical programming formulation for solving the optimal binning problem for binary, continuous and multi-class targets, incorporating constraints not previously addressed. For all three target types, we introduce a convex mixed-integer programming formulation. Several algorithmic enhancements, such as automatic determination of the most suitable monotonic trend via a machine-learning-based classifier, and implementation aspects are thoughtfully discussed. The new mathematical programming formulations are carefully implemented in the open-source Python library OptBinning.

1 Introduction

Binning (grouping or bucketing) is a technique to discretize the values of a continuous variable into bins (groups or buckets). From a modeling perspective, the binning technique may address prevalent data issues such as the handling of missing values, the presence of outliers and statistical noise, and data scaling. Furthermore, the binning process is a valuable interpretable tool to enhance the understanding of the nonlinear dependence between a variable and a given target while reducing the model complexity. Ultimately, the resulting bins can be used to perform data transformations. Binning techniques are extensively used in machine learning applications, in exploratory data analysis, and as an algorithm to speed up learning tasks; recently, binning has been applied to accelerate learning in gradient boosting decision trees [12].
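As a much smaller illustration of what the formulation optimizes, the sketch below takes pre-binned counts for a binary target, enumerates every way of merging adjacent pre-bins into a fixed number of bins, and keeps the split with the highest Information Value (IV) among those whose event rate is monotonic. This exhaustive search is a swapped-in technique, not the paper's mixed-integer program, and it only scales to a handful of pre-bins, which is exactly why a solver-based formulation is useful. The counts are made up, and WoE is taken as ln(non-event share / event share), one common convention.

```python
import math
from itertools import combinations


def woe_iv(bins):
    """bins: list of (non_events, events). Returns (WoE per bin, total IV)."""
    total_ne = sum(ne for ne, _ in bins)
    total_e = sum(e for _, e in bins)
    woes, iv = [], 0.0
    for ne, e in bins:
        p_ne, p_e = ne / total_ne, e / total_e
        w = math.log(p_ne / p_e)
        woes.append(w)
        iv += (p_ne - p_e) * w
    return woes, iv


def is_monotonic(rates):
    inc = all(a <= b for a, b in zip(rates, rates[1:]))
    dec = all(a >= b for a, b in zip(rates, rates[1:]))
    return inc or dec


def optimal_binning(prebins, n_bins):
    """Exhaustive search over cut points between adjacent pre-bins."""
    best = None
    for cuts in combinations(range(1, len(prebins)), n_bins - 1):
        edges = (0, *cuts, len(prebins))
        merged = [
            (sum(ne for ne, _ in prebins[lo:hi]), sum(e for _, e in prebins[lo:hi]))
            for lo, hi in zip(edges, edges[1:])
        ]
        if any(ne == 0 or e == 0 for ne, e in merged):
            continue  # WoE is undefined when a bin lacks one class
        rates = [e / (ne + e) for ne, e in merged]
        if not is_monotonic(rates):
            continue  # enforce a monotonic event-rate trend
        _, iv = woe_iv(merged)
        if best is None or iv > best[0]:
            best = (iv, cuts, merged)
    return best


# Hypothetical pre-bins for one feature: (non-events, events), ordered by value.
prebins = [(90, 10), (80, 20), (85, 15), (60, 40), (50, 50), (30, 70)]
iv, cuts, merged = optimal_binning(prebins, n_bins=3)
print(f"cut after pre-bins {cuts}, IV = {iv:.3f}, bins = {merged}")
```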


Search-Based Software Engineering for Self-Adaptive Systems: One Survey, Five Disappointments and Six Opportunities

arXiv.org Artificial Intelligence

Search-Based Software Engineering (SBSE) is a promising paradigm that exploits computational search to optimize different processes when engineering complex software systems. Self-adaptive systems (SASs) are one category of such complex systems: they can optimize different functional and non-functional objectives/criteria under a changing environment (e.g., requirements and workload), which involves problems that are subject to search. Over the years, a considerable amount of work has investigated SBSE for SASs. In this paper, we provide the first systematic and comprehensive survey exclusively on SBSE for SASs, covering 3,740 papers in 27 venues from 7 repositories, which eventually leads to several key statistics from the 73 most notable primary studies in this particular field of research. Our results, surprisingly, reveal five disappointing issues that are of utmost importance but have been overwhelmingly ignored in existing studies. We provide evidence to justify our arguments regarding the disappointments and highlight six emergent, but currently under-explored, opportunities for future work on SBSE for SASs. By mitigating the disappointing issues revealed in this work, together with the highlighted opportunities, we hope to spur much more significant growth in this research direction.
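As a toy picture of what "computational search" means for a self-adaptive system, the sketch below runs steepest-descent hill climbing over a small discrete configuration space and re-runs the search for each observed workload level. The configuration knobs, the cost model and the workload values are all hypothetical; the SBSE work the survey covers typically uses richer, often multi-objective, search.

```python
THREADS = range(1, 17)
CACHE_MB = [64, 128, 256, 512, 1024]


def cost(threads, cache_mb, workload):
    """Hypothetical cost model: latency terms plus a resource penalty."""
    latency = workload / threads + 500 / cache_mb
    resources = 0.8 * threads + 0.01 * cache_mb
    return latency + resources


def neighbours(cfg):
    """Configurations one step away; cfg is (threads, index into CACHE_MB)."""
    t, c = cfg
    for dt, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nt, nc = t + dt, c + dc
        if nt in THREADS and 0 <= nc < len(CACHE_MB):
            yield nt, nc


def hill_climb(workload, start=(1, 0)):
    """Steepest-descent hill climbing; stops when no neighbour lowers the cost."""
    def score(cfg):
        return cost(cfg[0], CACHE_MB[cfg[1]], workload)

    current = start
    while True:
        best = min(neighbours(current), key=score)
        if score(best) >= score(current):
            return current[0], CACHE_MB[current[1]]
        current = best


# Re-run the search for each observed workload level.
for workload in (20, 100):
    print(workload, hill_climb(workload))
```

In this toy model the cost is separable and unimodal in each knob, so the hill climber's local optimum is also the global one; real adaptation spaces rarely behave that nicely.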


Learning to Control PDEs with Differentiable Physics

arXiv.org Machine Learning

Predicting outcomes and planning interactions with the physical world are long-standing goals for machine learning. A variety of such tasks involves continuous physical systems, which can be described by partial differential equations (PDEs) with many degrees of freedom. Existing methods that aim to control the dynamics of such systems are typically limited to relatively short time frames or a small number of interaction parameters. We present a novel hierarchical predictor-corrector scheme which enables neural networks to learn to understand and control complex nonlinear physical systems over long time frames. We propose to split the problem into two distinct tasks: planning and control. To this end, we introduce a predictor network that plans optimal trajectories and a control network that infers the corresponding control parameters. Both stages are trained end-to-end using a differentiable PDE solver. We demonstrate that our method successfully develops an understanding of complex physical systems and learns to control them for tasks involving PDEs such as the incompressible Navier-Stokes equations.
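The ingredient the abstract relies on is a differentiable solver: gradients of a loss on the final state flow back through every simulation step to the control. The sketch below shows that idea on a far simpler system than Navier-Stokes, an explicit 1D diffusion scheme with a constant additive forcing, with the backward (adjoint) pass written by hand instead of via an autodiff framework. The grid size, diffusion number, target profile and learning rate are hypothetical, and there is no predictor or control network here; the forcing is optimized directly.

```python
N, STEPS, NU = 8, 10, 0.2   # grid cells, time steps, diffusion number (stable for <= 0.5)


def diffuse(u):
    """One explicit diffusion step with zero (Dirichlet) boundaries: u <- A u.
    A is symmetric, so the same function also applies A^T in the backward pass."""
    padded = [0.0] + u + [0.0]
    return [padded[i] + NU * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
            for i in range(1, N + 1)]


def simulate(force):
    """Forward solver: u_{t+1} = A u_t + force, starting from u_0 = 0."""
    u = [0.0] * N
    for _ in range(STEPS):
        u = [a + f for a, f in zip(diffuse(u), force)]
    return u


def loss_and_grad(force, target):
    """Squared error on the final state and its exact gradient w.r.t. the forcing."""
    u_final = simulate(force)
    loss = sum((u - g) ** 2 for u, g in zip(u_final, target))
    lam = [2 * (u - g) for u, g in zip(u_final, target)]  # dL/du_T
    grad = [0.0] * N
    for _ in range(STEPS):            # walk back through every solver step
        grad = [g + l for g, l in zip(grad, lam)]
        lam = diffuse(lam)            # lam_t = A^T lam_{t+1}
    return loss, grad


target = [0.0, 0.5, 1.0, 1.5, 1.5, 1.0, 0.5, 0.0]
force = [0.0] * N
initial_loss, _ = loss_and_grad(force, target)
for _ in range(300):
    _, grad = loss_and_grad(force, target)
    force = [f - 0.004 * g for f, g in zip(force, grad)]
final_loss, _ = loss_and_grad(force, target)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The loss is a convex quadratic in the forcing here, so plain gradient descent with a small enough step converges; the nonlinear PDEs in the paper are what motivate learned planning and control networks instead.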


Convergence Time Optimization for Federated Learning over Wireless Networks

arXiv.org Machine Learning

In this paper, the convergence time of federated learning (FL), when deployed over a realistic wireless network, is studied. In particular, a wireless network is considered in which wireless users transmit their local FL models (trained using their locally collected data) to a base station (BS). The BS, acting as a central controller, generates a global FL model using the received local FL models and broadcasts it back to all users. Due to the limited number of resource blocks (RBs) in a wireless network, only a subset of users can be selected to transmit their local FL model parameters to the BS at each learning step. Moreover, since each user has unique training data samples, the BS prefers to include all local user FL models to generate a converged global FL model. Hence, the FL performance and convergence time will be significantly affected by the user selection scheme. Therefore, it is necessary to design an appropriate user selection scheme that enables users of higher importance to be selected more frequently. This joint learning, wireless resource allocation, and user selection problem is formulated as an optimization problem whose goal is to minimize the FL convergence time while optimizing the FL performance. To solve this problem, a probabilistic user selection scheme is proposed such that users whose local FL models significantly affect the global FL model are connected to the BS with high probability. Given the user selection policy, the uplink RB allocation can be determined. To further reduce the FL convergence time, artificial neural networks (ANNs) are used to estimate the local FL models of the users that are not allocated any RBs for local FL model transmission at each given learning step, which enables the BS to enhance its global FL model and improve the FL convergence speed and performance.
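The selection-and-aggregation loop can be sketched in a few lines. With only k resource blocks, the BS samples k distinct users, each draw proportional to an importance score (hypothetically, update norms reported by the users), then averages models weighted by local dataset size. For unscheduled users the paper uses ANN-based estimates; reusing each user's last received model is a deliberately simpler stand-in. Models are plain lists of floats and all names are hypothetical.

```python
import random


def select_users(importance, k, rng):
    """Draw k distinct users; each draw is proportional to the remaining importance."""
    remaining = list(range(len(importance)))
    chosen = []
    for _ in range(k):
        weights = [importance[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


def fl_round(fresh_models, last_known, sizes, importance, k, rng):
    """One round: k users transmit over the k RBs, then the BS aggregates.

    last_known[i] holds the most recent model received from user i and is
    updated in place for the selected users.
    """
    selected = select_users(importance, k, rng)
    for i in selected:
        last_known[i] = list(fresh_models[i])
    # Unselected users contribute their last received model, a simple stand-in
    # for the ANN-based estimates described in the abstract.
    total = sum(sizes)
    dim = len(fresh_models[0])
    new_global = [
        sum(sizes[i] * last_known[i][d] for i in range(len(sizes))) / total
        for d in range(dim)
    ]
    return new_global, selected


# Hypothetical setup: 4 users, 2 resource blocks, 2-parameter models.
fresh = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
sizes = [100, 50, 200, 150]            # local dataset sizes
importance = [1.0, 1.0, 2.8, 0.7]      # e.g. reported update norms (assumed)
last_known = [[0.0, 0.0] for _ in range(4)]
global_model, selected = fl_round(fresh, last_known, sizes, importance, k=2,
                                  rng=random.Random(1))
print(selected, global_model)
```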


Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities

arXiv.org Machine Learning

In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly concave in the other variable. Such minimax optimization problems have attracted significant attention lately due to their applications in modern machine learning tasks. We first design and analyze the Zeroth-Order Gradient Descent Ascent (ZO-GDA) algorithm, and provide improved results compared to existing works in terms of oracle complexity. Next, we propose the Zeroth-Order Gradient Descent Multi-Step Ascent (ZO-GDMSA) algorithm, which significantly improves on the oracle complexity of ZO-GDA. We also provide stochastic versions of ZO-GDA and ZO-GDMSA to handle stochastic nonconvex minimax problems, together with their oracle complexity results.
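A minimal sketch of the ZO-GDA idea on a hypothetical problem f(x, y) = sin(x)·y - y²/2, which is nonconvex in x and strongly concave in y. Both partial derivatives are replaced by two-point Gaussian-smoothing estimates built only from function values; x then takes a descent step and y an ascent step. The step sizes, smoothing radius mu and sample count q are illustrative choices, not the values from the paper's analysis; a multi-step-ascent variant would, as its name suggests, repeat the y update several times per x update.

```python
import math
import random


def f(x, y):
    """Hypothetical test problem: nonconvex in x, 1-strongly concave in y."""
    return math.sin(x) * y - 0.5 * y * y


def zo_partial(fn, z, mu, q, rng):
    """Two-point Gaussian-smoothing estimate of d fn / d z using only fn values."""
    base = fn(z)
    total = 0.0
    for _ in range(q):
        u = rng.gauss(0.0, 1.0)
        total += (fn(z + mu * u) - base) / mu * u
    return total / q


def zo_gda(x, y, eta_x=0.05, eta_y=0.3, mu=1e-3, q=20, iters=500, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        gx = zo_partial(lambda s: f(s, y), x, mu, q, rng)
        gy = zo_partial(lambda s: f(x, s), y, mu, q, rng)
        x, y = x - eta_x * gx, y + eta_y * gy  # simultaneous descent in x, ascent in y
    return x, y


x, y = zo_gda(1.0, 0.0)
print(f"x = {x:.4f}, y = {y:.4f}, sin(x) = {math.sin(x):.4f}")
```

For this problem the inner maximizer is y* = sin(x), so tracking y ≈ sin(x) while x descends is what the ascent steps are for.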


Lasso for hierarchical polynomial models

arXiv.org Machine Learning

In a polynomial regression model, the divisibility conditions implicit in polynomial hierarchy give rise to a natural construction of constraints for the model parameters. We use this principle to derive versions of strong and weak hierarchy and to extend existing work in the literature, which at the moment is only concerned with models of degree two. We discuss how to estimate the lasso parameters using standard quadratic programming techniques and apply our proposal to both simulated data and examples from the literature. The proposed methodology compares favorably with existing techniques in terms of low validation error and model size.
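The divisibility idea can be made concrete by encoding each monomial as an exponent tuple: a term's parents are the monomials obtained by lowering one exponent by one, and a fitted support satisfies strong hierarchy if every parent of every selected term is also selected, weak hierarchy if at least one is. This checker is a sketch under that assumed reading of the definitions; the paper's exact formulation and the lasso/QP fit itself are not reproduced, and in a QP-based fit such rules would enter as constraints on the coefficients.

```python
def parents(monomial):
    """Immediate divisors of a monomial (exponent tuple), excluding the constant term."""
    result = []
    for i, e in enumerate(monomial):
        if e > 0:
            p = monomial[:i] + (e - 1,) + monomial[i + 1:]
            if any(p):
                result.append(p)
    return result


def satisfies_hierarchy(support, strong=True):
    """Check strong (all parents) or weak (at least one parent) hierarchy.

    support: set of exponent tuples whose coefficients are nonzero.
    """
    for m in support:
        ps = parents(m)
        if not ps:
            continue  # main effects: the only parent is the intercept
        present = [p in support for p in ps]
        if strong and not all(present):
            return False
        if not strong and not any(present):
            return False
    return True


# Terms in two variables: (1, 0) = x1, (0, 1) = x2, (1, 1) = x1*x2, (2, 1) = x1^2*x2.
model_a = {(1, 0), (0, 1), (1, 1)}
model_b = {(1, 0), (1, 1)}
model_c = {(1, 0), (0, 1), (1, 1), (2, 1)}
print(satisfies_hierarchy(model_a), satisfies_hierarchy(model_b),
      satisfies_hierarchy(model_b, strong=False))
```

The degree-three term in model_c shows how the same rule extends past degree two: x1^2*x2 has parents x1*x2 and x1^2, so strong hierarchy also requires x1^2.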


Intelligence, physics and information -- the tradeoff between accuracy and simplicity in machine learning

arXiv.org Machine Learning

How can we enable machines to make sense of the world, and become better at learning? To approach this goal, I believe viewing intelligence in terms of many integral aspects, and also a universal two-term tradeoff between task performance and complexity, provides two feasible perspectives. In this thesis, I address several key questions in some aspects of intelligence, and study the phase transitions in the two-term tradeoff, using strategies and tools from physics and information. Firstly, how can we make learning models more flexible and efficient, so that agents can learn quickly with fewer examples? Inspired by how physicists model the world, we introduce a paradigm and an AI Physicist agent for simultaneously learning many small specialized models (theories) and the domains in which they are accurate, which can then be simplified, unified and stored, facilitating few-shot learning in a continual way. Secondly, for representation learning, when can we learn a good representation, and how does learning depend on the structure of the dataset? We approach this question by studying phase transitions when tuning the tradeoff hyperparameter. In the information bottleneck, we theoretically show that these phase transitions are predictable and reveal structure in the relationships between the data, the model, the learned representation and the loss landscape. Thirdly, how can agents discover causality from observations? We address part of this question by introducing an algorithm that combines prediction with minimizing information from the input, for exploratory causal discovery from observational time series. Fourthly, to make models more robust to label noise, we introduce Rank Pruning, a robust algorithm for classification with noisy labels. I believe that, building on the work of this thesis, we will be one step closer to enabling more intelligent machines that can make sense of the world.
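In its standard form, the information bottleneck's two-term tradeoff is the Lagrangian I(X;Z) - β·I(Y;Z), minimized over encoders p(z|x), with β the tradeoff hyperparameter. The sketch below evaluates it for discrete distributions on hypothetical toy data (four inputs, a label determined by the input) and three hand-picked encoders; it shows the objective itself, not the thesis's analysis of phase transitions.

```python
import math


def mutual_information(joint):
    """I(A;B) in nats from a joint distribution given as a dict {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def ib_objective(p_xy, encoder, beta):
    """IB Lagrangian I(X;Z) - beta * I(Y;Z) for an encoder p(z|x) given as {x: {z: p}}.
    Assumes the Markov chain Z - X - Y, i.e. Z depends on Y only through X."""
    p_x = {}
    for (x, _), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    p_xz, p_yz = {}, {}
    for x, px in p_x.items():
        for z, pz_x in encoder[x].items():
            p_xz[(x, z)] = p_xz.get((x, z), 0.0) + px * pz_x
    for (x, y), p in p_xy.items():
        for z, pz_x in encoder[x].items():
            p_yz[(y, z)] = p_yz.get((y, z), 0.0) + p * pz_x
    return mutual_information(p_xz) - beta * mutual_information(p_yz)


# Hypothetical toy data: four equally likely inputs, label y = x // 2.
p_xy = {(0, 0): 0.25, (1, 0): 0.25, (2, 1): 0.25, (3, 1): 0.25}
identity = {x: {x: 1.0} for x in range(4)}        # keeps everything about X
label_only = {x: {x // 2: 1.0} for x in range(4)}  # keeps only what predicts Y
constant = {x: {0: 1.0} for x in range(4)}         # discards everything
for beta in (0.5, 3.0):
    print(beta, [round(ib_objective(p_xy, e, beta), 3)
                 for e in (identity, label_only, constant)])
```

On this toy data the trivial (constant) encoder has the lowest objective for β < 1, while the label-preserving encoder wins for β > 1: a minimal example of the kind of β-driven transition the thesis studies.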