 Optimization


Training Neural Networks with PSO - Vortarus Technologies LLC

#artificialintelligence

In this article we discuss training neural networks using particle swarm optimization (PSO). Training a neural network is an optimization problem, so the choice of optimization algorithm is of primary importance. When training MLPs, we adjust the weights between neurons, using an error function as the optimization objective. PNNs and GRNNs use a smoothing factor, σ, to define the network. The objective is to find sigmas that minimize error.
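As a rough illustration of the idea above, the following is a minimal sketch of a global-best PSO loop that searches for the weights of a tiny MLP by minimizing mean squared error. It is not the article's implementation: the function names (`pso_minimize`, `mlp_error`), the 1-4-1 network shape, the sine-fitting task, and the coefficient values are all assumptions chosen for brevity.

```python
import numpy as np

def mlp_error(w, X, y, hidden=4):
    """Mean squared error of a 1-hidden-layer tanh MLP with flat weight vector w."""
    W1 = w[:hidden].reshape(1, hidden)            # input -> hidden weights
    b1 = w[hidden:2 * hidden]                     # hidden biases
    W2 = w[2 * hidden:3 * hidden].reshape(hidden, 1)  # hidden -> output weights
    b2 = w[3 * hidden]                            # output bias
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.mean((pred.ravel() - y) ** 2))

def pso_minimize(objective, dim, n_particles=30, n_iters=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Global-best PSO; the coefficient values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                   # per-particle best
    best_err = np.array([objective(p) for p in pos])
    g = best_pos[np.argmin(best_err)].copy()                # swarm-wide best
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity mixes momentum, pull toward own best, pull toward swarm best.
        vel = (inertia * vel
               + c_cognitive * r1 * (best_pos - pos)
               + c_social * r2 * (g - pos))
        pos = pos + vel
        err = np.array([objective(p) for p in pos])
        improved = err < best_err
        best_pos[improved] = pos[improved]
        best_err[improved] = err[improved]
        g = best_pos[np.argmin(best_err)].copy()
    return g, float(best_err.min())

# Hypothetical task: fit sin(x) on [-pi, pi] with a 1-4-1 network (13 weights).
X = np.linspace(-np.pi, np.pi, 40).reshape(-1, 1)
y = np.sin(X).ravel()
weights, err = pso_minimize(lambda w: mlp_error(w, X, y), dim=13)
```

No gradients are computed here; the swarm only needs error evaluations, which is why the same loop could in principle be pointed at the sigmas of a PNN or GRNN instead of MLP weights.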


Hessian barrier algorithms for linearly constrained optimization problems

arXiv.org Artificial Intelligence

In this paper, we propose an interior-point method for linearly constrained optimization problems (possibly nonconvex). The method - which we call the Hessian barrier algorithm (HBA) - combines a forward Euler discretization of Hessian Riemannian gradient flows with an Armijo backtracking step-size policy. In this way, HBA can be seen as an alternative to mirror descent (MD), and contains as special cases the affine scaling algorithm, regularized Newton processes, and several other iterative solution methods. Our main result is that, modulo a non-degeneracy condition, the algorithm converges to the problem's set of critical points; hence, in the convex case, the algorithm converges globally to the problem's minimum set. In the case of linearly constrained quadratic programs (not necessarily convex), we also show that the method's convergence rate is $\mathcal{O}(1/k^\rho)$ for some $\rho\in(0,1]$ that depends only on the choice of kernel function (i.e., not on the problem's primitives). These theoretical results are validated by numerical experiments in standard non-convex test functions and large-scale traffic assignment problems.
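To make the ingredients named in this abstract concrete, here is a minimal sketch of one special case only: minimization over the positive orthant with no equality constraints, using the log-barrier kernel, so that the Hessian metric is diag(1/x²) and the step is an affine-scaling-style direction combined with Armijo backtracking. The function name, the test problem, and all constants are assumptions; the paper's general method for linear constraints is not reproduced here.

```python
import numpy as np

def hessian_barrier_orthant(f, grad, x0, n_iters=2000,
                            alpha0=1.0, shrink=0.5, c=1e-4):
    """Forward-Euler steps along -H(x)^{-1} grad f(x) with H(x) = diag(1/x^2),
    using Armijo backtracking. Assumes the feasible set is {x > 0} only."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad(x)
        v = -(x ** 2) * g            # direction in the barrier-induced metric
        slope = float(g @ v)         # directional derivative, always <= 0
        alpha = alpha0
        while True:
            x_new = x + alpha * v
            # Accept only strictly feasible points with sufficient decrease.
            if np.all(x_new > 0) and f(x_new) <= f(x) + c * alpha * slope:
                break
            alpha *= shrink
            if alpha < 1e-16:        # safeguard: give up on further progress
                return x
        x = x_new
    return x

# Hypothetical test problem: project the point (1, -1) onto x >= 0.
target = np.array([1.0, -1.0])
f = lambda x: 0.5 * float(np.sum((x - target) ** 2))
grad = lambda x: x - target
x_star = hessian_barrier_orthant(f, grad, x0=[0.5, 0.5])
```

The constrained minimizer of this toy problem is (1, 0). The iterates stay strictly positive, and the second coordinate approaches the boundary only sublinearly, which is in the spirit of the $\mathcal{O}(1/k^\rho)$ rates mentioned in the abstract, although this sketch makes no claim about matching them.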


Learning to Evolve

arXiv.org Machine Learning

Evolution and learning are two of the fundamental mechanisms by which life adapts in order to survive and to transcend limitations. These biological phenomena inspired successful computational methods such as evolutionary algorithms and deep learning. Evolution relies on random mutations and on random genetic recombination. Here we show that learning to evolve, i.e. learning to mutate and recombine better than at random, improves the result of evolution in terms of fitness increase per generation and even in terms of attainable fitness. We use deep reinforcement learning to learn to dynamically adjust the strategy of evolutionary algorithms to varying circumstances. Our methods outperform classical evolutionary algorithms on combinatorial and continuous optimization problems.


Smoothing Policies and Safe Policy Gradients

arXiv.org Machine Learning

Policy gradient algorithms are among the best candidates for the much anticipated application of reinforcement learning to real-world control tasks, such as the ones arising in robotics. However, the trial-and-error nature of these methods introduces safety issues whenever the learning phase itself must be performed on a physical system. In this paper, we address a specific safety formulation, where danger is encoded in the reward signal and the learning agent is constrained to never worsen its performance. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows us to identify those meta-parameter schedules that guarantee monotonic improvement with high probability. The two key meta-parameters are the step size of the parameter updates and the batch size of the gradient estimators. By a joint, adaptive selection of these meta-parameters, we obtain a safe policy gradient algorithm.


Meta-learning of Sequential Strategies

arXiv.org Machine Learning

In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.


Bayesian Optimization using Deep Gaussian Processes

arXiv.org Machine Learning

Bayesian Optimization using Gaussian Processes is a popular approach to the optimization of expensive black-box functions. However, because classic Gaussian Processes assume a priori that the covariance is stationary, this method may not be suited to non-stationary functions involved in the optimization problem. To overcome this issue, a new Bayesian Optimization approach is proposed. It is based on Deep Gaussian Processes as surrogate models instead of classic Gaussian Processes. This modeling technique increases the power of representation to capture the non-stationarity by simply considering a functional composition of stationary Gaussian Processes, providing a multiple layer structure. This paper proposes a new algorithm for Global Optimization by coupling Deep Gaussian Processes and Bayesian Optimization. The specificities of this optimization method are discussed and highlighted with academic test cases. The performance of the proposed algorithm is assessed on analytical test cases and an aerospace design optimization problem and compared to the state-of-the-art stationary and non-stationary Bayesian Optimization approaches.
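For orientation, the sketch below shows the classic baseline this abstract starts from: Bayesian Optimization with a single stationary Gaussian Process surrogate. It is not the Deep Gaussian Process method proposed in the paper. The RBF kernel, its length scale, the lower-confidence-bound acquisition rule, the grid search over [0, 1], and the toy objective are all assumptions made to keep the example short.

```python
import numpy as np

def rbf(a, b, length=0.2):
    """Stationary RBF kernel: depends only on the distance a - b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP with unit prior variance."""
    K = rbf(x_train, x_train) + jitter * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    mean = Ks.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def bayes_opt(f, n_init=3, n_iters=15, kappa=2.0, seed=0):
    """Minimize f on [0, 1] by repeatedly evaluating the grid point with the
    lowest lower confidence bound (mean - kappa * std)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    x = rng.uniform(0.0, 1.0, size=n_init)
    y = np.array([f(v) for v in x])
    for _ in range(n_iters):
        # Center the observations so the zero-mean prior is less of a mismatch.
        mean, std = gp_posterior(x, y - y.mean(), grid)
        x_next = grid[np.argmin(mean - kappa * std)]
        x = np.append(x, x_next)
        y = np.append(y, f(x_next))
    best = int(np.argmin(y))
    return float(x[best]), float(y[best])

# Hypothetical cheap stand-in for an expensive black-box function.
objective = lambda v: (v - 0.3) ** 2
x_best, y_best = bayes_opt(objective)
```

Because `rbf` uses one fixed length scale everywhere, the surrogate cannot adapt to a function that is smooth in one region and rapidly varying in another, which is the limitation the abstract attributes to stationary Gaussian Processes.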


Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review

arXiv.org Machine Learning

Pattern analysis often requires a pre-processing stage for extracting or selecting features in order to help the classification, prediction, or clustering stage discriminate or represent the data in a better way. The reason for this requirement is that the raw data are complex and difficult to process without extracting or selecting appropriate features beforehand. This paper reviews theory and motivation of different common methods of feature selection and extraction and introduces some of their applications. Some numerical implementations are also shown for these methods. Finally, the methods in feature selection and extraction are compared.
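The distinction between selecting and extracting features can be sketched in a few lines. The example below picks one simple representative of each family: ranking the original columns by variance (selection) and projecting onto principal components via PCA (extraction). These two methods are illustrative choices, not a summary of what the review covers, and the function names and synthetic data are assumptions.

```python
import numpy as np

def select_top_variance(X, k):
    """Feature selection: indices of the k original columns with highest variance."""
    order = np.argsort(X.var(axis=0))[::-1]
    return order[:k]

def pca(X, n_components):
    """Feature extraction: project centered data onto the leading principal axes."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, sorted by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_var = (S[:n_components] ** 2) / (len(X) - 1)
    return Xc @ components.T, components, explained_var

# Synthetic data: three features with very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([1.0, 0.5, 5.0])

kept = select_top_variance(X, 1)        # keeps an original column unchanged
Z, components, explained_var = pca(X, 2)  # builds new features as linear mixes
```

Selection returns a subset of the original, interpretable columns, whereas extraction returns new coordinates that combine all of them.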


Collaborative and Privacy-Preserving Machine Teaching via Consensus Optimization

arXiv.org Machine Learning

In this work, we define a collaborative and privacy-preserving machine teaching paradigm with multiple distributed teachers. We focus on consensus super teaching, which aims at organizing distributed teachers to jointly select a compact yet informative training subset from the data hosted by the teachers so that a learner learns better. The challenges arise from three perspectives. First, the state-of-the-art pool-based super teaching method applies mixed-integer non-linear programming (MINLP), which does not scale well to very large data sets. Second, it is desirable to restrict each teacher's data access to its own data during the collaboration stage to mitigate privacy leaks. Finally, the teaching collaboration should be communication-efficient, since large communication overheads can cause synchronization delays between teachers. To address these challenges, we formulate collaborative teaching as a consensus and privacy-preserving optimization process to minimize teaching risk. We theoretically demonstrate the necessity of collaboration between teachers for improving the learner's learning. Furthermore, we show that the proposed method enjoys a property similar to the Oracle property of adaptive Lasso. The empirical study illustrates that our teaching method can deliver significantly more accurate teaching results with high speed, while the non-collaborative MINLP-based super teaching becomes prohibitively expensive to compute.


Design Space Exploration via Answer Set Programming Modulo Theories

arXiv.org Artificial Intelligence

The design of embedded systems, which are ubiquitous in mobile devices and cars, is becoming ever more complex, such that efficient system-level design methods are becoming crucial. My research aims at developing systems that help the designer express the complex design problem in a declarative way and explore the design space to obtain diverse sets of solutions with desirable properties. To that end, we employ the knowledge representation and reasoning capabilities of ASP in combination with background theories. As a result, for the first time, we proposed a sophisticated methodology that allows for the direct integration of multi-objective optimization of non-linear objectives into ASP. This includes unique results on diverse sub-problems covered in several publications, which I will present in this work.


Estimate Sequences for Variance-Reduced Stochastic Composite Optimization

arXiv.org Machine Learning

In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient descent method, variants of the approaches SAGA, SVRG, and has several advantages: (i)

While the finite-sum setting is a particular case of expectation, the deterministic nature of the resulting cost function drastically changes the performance guarantees an optimization method may achieve to solve (1). In particular, when an algorithm is only allowed to access unbiased measurements of the objective and gradient, it may be shown that the worst-case convergence rate in expected function value cannot be better than O(1/k) in general, where k is the number of iterations (Nemirovski et al., 2009; Agarwal et al., 2012).