Goto

Collaborating Authors

 Optimization


Obtaining Adjustable Regularization for Free via Iterate Averaging

arXiv.org Machine Learning

Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a single round of training takes significant amount of time. Very recently, Neu and Rosasco show that if we run stochastic gradient descent (SGD) on linear regression problems, then by averaging the SGD iterates properly, we obtain a regularized solution. It left open whether the same phenomenon can be achieved for other optimization problems and algorithms. In this paper, we establish an averaging scheme that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart with an adjustable regularization parameter. Our approaches can be used for accelerated and preconditioned optimization methods as well. We further show that the same methods work empirically on more general optimization objectives including neural networks. In sum, we obtain adjustable regularization for free for a large class of optimization problems and resolve an open question raised by Neu and Rosasco.


Enhanced data efficiency using deep neural networks and Gaussian processes for aerodynamic design optimization

arXiv.org Machine Learning

Adjoint-based optimization methods are attractive for aerodynamic shape design primarily due to their computational costs being independent of the dimensionality of the input space and their ability to generate high-fidelity gradients that can then be used in a gradient-based optimizer. This makes them very well suited for high-fidelity simulation based aerodynamic shape optimization of highly parametrized geometries such as aircraft wings. However, the development of adjoint-based solvers involve careful mathematical treatment and their implementation require detailed software development. Furthermore, they can become prohibitively expensive when multiple optimization problems are being solved, each requiring multiple restarts to circumvent local optima. In this work, we propose a machine learning enabled, surrogate-based framework that replaces the expensive adjoint solver, without compromising on predicting predictive accuracy. Specifically, we first train a deep neural network (DNN) from training data generated from evaluating the high-fidelity simulation model on a model-agnostic, design of experiments on the geometry shape parameters. The optimum shape may then be computed by using a gradient-based optimizer coupled with the trained DNN. Subsequently, we also perform a gradient-free Bayesian optimization, where the trained DNN is used as the prior mean. We observe that the latter framework (DNN-BO) improves upon the DNN-only based optimization strategy for the same computational cost. Overall, this framework predicts the true optimum with very high accuracy, while requiring far fewer high-fidelity function calls compared to the adjoint-based method. Furthermore, we show that multiple optimization problems can be solved with the same machine learning model with high accuracy, to amortize the offline costs associated with constructing our models.


Intelligent Service Selection in a Multi-dimensional Environment of Cloud Providers for IoT stream Data through cloudlets

arXiv.org Artificial Intelligence

The expansion of the Internet of Things(IoT) services and a huge amount of data generated by different sensors, signify the importance of cloud computing services like Storage as a Service more than ever. IoT traffic imposes such extra constraints on the cloud storage service as sensor data preprocessing capability and load-balancing between data centers and servers in each data center. Also, it should be allegiant to the Quality of Service (QoS). The hybrid MWG algorithm has been proposed in this work, which considers different objectives such as energy, processing time, transmission time, and load balancing in both Fog and Cloud Layer. The MATLAB script is used to simulate and implement our algorithms, and services of different servers, e.g. Amazon, Dropbox, Google Drive, etc. have been considered. The MWG has 7%, 13%, and 25% improvement in comparison with MOWCA, KGA, and NSGAII in metric of spacing, respectively. Moreover, the MWG has 4%, 4.7%, and 7.3% optimization in metric of quality in comparison to MOWCA, KGA, and NSGAII, respectively. The overall optimization shows that the MWG algorithm has 7.8%, 17%, and 21.6% better performance in comparison with MOWCA, KGA, and NSGAII in the obtained best result by considering different objectives, respectively.


Efficient hyperparameter optimization by way of PAC-Bayes bound minimization

arXiv.org Machine Learning

Identifying optimal values for a high-dimensional set of hyperparameters is a problem that has received growing attention given its importance to large-scale machine learning applications such as neural architecture search. Recently developed optimization methods can be used to select thousands or even millions of hyperparameters. Such methods often yield overfit models, however, leading to poor performance on unseen data. We argue that this overfitting results from using the standard hyperparameter optimization objective function. Here we present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error. We then devise an efficient gradient-based algorithm to minimize this objective; the proposed method has asymptotic space and time complexity equal to or better than other gradient-based hyperparameter optimization methods. We show that this new method significantly reduces out-of-sample error when applied to hyperparameter optimization problems known to be prone to overfitting.


Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

arXiv.org Machine Learning

This paper seeks to establish a framework for directing a society of simple, specialized, self-interested agents to solve what traditionally are posed as monolithic single-agent sequential decision problems. What makes it challenging to use a decentralized approach to collectively optimize a central objective is the difficulty in characterizing the equilibrium strategy profile of non-cooperative games. To overcome this challenge, we design a mechanism for defining the learning environment of each agent for which we know that the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents optimizing their own local objectives. The society functions as an economy of agents that learn the credit assignment process itself by buying and selling to each other the right to operate on the environment state. We derive a class of decentralized reinforcement learning algorithms that are broadly applicable not only to standard reinforcement learning but also for selecting options in semi-MDPs and dynamically composing computation graphs. Lastly, we demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.


Maximum Customers' Satisfaction in One-way Car-sharing: Modeling, Exact and Heuristic Solving

arXiv.org Artificial Intelligence

One-way car-sharing systems are transportation systems that allow customers to rent cars at stations scattered around the city, use them for a short journey, and return them at any station. The maximum customers' satisfaction problem concerns the task of assigning the cars, initially located at given stations, to maximize the number of satisfied customers. We consider the problem with two stations where each customer has exactly two demands in opposite directions between both stations, and a customer is satisfied only if both their demands are fulfilled. For solving this problem, we propose mixed-integer programming (MIP) models and matheuristics based on local search. We created a benchmark of instances used to test the exact and heuristic approaches. Additionally, we proposed a preprocessing procedure to reduce the size of the instance. Our MIP models can solve to optimality 85% of the proposed instances with 1000 customers in 10 minutes, with an average gap smaller than 0.1% for all these instances. For larger instances (2500 and 5000 customers), except for some particular cases, they presented an average gap smaller than 0.8%. Also, our local-based matheuristics presented small average gaps which are better than the MIP models in some larger instances.


Variance Regularization for Accelerating Stochastic Optimization

arXiv.org Machine Learning

While nowadays most gradient-based optimization methods focus on exploring the high-dimensional geometric features, the random error accumulated in a stochastic version of any algorithm implementation has not been stressed yet. In this work, we propose a universal principle which reduces the random error accumulation by exploiting statistic information hidden in mini-batch gradients. This is achieved by regularizing the learning-rate according to mini-batch variances. Due to the complementarity of our perspective, this regularization could provide a further improvement for stochastic implementation of generic 1st order approaches. With empirical results, we demonstrated the variance regularization could speed up the convergence as well as stabilize the stochastic optimization.


Small Towers Make Big Differences

arXiv.org Machine Learning

Multi-task learning aims at solving multiple machine learning tasks at the same time. A good solution to a multi-task learning problem should be generalizable in addition to being Pareto optimal. In this paper, we provide some insights on understanding the trade-off between Pareto efficiency and generalization as a result of parameterization in multi-task deep learning models. As a multi-objective optimization problem, enough parameterization is needed for handling task conflicts in a constrained solution space; however, from a multi-task generalization perspective, over-parameterization undermines the benefit of learning a shared representation which helps harder tasks or tasks with limited training examples. A delicate balance between multi-task generalization and multi-objective optimization is therefore needed for finding a better trade-off between efficiency and generalization. To this end, we propose a method of under-parameterized self-auxiliaries for multi-task models to achieve the best of both worlds. It is task-agnostic and works with other multi-task learning algorithms. Empirical results show that small towers of under-parameterized self-auxiliaries can make big differences in improving Pareto efficiency in various multi-task applications.


Optimal to-do list gamification

arXiv.org Artificial Intelligence

What should I work on first? What can wait until later? Which projects should I prioritize and which tasks are not worth my time? These are challenging questions that many people face every day. People's intuitive strategy is to prioritize their immediate experience over the long-term consequences. This leads to procrastination and the neglect of important long-term projects in favor of seemingly urgent tasks that are less important. Optimal gamification strives to help people overcome these problems by incentivizing each task by a number of points that communicates how valuable it is in the long-run. Unfortunately, computing the optimal number of points with standard dynamic programming methods quickly becomes intractable as the number of a person's projects and the number of tasks required by each project increase. Here, we introduce and evaluate a scalable method for identifying which tasks are most important in the long run and incentivizing each task according to its long-term value. Our method makes it possible to create to-do list gamification apps that can handle the size and complexity of people's to-do lists in the real world.


Optimizing fire allocation in a NCW-type model

arXiv.org Artificial Intelligence

In this paper, we introduce a non-linear Lanchester model of NCW-type and investigate an optimization problem for this model, where only the Red force is supplied by several supply agents. Optimal fire allocation of the Blue force is sought in the form of a piece-wise constant function of time. A threatening rate is computed for the Red force and each of its supply agents at the beginning of each stage of the combat. These rates can be used to derive the optimal decision for the Blue force to focus its firepower to the Red force itself or one of its supply agents. This optimal fire allocation is derived and proved by considering an optimization problem of number of Blue force troops. Numerical experiments are included to demonstrate the theoretical results.