Optimization
Matheuristics to optimize maintenance scheduling and refueling of nuclear power plants
Dupin, Nicolas, Talbi, El-Ghazali
Scheduling the maintenances of nuclear power plants is a complex optimization problem, formulated in 2-stage stochastic programming for the EURO/ROADEF 2010 challenge. The first level optimizes the maintenance dates and refueling decisions. The second level optimizes the production to fulfill the power demands and to ensure feasibility and costs of the first stage decisions. This paper solves a deterministic version of the problem, studying Mixed Integer Programming (MIP) formulations and matheuristics. Relaxing only two sets of constraints of the ROADEF challenge, a MIP formulation can be written using only binary variables for the maintenance dates. The MIP formulations are used to design constructive matheuristics and a Variable Neighborhood Descent (VND) local search. These matheuristics produce very high quality solutions. Some intermediate results explains results of the Challenge: the relaxation of constraints CT6 are justified and neighborhood analyses with MIP-VND justifies the choice of neighborhoods to implement for the problem. Lastly, an extension with stability costs for monthly reoptimization is considered, with efficient bi-objective matheuristics.
Doubly Bayesian Optimization
Bayesian optimization (BO) is a powerful method for optimizing complex black-box functions that are costly to evaluate directly. Although useful out of the box, complexities arise when the domain exhibits non-smooth structure, noise, or greater than five dimensions. Extending BO for these issues is non-trivial, which is why we suggest casting BO methods into the probabilistic programming paradigm. These systems (PPS) enable users to encode model structure and naturally reason about uncertainties, which can be leveraged towards improved BO methods. Here we present a probabilistic domain-specific language where BO is native, showing the Bayesian approach to optimization is more naturally expressed in a PPS, and better equipped to address the above issues. We validate the approach on standard optimization benchmarks, while demonstrating the utility of programmable structure to address the inner-optimization problem of BO. Importantly, we also show that the framework enables the user to more readily use advanced techniques such as unscented BO and noisy expected improvement.
Massively scalable Sinkhorn distances via the Nystr\"om method
Altschuler, Jason, Bach, Francis, Rudi, Alessandro, Weed, Jonathan
The Sinkhorn distance, a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference. We give a simple, practical, parallelizable algorithm NYS-SINK, based on Nystr\"om approximation, for computing Sinkhorn distances on a massive scale. As we show in numerical experiments, our algorithm easily computes Sinkhorn distances on data sets hundreds of times larger than can be handled by state-of-the-art approaches. We also give provable guarantees establishing that the running time and memory requirements of our algorithm adapt to the intrinsic dimension of the underlying data.
MetaStyle: Three-Way Trade-Off Among Speed, Flexibility, and Quality in Neural Style Transfer
Zhang, Chi, Zhu, Yixin, Zhu, Song-Chun
An unprecedented booming has been witnessed in the research area of artistic style transfer ever since Gatys et al. introduced the neural method. One of the remaining challenges is to balance a trade-off among three critical aspects---speed, flexibility, and quality: (i) the vanilla optimization-based algorithm produces impressive results for arbitrary styles, but is unsatisfyingly slow due to its iterative nature, (ii) the fast approximation methods based on feed-forward neural networks generate satisfactory artistic effects but bound to only a limited number of styles, and (iii) feature-matching methods like AdaIN achieve arbitrary style transfer in a real-time manner but at a cost of the compromised quality. We find it considerably difficult to balance the trade-off well merely using a single feed-forward step and ask, instead, whether there exists an algorithm that could adapt quickly to any style, while the adapted model maintains high efficiency and good image quality. Motivated by this idea, we propose a novel method, coined MetaStyle, which formulates the neural style transfer as a bilevel optimization problem and combines learning with only a few post-processing update steps to adapt to a fast approximation model with satisfying artistic effects, comparable to the optimization-based methods for an arbitrary style. The qualitative and quantitative analysis in the experiments demonstrates that the proposed approach achieves high-quality arbitrary artistic style transfer effectively, with a good trade-off among speed, flexibility, and quality.
Online Newton Step Algorithm with Estimated Gradient
Liu, Binbin, Li, Jundong, Song, Yunquan, Liang, Xijun, Jian, Ling, Liu, Huan
They have shown to be effective in handling large-scale and high-velocity streaming data and emerged to become popular in the big data era Hoi et al. [2018, 2014]. In recent years, a number of effective online learning algorithms have been investigated and applied in a variety of high impact domains,ranging from game theory, information theory to machine learning and data mining Ding et al. [2017], Shalev-Shwartz [2011], Wang et al. [2003]. Most previously proposed online learning algorithms fall into the wellestablished frameworkof online convex optimization Gordon [1999], Zinkevich [2003]. In terms of the optimization algorithms, online learning algorithms can be grouped into the following categories: (i) first-order algorithms which aim to optimize the objective function using the first-order (sub) gradient such as the well-known OGD algorithm Zinkevich [2003]; and (ii) second-order algorithms which aim to exploit second-order information to speed up the convergence of the optimization, such as the ONS algorithm Hazan et al. [2007]. In online convex optimization, previous approaches are mainly based on the first-order optimization, i.e., optimization using the first-order derivative of the cost function. Theregret bound achieved by these algorithms is proportional to the polynomial of the number of rounds T . For example, Zinkevich [2003] showed that with the simple OGD, we can achieve the regret bound of O( T). Later on, Hazan et al. [2007] introduced a new algorithm with ONS by exploiting the second-order derivative of the cost function, which can be viewed as an online 2 analogy of the Newton-Raphson method Ypma and Tjalling [1995] in the offline learning.Although the time complexity O(d
On the Curved Geometry of Accelerated Optimization
In this work we propose a differential geometric motivation for Nesterov's accelerated gradient method (AGM) for strongly-convex problems. By considering the optimization procedure as occurring on a Riemannian manifold with a natural structure, The AGM method can be seen as the proximal point method applied in this curved space. This viewpoint can also be extended to the continuous time case, where the accelerated gradient method arises from the natural block-implicit Euler discretization of an ODE on the manifold. We provide an analysis of the convergence rate of this ODE for quadratic objectives.
Why Does Stagewise Training Accelerate Convergence of Testing Error Over SGD?
Yang, Tianbao, Yan, Yan, Yuan, Zhuoning, Jin, Rong
Stagewise training strategy is commonly used for learning neural networks, which uses a stochastic algorithm (e.g., SGD) starting with a relatively large step size (aka learning rate) and geometrically decreasing the step size after a number of iterations. It has been observed that the stagewise SGD has much faster convergence than the vanilla SGD with a polynomial decaying step size in terms of both training error and testing error. {\it But how to explain this phenomenon has been largely ignored by existing studies.} This paper provides some theoretical evidence for explaining this faster convergence. In particular, we consider the stagewise training strategy for minimizing empirical risk that satisfies the Polyak-\L ojasiewicz condition, which has been observed/proved for neural networks and also holds for a broad family of convex functions. For convex loss functions and "nice-behaviored" non-convex loss functions that are close to a convex function (namely weakly convex functions), we establish faster convergence of stagewise training than the vanilla SGD under the same condition on both training error and testing error. Indeed, the proposed algorithm has additional favorable features that come with theoretical guarantee for the considered non-convex optimization problems, including using explicit algorithmic regularization at each stage, using stagewise averaged solution for restarting, and returning the last stagewise averaged solution as the final solution. To differentiate from commonly used stagewise SGD, we refer to our algorithm as stagewise regularized training algorithm. Of independent interest, the proved testing error bounds for a family of non-convex loss functions are dimensionality and norm independent.
Investigating performance of neural networks and gradient boosting models approximating microscopic traffic simulations in traffic optimization tasks
Gora, Paweล, Brzeski, Maciej, Moลผejko, Marcin, Klemenko, Arkadiusz, Kochaลski, Adrian
We analyze the accuracy of traffic simulations metamodels based on neural networks and gradient boosting models (LightGBM), applied to traffic optimization as fitness functions of genetic algorithms. Our metamodels approximate outcomes of traffic simulations (the total time of waiting on a red signal) taking as an input different traffic signal settings, in order to efficiently find (sub)optimal settings. Their accuracy was proven to be very good on randomly selected test sets, but it turned out that the accuracy may drop in case of settings expected (according to genetic algorithms) to be close to local optima, which makes the traffic optimization process more difficult. In this work, we investigate 16 different metamodels and 20 settings of genetic algorithms, in order to understand what are the reasons of this phenomenon, what is its scale, how it can be mitigated and what can be potentially done to design better real-time traffic optimization methods.
KF-LAX: Kronecker-factored curvature estimation for control variate optimization in reinforcement learning
A key challenge for gradient based optimization methods in model-free reinforcement learning is to develop an approach that is sample efficient and has low variance. In this work, we apply Kronecker-factored curvature estimation technique (KFAC) to a recently proposed gradient estimator for control variate optimization, RELAX, to increase the sample efficiency of using this gradient estimation method in reinforcement learning. The performance of the proposed method is demonstrated on a synthetic problem and a set of three discrete control task Atari games.
Learning Controllable Fair Representations
Song, Jiaming, Kalluri, Pratyusha, Grover, Aditya, Zhao, Shengjia, Ermon, Stefano
Learning data representations that are transferable and fair with respect to certain protected attributes is crucial to reducing unfair decisions made downstream, while preserving the utility of the data. We propose an information-theoretically motivated objective for learning maximally expressive representations subject to fairness constraints. We demonstrate that a range of existing approaches optimize approximations to the Lagrangian dual of our objective. In contrast to these existing approaches, our objective provides the user control over the fairness of representations by specifying limits on unfairness. We introduce a dual optimization method that optimizes the model as well as the expressiveness-fairness trade-off. Empirical evidence suggests that our proposed method can account for multiple notions of fairness and achieves higher expressiveness at a lower computational cost.