Optimization
An Integrated Optimization + Learning Approach to Optimal Dynamic Pricing for the Retailer with Multi-type Customers in Smart Grids
Meng, Fanlin, Zeng, Xiao-Jun, Zhang, Yan, Dent, Chris J., Gong, Dunwei
In this paper, we consider a realistic and meaningful scenario in the context of smart grids where an electricity retailer serves three different types of customers, i.e., customers with an optimal home energy management system embedded in their smart meters (C-HEMS), customers with only smart meters (C-SM), and customers without smart meters (C-NONE). The main objective of this paper is to support the retailer to make optimal day-ahead dynamic pricing decisions in such a mixed customer pool. To this end, we propose a two-level decision-making framework where the retailer acting as upper-level agent firstly announces its electricity prices of next 24 hours and customers acting as lower-level agents subsequently schedule their energy usages accordingly. For the upper level problem, we optimize the dynamic prices for the retailer to maximize its profit subject to realistic market constraints. The above two-level model is tackled by genetic algorithms (GA) based distributed optimization methods while its feasibility and effectiveness are con-2018. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/. Please cite this accepted article as: Fanlin Meng, Xiao-Jun Zeng, Yan Zhang, Chris J. Dent, Dunwei Gong, An Integrated Optimization Learning Approach to Optimal Dynamic Pricing for the Retailer with Multi-type Customers in Smart Grids, Information Sciences (2018), doi: 10.1016/j.ins.2018.03.039 Preprint submitted to Information Sciences March 22, 2018 firmed via simulation results. Keywords: Bilevel Modelling, Genetic Algorithms, Machine Learning, Dynamic Pricing, Demand-side Management, Demand Response, Smart Grids 1. Introduction With the large-scale deployment of smart meters and two-way communication infrastructures, dynamic pricing based demand response and demand-side management programs [37] [12] have attracted enormous attentions from both academia and industry and are expected to bring great benefits to the whole power system. Real-time pricing (RTP), timeof-use pricing (ToU) and critical-peak pricing (CPP) are commonly used dynamic pricing strategies [20].
MLtuner: System Support for Automatic Machine Learning Tuning
Cui, Henggang, Ganger, Gregory R., Gibbons, Phillip B.
MLtuner automatically tunes settings for training tunables (such as the learning rate, the momentum, the mini-batch size, and the data staleness bound) that have a significant impact on large-scale machine learning (ML) performance. Traditionally, these tunables are set manually, which is unsurprisingly error-prone and difficult to do without extensive domain knowledge. MLtuner uses efficient snapshotting, branching, and optimization-guided online trial-and-error to find good initial settings as well as to re-tune settings during execution. Experiments show that MLtuner can robustly find and re-tune tunable settings for a variety of ML applications, including image classification (for 3 models and 2 datasets), video classification, and matrix factorization. Compared to state-of-the-art ML auto-tuning approaches, MLtuner is more robust for large problems and over an order of magnitude faster.
Sparse Reduced Rank Regression With Nonconvex Regularization
Zhao, Ziping, Palomar, Daniel P.
In this paper, the estimation problem for sparse reduced rank regression (SRRR) model is considered. The SRRR model is widely used for dimension reduction and variable selection with applications in signal processing, econometrics, etc. The problem is formulated to minimize the least squares loss with a sparsity-inducing penalty considering an orthogonality constraint. Convex sparsity-inducing functions have been used for SRRR in literature. In this work, a nonconvex function is proposed for better sparsity inducing. An efficient algorithm is developed based on the alternating minimization (or projection) method to solve the nonconvex optimization problem. Numerical simulations show that the proposed algorithm is much more efficient compared to the benchmark methods and the nonconvex function can result in a better estimation accuracy.
Learning Optimal Control of Synchronization in Networks of Coupled Oscillators using Genetic Programming-based Symbolic Regression
Gout, Julien, Quade, Markus, Shafi, Kamran, Niven, Robert K., Abel, Markus
Networks of coupled dynamical systems provide a powerful way to model systems with enormously complex dynamics, such as the human brain. Control of synchronization in such networked systems has far reaching applications in many domains, including engineering and medicine. In this paper, we formulate the synchronization control in dynamical systems as an optimization problem and present a multi-objective genetic programming-based approach to infer optimal control functions that drive the system from a synchronized to a non-synchronized state and vice-versa. The genetic programming-based controller allows learning optimal control functions in an interpretable symbolic form. The effectiveness of the proposed approach is demonstrated in controlling synchronization in coupled oscillator systems linked in networks of increasing order complexity, ranging from a simple coupled oscillator system to a hierarchical network of coupled oscillators. The results show that the proposed method can learn highly-effective and interpretable control functions for such systems.
Hidden Integrality of SDP Relaxation for Sub-Gaussian Mixture Models
We consider the problem of estimating the discrete clustering structures under Sub-Gaussian Mixture Models. Our main results establish a hidden integrality property of a semidefinite programming (SDP) relaxation for this problem: while the optimal solutions to the SDP are not integer-valued in general, their estimation errors can be upper bounded in terms of the error of an idealized integer program. The error of the integer program, and hence that of the SDP, are further shown to decay exponentially in the signal-to-noise ratio. To the best of our knowledge, this is the first exponentially decaying error bound for convex relaxations of mixture models, and our results reveal the "global-to-local" mechanism that drives the performance of the SDP relaxation. A corollary of our results shows that in certain regimes the SDP solutions are in fact integral and exact, improving on existing exact recovery results for convex relaxations. More generally, our results establish sufficient conditions for the SDP to correctly recover the cluster memberships of $(1-\delta)$ fraction of the points for any $\delta\in(0,1)$. As a special case, we show that under the $d$-dimensional Stochastic Ball Model, SDP achieves non-trivial (sometimes exact) recovery when the center separation is as small as $\sqrt{1/d}$, which complements previous exact recovery results that require constant separation.
A Multi-Scheme Ensemble Using Coopetitive Soft-Gating With Application to Power Forecasting for Renewable Energy Generation
Gensler, Andrรฉ, Sick, Bernhard
In this article, we propose a novel ensemble technique with a multi-scheme weighting based on a technique called coopetitive soft gating. This technique combines both, ensemble member competition and cooperation, in order to maximize the overall forecasting accuracy of the ensemble. The proposed algorithm combines the ideas of multiple ensemble paradigms (power forecasting model ensemble, weather forecasting model ensemble, and lagged ensemble) in a hierarchical structure. The technique is designed to be used in a flexible manner on single and multiple weather forecasting models, and for a variety of lead times. We compare the technique to other power forecasting models and ensemble techniques with a flexible number of weather forecasting models, which can have the same, or varying forecasting horizons. It is shown that the model is able to outperform those models on a number of publicly available data sets. The article closes with a discussion of properties of the proposed model which are relevant in its application. Keywords: Ensemble techniques, Power forecasting, Multi model ensembles, Combining forecasts, Model selection, Time series, Data mining 1. Introduction During the past decade, there has been a tremendous growth of the installed capacity of various forms of renewable energy generation. Wind turbines and photovoltaic powerplants contribute substantially to the new mix of energy, which consists of both nonrenewable and renewable energy power plants. Most renewable energy sources have intermittent generation characteristics, i.e., the amount of generated power highly depends on the weather situation and it cannot be regulated the way it is possible with traditional power plants. In order to guarantee grid stability, the power generation and load in the grid have to be balanced, as the intermediate storage of electrical energy is both inefficient and expensive. Intelligent Embedded Systems Homepage: http://www. Depending on the forecasting horizon, the forecast is of interest to different actors in the field, e.g., network operators, power plant operators, or electricity traders. Having an accurate power forecast, the technical and financial risks for all market participants can be reduced. The power forecasting process typically takes place in two steps: 1. A meteorological forecast for the desired area (the location of the renewable energy power plant) is computed. This forecast is called numerical weather prediction (NWP). In this article, we focus on the second step of the forecasting process, i.e., we assume the NWP as given.
Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning
Schmitz, Morgan A., Heitz, Matthieu, Bonneel, Nicolas, Mboula, Fred Maurice Ngolรจ, Coeurjolly, David, Cuturi, Marco, Peyrรฉ, Gabriel, Starck, Jean-Luc
This paper introduces a new nonlinear dictionary learning method for histograms in the probability simplex. The method leverages optimal transport theory, in the sense that our aim is to reconstruct histograms using so-called displacement interpolations (a.k.a. Wasserstein barycenters) between dictionary atoms; such atoms are themselves synthetic histograms in the probability simplex. Our method simultaneously estimates such atoms, and, for each datapoint, the vector of weights that can optimally reconstruct it as an optimal transport barycenter of such atoms. Our method is computationally tractable thanks to the addition of an entropic regularization to the usual optimal transportation problem, leading to an approximation scheme that is efficient, parallel and simple to differentiate. Both atoms and weights are learned using a gradient-based descent method. Gradients are obtained by automatic differentiation of the generalized Sinkhorn iterations that yield barycenters with entropic smoothing. Because of its formulation relying on Wasserstein barycenters instead of the usual matrix product between dictionary and codes, our method allows for nonlinear relationships between atoms and the reconstruction of input data. We illustrate its application in several different image processing settings.
A Study of Car-to-Train Assignment Problem for Rail Express Cargos on Scheduled and Unscheduled Train Service Network
Freight train services in a railway network system are generally divided into two categories: one is the unscheduled train, whose operating frequency fluctuates with origin-destination (OD) demands; the other is the scheduled train, which is running based on regular timetable just like the passenger trains. The timetable will be released to the public if determined and it would not be influenced by OD demands. Typically, the total capacity of scheduled trains can usually satisfy the predicted demands of express cargos in average. However, the demands are changing in practice. Therefore, how to distribute the shipments between different stations to unscheduled and scheduled train services has become an important research field in railway transportation. This paper focuses on the coordinated optimization of the rail express cargos distribution in two service networks. On the premise of fully utilizing the capacity of scheduled service network first, we established a Car-to-Train (CTT) assignment model to assign rail express cargos to scheduled and unscheduled trains scientifically. The objective function is to maximize the net income of transporting the rail express cargos. The constraints include the capacity restriction on the service arcs, flow balance constraints, logical relationship constraint between two groups of decision variables and the due date constraint. The last constraint is to ensure that the total transportation time of a shipment would not be longer than its predefined due date. Finally, we discuss the linearization techniques to simplify the model proposed in this paper, which make it possible for obtaining global optimal solution by using the commercial software.
Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning
Karakus, Can, Sun, Yifan, Diggavi, Suhas, Yin, Wotao
Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is "encoded" to have an over-complete representation with built-in redundancy, and the straggling nodes in the system are dynamically left out of the computation at every iteration, whose loss is compensated by the embedded redundancy. We show that oblivious application of several popular optimization algorithms on encoded data, including gradient descent, L-BFGS, proximal gradient under data parallelism, and coordinate descent under model parallelism, converge to either approximate or exact solutions of the original problem when stragglers are treated as erasures. These convergence results are deterministic, i.e., they establish sample path convergence for arbitrary sequences of delay patterns or distributions on the nodes, and are independent of the tail behavior of the delay distribution. We demonstrate that equiangular tight frames have desirable properties as encoding matrices, and propose efficient mechanisms for encoding large-scale data. We implement the proposed technique on Amazon EC2 clusters, and demonstrate its performance over several learning problems, including matrix factorization, LASSO, ridge regression and logistic regression, and compare the proposed method with uncoded, asynchronous, and data replication strategies.
Domain Adaptation on Graphs by Learning Aligned Graph Bases
We propose a method for domain adaptation on graphs. Given sufficiently many observations of the label function on a source graph, we study the problem of transferring the label information from the source graph to a target graph for estimating the target label function. Our assumption about the relation between the two domains is that the frequency content of the label function, regarded as a graph signal, has similar characteristics over the source and the target graphs. We propose a method to learn a pair of coherent bases on the two graphs, such that the corresponding source and target graph basis vectors have similar spectral content, while "aligning" the two graphs at the same time so that the reconstructed source and target label functions have similar coefficients over the bases. Experiments on several types of data sets suggest that the proposed method compares quite favorably to reference domain adaptation methods. To the best of our knowledge, our treatment is the first to study the domain adaptation problem in a purely graph-based setting with no need for embedding the data in an ambient space. This feature is particularly convenient for many problems of interest concerning learning on graphs or networks.