Energy
Bayesian Optimization of Composite Functions
Astudillo, Raul, Frazier, Peter I.
We consider optimization of composite objective functions, i.e., of the form $f(x)=g(h(x))$, where $h$ is a black-box derivative-free expensive-to-evaluate function with vector-valued outputs, and $g$ is a cheap-to-evaluate real-valued function. While these problems can be solved with standard Bayesian optimization, we propose a novel approach that exploits the composite structure of the objective function to substantially improve sampling efficiency. Our approach models $h$ using a multi-output Gaussian process and chooses where to sample using the expected improvement evaluated on the implied non-Gaussian posterior on $f$, which we call expected improvement for composite functions (\ei). Although \ei\ cannot be computed in closed form, we provide a novel stochastic gradient estimator that allows its efficient maximization. We also show that our approach is asymptotically consistent, i.e., that it recovers a globally optimal solution as sampling effort grows to infinity, generalizing previous convergence results for classical expected improvement. Numerical experiments show that our approach dramatically outperforms standard Bayesian optimization benchmarks, reducing simple regret by several orders of magnitude.
Robust exploration in linear quadratic reinforcement learning
Umenberger, Jack, Ferizbegovic, Mina, Schön, Thomas B., Hjalmarsson, Håkan
This paper concerns the problem of learning control policies for an unknown linear dynamical system to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task robustly: i.e., we minimize the worst-case cost, accounting for system uncertainty given the observed data. The method balances exploitation and exploration, exciting the system in such a way so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.
Nemesyst: A Hybrid Parallelism Deep Learning-Based Framework Applied for Internet of Things Enabled Food Retailing Refrigeration Systems
Onoufriou, George, Bickerton, Ronald, Pearson, Simon, Leontidis, Georgios
Deep Learning has attracted considerable attention across multiple application domains, including computer vision, signal processing and natural language processing. Although quite a few single node deep learning frameworks exist, such as tensorflow, pytorch and keras, we still lack a complete processing structure that can accommodate large scale data processing, version control, and deployment, all while staying agnostic of any specific single node framework. To bridge this gap, this paper proposes a new, higher level framework, i.e. Nemesyst, which uses databases along with model sequentialisation to allow processes to be fed unique and transformed data at the point of need. This facilitates near real-time application and makes models available for further training or use at any node that has access to the database simultaneously. Nemesyst is well suited as an application framework for internet of things aggregated control systems, deploying deep learning techniques to optimise individual machines in massive networks. To demonstrate this framework, we adopted a case study in a novel domain; deploying deep learning to optimise the high speed control of electrical power consumed by a massive internet of things network of retail refrigeration systems in proportion to load available on the UK National Grid (a demand side response). The case study demonstrated for the first time in such a setting how deep learning models, such as Recurrent Neural Networks (vanilla and Long-Short-Term Memory) and Generative Adversarial Networks paired with Nemesyst, achieve compelling performance, whilst still being malleable to future adjustments as both the data and requirements inevitably change over time.
Reliable training and estimation of variance networks
Detlefsen, Nicki S., Jørgensen, Martin, Hauberg, Søren
We propose and investigate new complementary methodologies for estimating predictive variance networks in regression neural networks. We derive a locally aware mini-batching scheme that result in sparse robust gradients, and show how to make unbiased weight updates to a variance network. Further, we formulate a heuristic for robustly fitting both the mean and variance networks post hoc. Finally, we take inspiration from posterior Gaussian processes and propose a network architecture with similar extrapolation properties to Gaussian processes. The proposed methodologies are complementary, and improve upon baseline methods individually. Experimentally, we investigate the impact on predictive uncertainty on multiple datasets and tasks ranging from regression, active learning and generative modeling. Experiments consistently show significant improvements in predictive uncertainty estimation over state-of-the-art methods across tasks and datasets.
Unsupervised Temporal Clustering to Monitor the Performance of Alternative Fueling Infrastructure
Zero Emission Vehicles (ZEV) play an important role in the decarbonization of the transportation sector. For a wider adoption of ZEVs, providing a reliable infrastructure is critical. We present a machine learning approach that uses unsupervised temporal clustering algorithm along with survey analysis to determine infrastructure performance and reliability of alternative fuels. We illustrate this approach for the hydrogen fueling stations in California, but this can be generalized for other regions and fuels.
Stochastic Gradients for Large-Scale Tensor Decomposition
Tensor decomposition is a well-known tool for multiway data analysis. This work proposes using stochastic gradients for efficient generalized canonical polyadic (GCP) tensor decomposition of large-scale tensors. GCP tensor decomposition is a recently proposed version of tensor decomposition that allows for a variety of loss functions such as logistic loss for binary data or Huber loss for robust estimation. The stochastic gradient is formed from randomly sampled elements of the tensor. For dense tensors, we simply use uniform sampling. For sparse tensors, we propose two types of stratified sampling that give precedence to sampling nonzeros. Numerical results demonstrate the advantages of the proposed approach and its scalability to large-scale problems.
Encoding Invariances in Deep Generative Models
Shah, Viraj, Joshi, Ameya, Ghosal, Sambuddha, Pokuri, Balaji, Sarkar, Soumik, Ganapathysubramanian, Baskar, Hegde, Chinmay
Reliable training of generative adversarial networks (GANs) typically require massive datasets in order to model complicated distributions. However, in several applications, training samples obey invariances that are \textit{a priori} known; for example, in complex physics simulations, the training data obey universal laws encoded as well-defined mathematical equations. In this paper, we propose a new generative modeling approach, InvNet, that can efficiently model data spaces with known invariances. We devise an adversarial training algorithm to encode them into data distribution. We validate our framework in three experimental settings: generating images with fixed motifs; solving nonlinear partial differential equations (PDEs); and reconstructing two-phase microstructures with desired statistical properties. We complement our experiments with several theoretical results.
All SMILES Variational Autoencoder
Alperstein, Zaccary, Cherkasov, Artem, Rolfe, Jason Tyler
Variational autoencoders (VAEs) defined over SMILES string and graph-based representations of molecules promise to improve the optimization of molecular properties, thereby revolutionizing the pharmaceuticals and materials industries. However, these VAEs are hindered by the non-unique nature of SMILES strings and the computational cost of graph convolutions. To efficiently pass messages along all paths through the molecular graph, we encode multiple SMILES strings of a single molecule using a set of stacked recurrent neural networks, pooling hidden representations of each atom between SMILES representations, and use attentional pooling to build a final fixed-length latent representation. By then decoding to a disjoint set of SMILES strings of the molecule, our All SMILES VAE learns an almost bijective mapping between molecules and latent representations near the high-probability-mass subspace of the prior. Our SMILES-derived but molecule-based latent representations significantly surpass the state-of-the-art in a variety of fully- and semi-supervised property regression and molecular property optimization tasks.
Causal Discovery with Cascade Nonlinear Additive Noise Models
Cai, Ruichu, Qiao, Jie, Zhang, Kun, Zhang, Zhenjie, Hao, Zhifeng
Identification of causal direction between a causal-effect pair from observed data has recently attracted much attention. Various methods based on functional causal models have been proposed to solve this problem, by assuming the causal process satisfies some (structural) constraints and showing that the reverse direction violates such constraints. The nonlinear additive noise model has been demonstrated to be effective for this purpose, but the model class is not transitive--even if each direct causal relation follows this model, indirect causal influences, which result from omitted intermediate causal variables and are frequently encountered in practice, do not necessarily follow the model constraints; as a consequence, the nonlinear additive noise model may fail to correctly discover causal direction. In this work, we propose a cascade nonlinear additive noise model to represent such causal influences--each direct causal relation follows the nonlinear additive noise model but we observe only the initial cause and final effect. We further propose a method to estimate the model, including the unmeasured intermediate variables, from data, under the variational auto-encoder framework. Our theoretical results show that with our model, causal direction is identifiable under suitable technical conditions on the data generation process. Simulation results illustrate the power of the proposed method in identifying indirect causal relations across various settings, and experimental results on real data suggest that the proposed model and method greatly extend the applicability of causal discovery based on functional causal models in nonlinear cases.
Temporal Density Extrapolation using a Dynamic Basis Approach
Krempl, Georg, Lang, Dominik, Hofer, Vera
Density estimation is a versatile technique underlying many data mining tasks and techniques,ranging from exploration and presentation of static data, to probabilistic classification, or identifying changes or irregularities in streaming data. With the pervasiveness of embedded systems and digitisation, this latter type of streaming and evolving data becomes more important. Nevertheless, research in density estimation has so far focused on stationary data, leaving the task of of extrapolating and predicting density at time points outside a training window an open problem. For this task, Temporal Density Extrapolation (TDX) is proposed. This novel method models and predicts gradual monotonous changes in a distribution. It is based on the expansion of basis functions, whose weights are modelled as functions of compositional data over time by using an isometric log-ratio transformation. Extrapolated density estimates are then obtained by extrapolating the weights to the requested time point, and querying the density from the basis functions with back-transformed weights. Our approach aims for broad applicability by neither being restricted to a specific parametric distribution, nor relying on cluster structure in the data.It requires only two additional extrapolation-specific parameters, for which reasonable defaults exist. Experimental evaluation on various data streams, synthetic as well as from the real-world domains of credit scoring and environmental health, shows that the model manages to capture monotonous drift patterns accurately and better than existing methods. Thereby, it requires not more than 1.5-times the run time of a corresponding static density estimation approach.