 Energy


Towards Lower Bit Multiplication for Convolutional Neural Network Training

arXiv.org Machine Learning

Convolutional Neural Networks (CNNs) have been widely used in many fields. However, the training process consumes considerable energy and time, most of it in the convolution operations. In this paper, we propose a fixed-point training framework to reduce the data bit-width of the convolution multiplications. Firstly, we propose two constrained group-wise scaling methods that can be implemented with low hardware cost. Secondly, to overcome the challenge of trading off overflow against rounding error, a shiftable fixed-point data format is used in this framework. Finally, we propose a double-width deployment technique to boost inference performance with the same bit-width hardware multiplier. The experimental results show that the input data of convolution in the training process can be quantized to 2-bit for the CIFAR-10 dataset and 6-bit for the ImageNet dataset, with negligible accuracy degradation. Furthermore, our fixed-point training framework has the potential to save at least 75% of the computation energy in the training process.
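
To make the overflow versus rounding trade-off concrete, here is a minimal sketch of group-wise fixed-point quantization. It is not the paper's method: the abstract does not specify its scaling constraints or its shiftable format, so the power-of-two scale per group, the function names `quantize_groupwise` and `dequantize`, and the group size are all assumptions chosen for illustration.

```python
import numpy as np

def quantize_groupwise(x, bits=6, group_size=4):
    """Quantize a 1-D array to signed `bits`-bit integers.

    Assumption for illustration: each group of `group_size` values shares
    one power-of-two scale, so rescaling is a bit shift in hardware.
    """
    x = np.asarray(x, dtype=np.float64)
    assert x.size % group_size == 0, "length must be a multiple of group_size"
    groups = x.reshape(-1, group_size)
    qmax = 2 ** (bits - 1) - 1                      # largest representable magnitude
    max_abs = np.max(np.abs(groups), axis=1, keepdims=True)
    safe = np.where(max_abs > 0, max_abs, 1.0)      # avoid log2(0) for all-zero groups
    # Smallest power-of-two scale that keeps the group maximum within range
    # (no overflow); a larger scale would only add rounding error.
    scale = 2.0 ** np.ceil(np.log2(safe / qmax))
    q = np.clip(np.round(groups / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Map integers back to floating point using the per-group scales."""
    return (q * scale).reshape(-1)

x = np.array([0.1, -0.5, 0.25, 1.0, 100.0, 50.0, -25.0, 3.0])
q, scale = quantize_groupwise(x, bits=6, group_size=4)
x_hat = dequantize(q, scale)
```

With one scale for the whole array, the small values in the first group would be rounded on the coarse grid needed for the second group; per-group scales keep the rounding error of each group bounded by half of its own scale.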


Autonomous Materials Discovery Driven by Gaussian Process Regression with Inhomogeneous Measurement Noise and Anisotropic Kernels

arXiv.org Machine Learning

A majority of experimental disciplines face the challenge of exploring large and high-dimensional parameter spaces in search of new scientific discoveries. Materials science is no exception; the wide variety of synthesis, processing, and environmental conditions that influence material properties gives rise to particularly vast parameter spaces. Recent advances have led to an increase in efficiency of materials discovery by increasingly automating the exploration processes. Methods for autonomous experimentation have become more sophisticated recently, allowing for multi-dimensional parameter spaces to be explored efficiently and with minimal human intervention, thereby liberating the scientists to focus on interpretations and big-picture decisions. Gaussian process regression (GPR) techniques have emerged as the method of choice for steering many classes of experiments. We have recently demonstrated the positive impact of GPR-driven decision-making algorithms on autonomously steering experiments at a synchrotron beamline. However, due to the complexity of the experiments, GPR often cannot be used in its most basic form, but rather has to be tuned to account for the special requirements of the experiments. Two requirements seem to be of particular importance, namely inhomogeneous measurement noise (input dependent or non-i.i.d.) and anisotropic kernel functions, which are the two concepts that we tackle in this paper. Our synthetic and experimental tests demonstrate the importance of both concepts for experiments in materials science and the benefits that result from including them in the autonomous decision-making process.
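
The two concepts named in the abstract can be illustrated with a short NumPy sketch of Gaussian process regression. This is a generic textbook formulation under stated assumptions, not the authors' implementation: the squared-exponential kernel with one length scale per input dimension, the zero prior mean, and the function names `anisotropic_rbf` and `gp_posterior` are all illustrative choices.

```python
import numpy as np

def anisotropic_rbf(A, B, length_scales, signal_var=1.0):
    """Squared-exponential kernel with a separate length scale per dimension."""
    ls = np.asarray(length_scales, dtype=float)
    A = np.asarray(A, dtype=float) / ls
    B = np.asarray(B, dtype=float) / ls
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0))

def gp_posterior(X, y, noise_var, X_new, length_scales, signal_var=1.0):
    """Posterior mean and variance with one noise variance per measurement."""
    # Inhomogeneous noise: a full diagonal instead of a single sigma^2 * I.
    K = anisotropic_rbf(X, X, length_scales, signal_var) + np.diag(noise_var)
    K_s = anisotropic_rbf(X_new, X, length_scales, signal_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = signal_var - np.sum(v**2, axis=0)
    return mean, var

# Two far-apart measurements: one precise, one noisy.
X = np.array([[0.0, 0.0], [10.0, 0.0]])
y = np.array([1.0, 1.0])
noise_var = np.array([1e-6, 1.0])
mean, var = gp_posterior(X, y, noise_var, X, length_scales=[1.0, 1.0])
```

In this toy setup the posterior stays close to the precise measurement and remains uncertain at the noisy one, which is the behaviour an autonomous experiment would use to decide where to measure next.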


Predicting molecular dipole moments by combining atomic partial charges and atomic dipoles

arXiv.org Machine Learning

The molecular dipole moment ($\boldsymbol{\mu}$) is a central quantity in chemistry. It is essential in predicting infrared and sum-frequency generation spectra, as well as induction and long-range electrostatic interactions. Furthermore, it can be extracted directly from high-level quantum mechanical calculations, making it an ideal target for machine learning (ML). In this work, we choose to represent this quantity with a physically inspired ML model that captures two distinct physical effects: local atomic polarization is captured within the symmetry-adapted Gaussian process regression (SA-GPR) framework, which assigns a (vector) dipole moment to each atom, while movement of charge across the entire molecule is captured by assigning a partial (scalar) charge to each atom. The resulting "MuML" models are fitted together to reproduce molecular $\boldsymbol{\mu}$ computed using high-level coupled-cluster theory (CCSD) and density functional theory (DFT) on the QM7b dataset. The combined model shows excellent transferability when applied to a showcase dataset of larger and more complex molecules, approaching the accuracy of DFT at a small fraction of the computational cost. We also demonstrate that the uncertainty in the predictions can be estimated reliably using a calibrated committee model. The ultimate performance of the models depends, however, on the details of the system at hand, with the scalar model being clearly superior when describing large molecules whose dipole is almost entirely generated by charge separation. These observations point to the importance of simultaneously accounting for the local and non-local effects that contribute to $\boldsymbol{\mu}$; further, they define a challenging task to benchmark future models, particularly those aimed at the description of condensed phases.
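
The decomposition described above can be sketched as a sum of a charge-separation term and a local polarization term. The snippet below only shows how per-atom predictions would combine into a molecular dipole, assuming the standard form in which the total is the charge-weighted sum of positions plus the sum of atomic dipoles; the function name `molecular_dipole` and the toy numbers are hypothetical, and the ML models that would predict the per-atom quantities are not shown.

```python
import numpy as np

def molecular_dipole(positions, charges, atomic_dipoles):
    """Combine scalar partial charges and vector atomic dipoles.

    mu = sum_i q_i * r_i  +  sum_i mu_i
    (assumed combination rule; units are left to the caller)
    """
    positions = np.asarray(positions, dtype=float)            # shape (n_atoms, 3)
    charges = np.asarray(charges, dtype=float)                # shape (n_atoms,)
    atomic_dipoles = np.asarray(atomic_dipoles, dtype=float)  # shape (n_atoms, 3)
    charge_term = charges @ positions          # movement of charge across the molecule
    local_term = atomic_dipoles.sum(axis=0)    # local atomic polarization
    return charge_term + local_term

# Toy diatomic: opposite partial charges one unit apart along z.
positions = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
charges = [0.5, -0.5]
atomic_dipoles = [[0.1, 0.0, 0.0], [0.1, 0.0, 0.0]]
mu = molecular_dipole(positions, charges, atomic_dipoles)
```

For a neutral molecule the charge term does not depend on the choice of origin, which is why the partial charges in such a model are usually constrained to sum to the total molecular charge.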


Hierarchical forecast reconciliation with machine learning

arXiv.org Machine Learning

Hierarchical forecasting methods have been widely used to support aligned decision-making by providing coherent forecasts at different aggregation levels. Traditional hierarchical forecasting approaches, such as the bottom-up and top-down methods, focus on a particular aggregation level to anchor the forecasts. During the past decades, these have been replaced by a variety of linear combination approaches that exploit information from the complete hierarchy to produce more accurate forecasts. However, the performance of these combination methods depends on the particularities of the examined series and their relationships. This paper proposes a novel hierarchical forecasting approach based on machine learning that deals with these limitations in three important ways. First, the proposed method allows for a non-linear combination of the base forecasts, thus being more general than the linear approaches. Second, it structurally combines the objectives of improved post-sample empirical forecasting accuracy and coherence. Finally, due to its non-linear nature, our approach selectively combines the base forecasts in a direct and automated way without requiring that the complete information be used for producing reconciled forecasts for each series and level. The proposed method is evaluated both in terms of accuracy and bias using two different data sets coming from the tourism and retail industries. Our results suggest that the proposed method gives more accurate point forecasts than existing approaches, especially when the series comprising the hierarchy are not characterized by the same patterns.
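
For readers unfamiliar with coherence, the traditional bottom-up method mentioned above can be sketched in a few lines. This illustrates only that baseline, not the machine-learning approach the paper proposes; the two-series hierarchy, the summing matrix `S`, the row ordering (aggregate first, bottom-level series last), and the function name `bottom_up` are assumptions made for the example.

```python
import numpy as np

# Hypothetical hierarchy: total = A + B.
# Rows of S are [total, A, B]; columns are the bottom-level series [A, B].
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])

def bottom_up(base_forecasts, S):
    """Reconcile by keeping the bottom-level forecasts and summing them up.

    Assumes the bottom-level series occupy the last rows of the hierarchy.
    """
    n_bottom = S.shape[1]
    bottom = np.asarray(base_forecasts, dtype=float)[-n_bottom:]
    return S @ bottom

# Independently produced base forecasts are incoherent: 105 != 40 + 60.
base = np.array([105.0, 40.0, 60.0])
reconciled = bottom_up(base, S)
```

Bottom-up discards the information in the aggregate forecast entirely, which is the kind of limitation that combination approaches, linear or non-linear, aim to address.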


Recurrent Convolutional Neural Networks help to predict location of Earthquakes

arXiv.org Machine Learning

We examine the applicability of modern neural network architectures to the midterm prediction of earthquakes. Our data-based classification model aims to predict whether an earthquake with a magnitude above a threshold takes place in a given area of size $10 \times 10$ kilometers in $10$-$60$ days from a given moment. Our deep neural network model has a recurrent part (LSTM) that accounts for time dependencies between earthquakes and a convolutional part that accounts for spatial dependencies. The results obtained show that neural network-based models beat baseline feature-based models that also account for spatio-temporal dependencies between different earthquakes. For historical data on Japan earthquakes, our model predicts the occurrence of an earthquake in $10$ to $60$ days from a given moment with magnitude $M_c > 5$ with quality metrics ROC AUC $0.975$ and PR AUC $0.0890$, making $1.18 \cdot 10^3$ correct predictions, while missing $2.09 \cdot 10^3$ earthquakes and making $192 \cdot 10^3$ false alarms. The baseline approach has a similar ROC AUC of $0.992$, $1.19 \cdot 10^3$ correct predictions, and $2.07 \cdot 10^3$ missed earthquakes, but a significantly worse PR AUC of $0.00911$ and $1004 \cdot 10^3$ false alarms.
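
The counts quoted above can be turned into precision and recall at the reported operating point, which helps explain why the two approaches have similar ROC AUC but very different PR AUC. The short calculation below uses only the numbers stated in the abstract; the helper name `precision_recall` is illustrative, and these single-point values are not the same thing as the area-under-curve metrics.

```python
def precision_recall(true_pos, false_neg, false_pos):
    """Precision and recall from raw prediction counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Counts as reported in the abstract.
model = precision_recall(true_pos=1.18e3, false_neg=2.09e3, false_pos=192e3)
baseline = precision_recall(true_pos=1.19e3, false_neg=2.07e3, false_pos=1004e3)

print(f"model:    precision={model[0]:.4f}, recall={model[1]:.3f}")
print(f"baseline: precision={baseline[0]:.4f}, recall={baseline[1]:.3f}")
```

Both approaches catch roughly 36% of the earthquakes, but the neural model's precision is about 0.6% against about 0.1% for the baseline, because it raises roughly five times fewer false alarms.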


A summary of the keynotes at AAMAS

AIHub

A virtual edition of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) was held on 9-13 May. Videos of the talks are now available for public viewing, and you can also see the sessions from the various workshops. Alison is interested in how cities work and builds spatial agent-based models (ABMs) to study how people move around and how behaviour plays out in space and time. There are a number of challenges with these kinds of models and they need to be really robust if they are to be adopted by policy makers. So, why should we be interested in modelling cities?


Thanks To Renewables And Machine Learning, Google Now Forecasts The Wind

#artificialintelligence

Wind farms have traditionally made less money for the electricity they produce because they have been unable to predict how windy it will be tomorrow. "The way a lot of power markets work is you have to schedule your assets a day ahead," said Michael Terrell, the head of energy market strategy at Google. "And you tend to get compensated higher when you do that than if you sell into the market real-time. "Well, how do variable assets like wind schedule a day ahead when you don't know the wind is going to blow?" Terrell asked, "and how can you actually reserve your place in line?" Here's how: Google and the Google-owned Artificial Intelligence firm DeepMind combined weather data with power data from 700 megawatts of wind energy that Google sources in the Central United States. Using machine learning, they have been able to better predict wind production, better predict electricity supply and demand, and as a result, reduce operating costs. "What we've been doing is working in partnership with the DeepMind team to use machine learning to take the weather data that's available publicly, actually forecast what we think the wind production will be the next day, and bid that wind into the day-ahead markets," Terrell said in a recent seminar hosted by the Stanford Precourt Institute of Energy. Stanford University posted video of the seminar last week. The result has been a 20 percent increase in revenue for wind farms, Terrell said. The Department of Energy listed improved wind forecasting as a first priority in its 2015 Wind Vision report, largely to improve reliability: "Improve Wind Resource Characterization," the report said at the top of its list of goals. "Collect data and develop models to improve wind forecasting at multiple temporal scales--e.g., minutes, hours, days, months, years." Google's goal has been more sweeping: to scrub carbon entirely from its energy portfolio, which consumes as much power as two San Franciscos. 
Google achieved an initial milestone by matching its annual energy use with its annual renewable-energy procurement, Terrell said. But the company has not been carbon-free in every location at every hour, which is now its new goal--what Terrell calls its "24x7 carbon-free" goal. "We're really starting to turn our efforts in this direction, and we're finding that it's not something that's easy to do.


Objective-Sensitive Principal Component Analysis for High-Dimensional Inverse Problems

arXiv.org Machine Learning

We present a novel approach for adaptive, differentiable parameterization of large-scale random fields. If the approach is coupled with any gradient-based optimization algorithm, it can be applied to a variety of optimization problems, including history matching. The developed technique is based on principal component analysis (PCA) but modifies a purely data-driven basis of principal components considering objective function behavior. To define an efficient encoding, Gradient-Sensitive PCA uses an objective function gradient with respect to model parameters. We propose computationally efficient implementations of the technique, and two of them are based on stationary perturbation theory (SPT). Optimality, correctness, and low computational costs of the new encoding approach are tested, verified, and discussed. Three algorithms for optimal parameter decomposition are presented and applied to an objective of 2D synthetic history matching. The results demonstrate improvements in encoding quality regarding objective function minimization and distributional patterns of the desired field. Possible applications and extensions are proposed.
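
As background for the technique above, here is a minimal sketch of the plain, purely data-driven PCA parameterization that it starts from. The objective-sensitive modification of the basis is not reproduced, since the abstract does not give its formulas; the function names `fit_pca`, `encode`, and `decode` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_pca(realizations, n_components):
    """Compute the mean field and the leading principal components.

    `realizations` has shape (n_samples, n_cells): one flattened field per row.
    """
    X = np.asarray(realizations, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def encode(field, mean, basis):
    """Project a field onto the reduced basis (low-dimensional parameters)."""
    return basis @ (field - mean)

def decode(z, mean, basis):
    """Map reduced parameters back to a full field; linear, hence differentiable."""
    return mean + basis.T @ z

# Synthetic ensemble that lives on a 2-dimensional subspace of a 5-cell field.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(2, 5))
fields = rng.normal(size=(20, 2)) @ patterns

mean, basis = fit_pca(fields, n_components=2)
z = encode(fields[0], mean, basis)
reconstructed = decode(z, mean, basis)
```

A gradient-based optimizer would then search over the few entries of `z` instead of every cell of the field, using the chain rule through `decode`.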


Uncertainty Principle based optimization; new metaheuristics framework

arXiv.org Artificial Intelligence

To balance exploration and exploitation more flexibly, a new meta-heuristic method based on Uncertainty Principle (UP) concepts is proposed in this paper. The UP has proved effective in multiple branches of science. In quantum mechanics, canonically conjugate observables such as position and momentum cannot both be distinctly determined in any quantum state. In the same manner, spectral filtering design implies that a nonzero function and its Fourier transform cannot both be sharply localized. After delving into such concepts of the Uncertainty Principle and their variations in quantum physics, Fourier analysis, and wavelet design, the proposed framework is described in terms of an algorithm and a flowchart. The idea of our proposed optimizer is based on an inherent uncertainty in performing local search versus global solution search. A set of compatible metrics for each part of the framework is proposed to derive the preferred form of the algorithm. Evaluations and comparisons at the end of the paper show the competency and distinct capability of the algorithm over some of the well-known and recently proposed metaheuristics.


A probabilistic generative model for semi-supervised training of coarse-grained surrogates and enforcing physical constraints through virtual observables

arXiv.org Machine Learning

The data-centric construction of inexpensive surrogates for fine-grained, physical models has been at the forefront of computational physics due to its significant utility in many-query tasks such as uncertainty quantification. Recent efforts have taken advantage of the enabling technologies from the field of machine learning (e.g. deep neural networks) in combination with simulation data. While such strategies have shown promise even in higher-dimensional problems, they generally require large amounts of training data even though the construction of surrogates is by definition a Small Data problem. Rather than employing data-based loss functions, it has been proposed to make use of the governing equations (in the simplest case at collocation points) in order to imbue domain knowledge in the training of the otherwise black-box-like interpolators. The present paper provides a flexible, probabilistic framework that accounts for physical structure and information both in the training objectives and in the surrogate model itself. We advocate a probabilistic (Bayesian) model in which equalities that are available from the physics (e.g. residuals, conservation laws) can be introduced as virtual observables and can provide additional information through the likelihood. We further advocate a generative model, i.e. one that attempts to learn the joint density of inputs and outputs, which is capable of making use of unlabeled data (i.e. only inputs) in a semi-supervised fashion in order to promote the discovery of lower-dimensional embeddings which are nevertheless predictive of the fine-grained model's output.