Energy
GAIT-prop: A biologically plausible learning rule derived from backpropagation of error
Ahmad, Nasir, van Gerven, Marcel A. J., Ambrogioni, Luca
Traditional backpropagation of error, though a highly successful algorithm for learning in artificial neural network models, includes features which are biologically implausible for learning in real neural circuits. An alternative called target propagation proposes to solve this implausibility by using a top-down model of neural activity to convert an error at the output of a neural network into layer-wise and plausible 'targets' for every unit. These targets can then be used to produce weight updates for network training. However, thus far, target propagation has been heuristically proposed without demonstrable equivalence to backpropagation. Here, we derive an exact correspondence between backpropagation and a modified form of target propagation (GAIT-prop) where the target is a small perturbation of the forward pass. Specifically, backpropagation and GAIT-prop give identical updates when synaptic weight matrices are orthogonal. In a series of simple computer vision experiments, we show near-identical performance between backpropagation and GAIT-prop with a soft orthogonality-inducing regularizer.
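The "soft orthogonality-inducing regularizer" mentioned in the abstract above can be illustrated with a common penalty of this kind: the squared Frobenius norm of W W^T - I, added to the task loss. The abstract does not give the exact form the authors use, so the penalty below, the function names, and the `strength` parameter are assumptions made for illustration only.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def orthogonality_penalty(w):
    """Squared Frobenius norm of (W W^T - I); zero when the rows of W are orthonormal."""
    gram = matmul(w, transpose(w))
    n = len(gram)
    return sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))

def regularized_loss(task_loss, weights, strength=0.1):
    """Task loss plus the penalty summed over every layer's weight matrix.

    `strength` is a hypothetical trade-off coefficient, not a value from the paper.
    """
    return task_loss + strength * sum(orthogonality_penalty(w) for w in weights)

theta = math.pi / 6
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]   # orthogonal matrix
scaled = [[2.0, 0.0], [0.0, 2.0]]                 # not orthogonal

print(orthogonality_penalty(rotation))  # numerically close to 0
print(orthogonality_penalty(scaled))    # 18.0
```

Because the penalty is soft, the weights are only pushed towards orthogonality rather than constrained to it, which is consistent with the abstract reporting near-identical rather than identical performance.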
CoPhy-PGNN: Learning Physics-guided Neural Networks with Competing Loss Functions for Solving Eigenvalue Problems
Elhamod, Mohannad, Bu, Jie, Singh, Christopher, Redell, Matthew, Ghosh, Abantika, Podolskiy, Viktor, Lee, Wei-Cheng, Karpatne, Anuj
Physics-guided Neural Networks (PGNNs) represent an emerging class of neural networks that are trained using physics-guided (PG) loss functions (capturing violations of known physics in the network outputs), along with the supervision contained in data. Existing work on PGNNs has demonstrated the efficacy of adding single PG loss functions to the neural network objectives, using constant trade-off parameters, to ensure better generalizability. However, in the presence of multiple physics loss functions with competing gradient directions, there is a need to adaptively tune the contribution of competing PG loss functions during the course of training to arrive at generalizable solutions. We demonstrate the presence of competing PG losses in the generic neural network problem of solving for the lowest (or highest) eigenvector of a physics-based eigenvalue equation, common to many scientific problems. We present a novel approach to handle competing PG losses and demonstrate its efficacy in learning generalizable solutions in two motivating applications of quantum mechanics and electromagnetic propagation. All the code and data used in this work are available at https://github.com/jayroxis/Cophy-PGNN.
Uncertainty Quantification of Darcy Flow through Porous Media using Deep Gaussian Process
Daneshkhah, A., Chatrabgoun, O., Esmaeilbeigi, M., Sedighi, T., Abolfathi, S.
A computational method based on the non-linear Gaussian process (GP), known as deep Gaussian processes (deep GPs), is presented for uncertainty quantification and propagation in the modelling of flow through heterogeneous porous media. The method is also used to reduce the dimensionality of the model output and, consequently, to emulate the highly complex relationship between hydrogeological properties and the reduced-order fluid velocity field in a tractable manner. Deep GPs are multi-layer hierarchical generalisations of GPs with multiple, infinitely wide hidden layers. They are very efficient models for deep learning and for modelling high-dimensional complex systems, tackling the complexity through several hidden layers connected by non-linear mappings. In this approach, the hydrogeological data is modelled as the output of a multivariate GP whose inputs are governed by another GP, such that each single layer is either a standard GP or a Gaussian process latent variable model. A variational approximation framework is used so that the posterior distribution of the model outputs associated with given inputs can be analytically approximated. In contrast to other dimensionality reduction methods, which do not provide any information about the dimensionality of each hidden layer, the proposed method automatically selects the dimensionality of each hidden layer and can be used to propagate the uncertainty obtained in each layer across the hierarchy. As a result, the dimensionality of the full input space, which consists of both the geometrical parameters of the modelling domain and the stochastic hydrogeological parameters, can be reduced simultaneously without the simplifications generally assumed in stochastic modelling of subsurface flow problems. The method allows estimation of the flow statistics with greatly reduced computational effort compared to other stochastic approaches such as the Monte Carlo method.
Stochastic Approximation for High-frequency Observations in Data Assimilation
With the increasing penetration of high-frequency sensors across a number of biological and physical systems, the abundance of the resulting observations offers opportunities for higher statistical accuracy of downstream estimates, but their frequency creates a plethora of computational problems in data assimilation tasks. The high frequency of these observations has traditionally been dealt with by using data modification strategies such as accumulation, averaging, and sampling. However, these data modification strategies reduce the quality of the estimates, which may be untenable for many systems. Therefore, to ensure high-quality estimates, we adapt stochastic approximation methods to address the unique challenges of high-frequency observations in data assimilation. As a result, we are able to produce estimates that leverage all of the observations in a manner that avoids the aforementioned computational problems and preserves the statistical accuracy of the estimates.
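To make the idea of stochastic approximation concrete, the following is a minimal sketch of the classical Robbins-Monro recursion applied to a stream of observations. It is a generic textbook illustration, not the method developed in the paper: the function name, the step-size schedule, and the synthetic stream are all assumptions.

```python
import random

def robbins_monro_mean(observations, step=lambda k: 1.0 / k):
    """Estimate the mean of a stream in a single pass with constant memory.

    Each observation nudges the estimate by step(k) times its residual, so
    every data point is used without storing or pre-aggregating the stream.
    """
    estimate = 0.0
    for k, y in enumerate(observations, start=1):
        estimate += step(k) * (y - estimate)
    return estimate

# Hypothetical high-frequency stream: noisy readings around a true value of 3.0.
rng = random.Random(0)
stream = (3.0 + rng.gauss(0.0, 1.0) for _ in range(100_000))
print(robbins_monro_mean(stream))
```

With the step size 1/k the recursion reproduces the running sample mean. Other decaying schedules trade responsiveness against variance, which is the kind of design choice a stochastic approximation scheme has to make.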
Fast Rates for Contextual Linear Optimization
Hu, Yichun, Kallus, Nathan, Mao, Xiaojie
Incorporating side observations of predictive features can help reduce uncertainty in operational decision making, but it also requires that we tackle a potentially complex predictive relationship. Although one may use a variety of off-the-shelf machine learning methods to learn a predictive model and then plug it into the decision-making problem, a variety of recent work has instead advocated integrating estimation and optimization by taking into consideration downstream decision performance. Surprisingly, in the case of contextual linear optimization, we show that the naive plug-in approach actually achieves regret convergence rates that are significantly faster than the best rates achievable by methods that directly optimize downstream decision performance. We show this by leveraging the fact that specific problem instances do not have arbitrarily bad near-degeneracy. While there are other pros and cons to consider, as we discuss, our results highlight a very nuanced landscape for the enterprise of integrating estimation and optimization.
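The plug-in ("estimate, then optimize") approach described in the abstract above can be sketched in a toy setting. The example assumes discrete contexts, a per-context sample average as the predictive model, and a feasible polytope given by its list of vertices. All names and data are hypothetical, and the paper's own setting is more general.

```python
from collections import defaultdict

def fit_conditional_means(samples):
    """Estimate E[cost | context] by averaging observed cost vectors per context."""
    sums, counts = {}, defaultdict(int)
    for context, cost in samples:
        if context not in sums:
            sums[context] = [0.0] * len(cost)
        sums[context] = [s + c for s, c in zip(sums[context], cost)]
        counts[context] += 1
    return {ctx: [s / counts[ctx] for s in total] for ctx, total in sums.items()}

def plug_in_decision(cost_estimate, vertices):
    """Minimise the estimated linear cost; a linear objective over a polytope
    attains its minimum at a vertex, so enumerating vertices suffices here."""
    return min(vertices, key=lambda z: sum(c * zi for c, zi in zip(cost_estimate, z)))

# Hypothetical data: (context, observed cost vector) pairs.
samples = [("rain", [4.0, 1.0]), ("rain", [6.0, 3.0]),
           ("sun", [1.0, 5.0]), ("sun", [3.0, 5.0])]
means = fit_conditional_means(samples)
vertices = [(1, 0), (0, 1)]  # choose exactly one of two options

print(plug_in_decision(means["rain"], vertices))  # (0, 1)
print(plug_in_decision(means["sun"], vertices))   # (1, 0)
```

The prediction step never looks at the downstream decision problem, which is what distinguishes the plug-in approach from methods that train the predictor on decision performance directly.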
Reinforcement Learning with Augmented Data
Laskin, Michael, Lee, Kimin, Stooke, Adam, Pinto, Lerrel, Abbeel, Pieter, Srinivas, Aravind
Learning from visual observations is a fundamental yet challenging problem in Reinforcement Learning (RL). Although algorithmic advances combined with convolutional neural networks have proved to be a recipe for success, current methods are still lacking on two fronts: (a) data-efficiency of learning and (b) generalization to new environments. To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms. We perform the first extensive study of general data augmentations for RL on both pixel-based and state-based inputs, and introduce two new data augmentations - random translate and random amplitude scale. We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks. RAD sets a new state-of-the-art in terms of data-efficiency and final performance on the DeepMind Control Suite benchmark for pixel-based control as well as OpenAI Gym benchmark for state-based control. We further demonstrate that RAD significantly improves test-time generalization over existing methods on several OpenAI ProcGen benchmarks.
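Three of the augmentations named in the abstract above (random translate, random crop, and random amplitude scale) can be sketched on plain Python lists as follows. This is a simplified illustration under stated assumptions, not the RAD implementation: the function names, the zero-filled square frame, and the default scaling range are hypothetical choices.

```python
import random

def random_translate(image, out_size, rng=random):
    """Place an H x W image (list of rows) at a random offset in a zero frame."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, out_size - h)
    left = rng.randint(0, out_size - w)
    frame = [[0] * out_size for _ in range(out_size)]
    for i, row in enumerate(image):
        frame[top + i][left:left + w] = row
    return frame

def random_crop(image, out_size, rng=random):
    """Cut a random out_size x out_size window from a larger image."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - out_size)
    left = rng.randint(0, w - out_size)
    return [row[left:left + out_size] for row in image[top:top + out_size]]

def random_amplitude_scale(state, low=0.6, high=1.2, rng=random):
    """Multiply a state vector by one random scalar drawn uniformly from [low, high].

    The default range is an illustrative assumption, not a value from the abstract.
    """
    z = rng.uniform(low, high)
    return [z * s for s in state]

rng = random.Random(0)
image = [[1, 2], [3, 4]]
translated = random_translate(image, out_size=5, rng=rng)
cropped = random_crop(translated, out_size=3, rng=rng)
scaled = random_amplitude_scale([1.0, -2.0], rng=rng)
print(len(translated), len(cropped), scaled)
```

In a plug-and-play setting, such functions would be applied to each batch of observations before it reaches the RL algorithm, leaving the algorithm itself unchanged.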
Learning Interpretable Feature Context Effects in Discrete Choice
Tomlinson, Kiran, Benson, Austin R.
The outcomes of elections, product sales, and the structure of social connections are all determined by the choices individuals make when presented with a set of options, so understanding the factors that contribute to choice is crucial. Of particular interest are context effects, which occur when the set of available options influences a chooser's relative preferences, as they violate traditional rationality assumptions yet are widespread in practice. However, identifying these effects from observed choices is challenging, often requiring foreknowledge of the effect to be measured. In contrast, we provide a method for the automatic discovery of a broad class of context effects from observed choice data. Our models are easier to train and more flexible than existing models and also yield intuitive, interpretable, and statistically testable context effects. Using our models, we identify new context effects in widely used choice datasets and provide the first analysis of choice set context effects in social network growth.
Anomaly detection in average fuel consumption with XAI techniques for dynamic generation of explanations
In this paper we present a complete process for unsupervised anomaly detection on the average fuel consumption of fleet vehicles that is able to explain which variables are affecting the consumption in terms of feature relevance. To do so, we combine the anomaly detection with a surrogate model that is able to provide that feature relevance. For this part, we evaluate both whitebox models from the literature, as well as novel variations of them, and blackbox models combined with local posthoc feature relevance techniques. The evaluation is done using real IoT data belonging to Telefónica, and is measured both in terms of model performance and using Explainable AI metrics that compare the generated explanations in terms of representativeness, fidelity, stability and contrastiveness. The explanations generate counterfactual recommendations that show what could have been done to reduce the average fuel consumption of a vehicle and turn it into an inlier. The procedure is combined with domain knowledge expressed in business rules, and is able to adapt the type of explanations to the target user profile.
Will Edge AI be the ML architecture of the future?
Edge AI offers several improvements over conventional ML architectures. First of all, the latency involved with any network transfer is removed, which can be critical in some use cases. The battery drain involved with streaming data is no longer an issue, allowing for better battery life, and the associated costs of data communication are significantly reduced. This is highly beneficial for a number of use cases. Sensors in remote locations, such as offshore wind farms, can come pre-loaded with the algorithms that enable them to make decisions without the complex infrastructure needed to connect them to the internet.
How AI can help boost alternative and renewable energy use
Ten years ago, I was engaged in the writing of an energy power grid report that was part of a national initiative to assess the health of our electrical energy grid and its resilience. Assets like wind farms and contemporary fossil and nuclear fuel systems were in place for energy distribution, but to my surprise there was also equipment in the grid that dated back to the 1890s and was still in production. I began to understand the challenges of using renewable energy such as wind and solar when it came to assessing energy supply and demand and ensuring there is enough on-hand energy to power the homes and businesses that are relying on it. When utilities were using gas, coal, or nuclear energy to power the grid, the in-flow of that fuel from its source was consistent, so it was easy to assess supply and demand on any given day and to deliver the energy needed to power homes and businesses. What if the wind gusted to 40 mph one day, and was perfectly still on the next day?