 Energy


Principled Training of Neural Networks with Direct Feedback Alignment

arXiv.org Machine Learning

The backpropagation algorithm has long been the canonical training method for neural networks. Modern paradigms are implicitly optimized for it, and numerous guidelines exist to ensure its proper use. Recently, synthetic gradient methods, in which the error gradient is only roughly approximated, have garnered interest. These methods not only better portray how biological brains learn, but also open new computational possibilities, such as updating layers asynchronously. Even so, they have failed to scale past simple tasks like MNIST or CIFAR-10. This is in part due to a lack of standards, which leads to ill-suited models and practices that prevent such methods from performing to the best of their abilities. In this work, we focus on direct feedback alignment and present a set of best practices justified by observations of the alignment angles. We characterize a bottleneck effect that prevents alignment in narrow layers, and hypothesize it may explain why feedback alignment methods have yet to scale to large convolutional networks.
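As a rough illustration of what an alignment angle measures, the sketch below computes the angle between two flattened vectors. The abstract does not define the quantity, so this assumes it is the angle between the update direction produced by direct feedback alignment and the gradient produced by backpropagation; the function name `alignment_angle_degrees` is hypothetical.

```python
import math

def alignment_angle_degrees(update, gradient):
    """Angle in degrees between two equal-length vectors.

    Assumption: `update` is a flattened feedback-alignment update and
    `gradient` is the flattened backpropagation gradient for the same layer.
    """
    dot = sum(u * g for u, g in zip(update, gradient))
    norm_u = math.sqrt(sum(u * u for u in update))
    norm_g = math.sqrt(sum(g * g for g in gradient))
    if norm_u == 0.0 or norm_g == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_g)))
    return math.degrees(math.acos(cosine))
```

Under this reading, an angle below 90 degrees means the approximate update has a positive inner product with the true gradient, so to first order it still moves the loss in the same direction as backpropagation would.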


Medium-Term Load Forecasting Using Support Vector Regression, Feature Selection, and Symbiotic Organism Search Optimization

arXiv.org Machine Learning

Accurate load forecasting has always been indispensable to the operation and planning of power systems. Among the different forecasting horizons, short-term load forecasting (STLF) and long-term load forecasting (LTLF) have benefited from accurate predictors and probabilistic forecasting, respectively, while medium-term load forecasting (MTLF) demands more attention due to its vital role in power system operation and planning, such as optimal scheduling of generation units, robust planning programs for customer service, and economic supply. In this study, a hybrid method composed of Support Vector Regression (SVR) and Symbiotic Organism Search Optimization (SOSO) is proposed for MTLF. In the proposed forecasting model, SVR is the main part of the forecasting algorithm, while SOSO is embedded into it to optimize the parameters of SVR. In addition, a minimum redundancy-maximum relevance feature selection algorithm is used in the preprocessing of input data. The proposed method is tested on the EUNITE competition dataset to demonstrate its performance. Furthermore, it is compared with some previous works to show the suitability of our method.


PABO: Pseudo Agent-Based Multi-Objective Bayesian Hyperparameter Optimization for Efficient Neural Accelerator Design

arXiv.org Machine Learning

The ever-increasing computational cost of Deep Neural Networks (DNN) and the demand for energy-efficient hardware for DNN acceleration have made accuracy and hardware cost co-optimization for DNNs tremendously important, especially for edge devices. Owing to the large parameter space and the cost of evaluating each parameter in the search space, manual tuning of DNN hyperparameters is impractical. Automatic joint DNN and hardware hyperparameter optimization is indispensable for such problems. Bayesian optimization-based approaches have shown promising results for hyperparameter optimization of DNNs. However, most of these techniques have been developed without considering the underlying hardware, thereby leading to inefficient designs. Further, the few works that perform joint optimization are not generalizable and mainly focus on CMOS-based architectures. In this work, we present a novel pseudo agent-based multi-objective hyperparameter optimization (PABO) for maximizing the DNN performance while obtaining low hardware cost. Compared to the existing methods, our work poses a theoretically different approach for joint optimization of accuracy and hardware cost and focuses on memristive crossbar-based accelerators. PABO uses a supervisor agent to establish connections between the posterior Gaussian distribution models of network accuracy and hardware cost requirements. The agent reduces the mathematical complexity of the co-optimization problem by removing unnecessary computations and updates of acquisition functions, thereby achieving significant speed-ups for the optimization procedure. PABO outputs a Pareto frontier that underscores the trade-offs between high accuracy and hardware efficiency. Our results demonstrate superior performance compared to the state-of-the-art methods in terms of both accuracy and computational speed (~100x speed-up).


Calibration, Entropy Rates, and Memory in Language Models

arXiv.org Machine Learning

Recent advances in language modeling have resulted in significant breakthroughs on a wide variety of benchmarks in natural language processing (Dai et al. [2018], Gong et al. [2018], Takase et al. [2018]). Capturing long-term dependencies has especially been a major focus, with approaches ranging from explicit memory-based neural networks (Grave et al. [2016], Ke et al. [2018]) to optimization improvements aimed at stabilizing training (Le et al. [2015], Trinh et al. [2018]). In this paper, we address a basic question: how do the long-term dependencies in a language model's generations compare to those of the underlying language? Furthermore, if there are measurable discrepancies, this leads to the question of whether and how we can use them to improve these models. Starting from Shannon's seminal work that essentially introduced statistical language modeling (Shannon [1951]), the most classical and widely studied long-term property of a language model is its entropy rate: the average amount of information contained per word, conditioned on the preceding words. A learned model provides an upper bound for the entropy rate of a language, via its cross-entropy loss. The exponential of the entropy rate can be interpreted as the effective support size of the distribution of the next word (intuitively, the average number of "plausible" word choices to continue a document), and the perplexity score of a model (the exponential of the cross-entropy loss) is an upper bound for this quantity.
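The relation between cross-entropy and perplexity can be made concrete with a minimal Python sketch. The function name `perplexity` and the toy probabilities are illustrative assumptions, not taken from the paper; the input is assumed to be the probability the model assigned to each word that actually occurred.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(cross-entropy), using natural logarithms.

    `token_probs` holds the model's probability for each observed word,
    conditioned on the words before it (an assumed input format).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Cross-entropy: average negative log-probability per word.
    cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(cross_entropy)

# A model that gives probability 1/4 to every observed word behaves as if it
# were choosing uniformly among four plausible continuations.
uniform_quarter = perplexity([0.25, 0.25, 0.25, 0.25])
```

In this toy case the perplexity is 4 (up to floating-point rounding), matching the "effective support size" reading: four equally plausible next words.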


Macro-action Multi-timescale Dynamic Programming for Energy Management with Phase Change Materials

arXiv.org Machine Learning

This paper focuses on home energy management systems (HEMS) in buildings that have controllable HVAC systems and use phase change material (PCM) as an energy storage system. In this setting, optimally operating an HVAC system is a challenge because of the nonlinear and non-convex characteristics of the PCM, which make the corresponding optimization problem impractical to solve with methods commonly used in HEMS. Instead, we use dynamic programming (DP) to deal with the nonlinear features of PCM. However, DP suffers from the curse of dimensionality. Given this drawback, this paper proposes a novel methodology to reduce the computational burden of the DP algorithm in HEMS optimisation with PCM, while maintaining the quality of the solution. Specifically, the method incorporates approaches from sequential decision making in artificial intelligence, including macro-action and multi-time scale abstractions, coupled with an underlying state-space approximation to reduce state-space and action-space size. The method is demonstrated on an energy management problem for a typical residential building located in Sydney for four seasonal weather conditions. Our results demonstrate that the proposed method performs well with an attractive computational cost. In particular, it achieves a speed-up of up to 12900 times over directly applying DP to the problem.


Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks

arXiv.org Artificial Intelligence

Tight estimation of the Lipschitz constant for deep neural networks (DNNs) is useful in many applications ranging from robustness certification of classifiers to stability analysis of closed-loop systems with reinforcement learning controllers. Existing methods in the literature for estimating the Lipschitz constant suffer from either lack of accuracy or poor scalability. In this paper, we present a convex optimization framework to compute guaranteed upper bounds on the Lipschitz constant of DNNs both accurately and efficiently. Our main idea is to interpret activation functions as gradients of convex potential functions. Hence, they satisfy certain properties that can be described by quadratic constraints. This particular description allows us to pose the Lipschitz constant estimation problem as a semidefinite program (SDP). The resulting SDP can be adapted to increase either the estimation accuracy (by capturing the interaction between activation functions of different layers) or scalability (by decomposition and parallel implementation). We illustrate the utility of our approach with a variety of experiments on randomly generated networks and on classifiers trained on the MNIST and Iris datasets. In particular, we experimentally demonstrate that our Lipschitz bounds are the most accurate compared to those in the literature. We also study the impact of adversarial training methods on the Lipschitz bounds of the resulting classifiers and show that our bounds can be used to efficiently provide robustness guarantees.


Tackling Climate Change with Machine Learning

arXiv.org Artificial Intelligence

Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.


Bayesian Automatic Relevance Determination for Utility Function Specification in Discrete Choice Models

arXiv.org Machine Learning

Specifying utility functions is a key step towards applying the discrete choice framework for understanding the behaviour processes that govern user choices. However, identifying the utility function specifications that best model and explain the observed choices can be a very challenging and time-consuming task. This paper seeks to help modellers by leveraging the Bayesian framework and the concept of automatic relevance determination (ARD) in order to automatically determine an optimal utility function specification from an exponentially large set of possible specifications in a purely data-driven manner. Based on recent advances in approximate Bayesian inference, a doubly stochastic variational inference is developed, which allows the proposed DCM-ARD model to scale to very large and high-dimensional datasets. Using semi-artificial choice data, the proposed approach is shown to very accurately recover the true utility function specifications that govern the observed choices. Moreover, when applied to real choice data, DCM-ARD is shown to be able to discover high-quality specifications that can outperform previous ones from the literature according to multiple criteria, thereby demonstrating its practical applicability.


Meta-Learning Neural Bloom Filters

arXiv.org Machine Learning

There has been a recent trend in training neural networks to replace data structures that have been crafted by hand, with the aim of faster execution, better accuracy, or greater compression. In this setting, a neural data structure is instantiated by training a network over many epochs of its inputs until convergence. In applications where inputs arrive at high throughput, or are ephemeral, training a network from scratch is not practical. This motivates the need for few-shot neural data structures. In this paper we explore the learning of approximate set membership over a set of data in one shot via meta-learning. We propose a novel memory architecture, the Neural Bloom Filter, which is able to achieve significant compression gains over classical Bloom Filters and existing memory-augmented neural networks.
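For context, the classical baseline the abstract compares against can be sketched in a few lines of Python. This is a textbook-style Bloom filter, not the paper's Neural Bloom Filter; the class name, the default sizes, and the choice of seeded SHA-256 as the hash family are all illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Approximate set membership: false positives possible, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive several hash positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Reports membership only if every position for the item is set.
        return all(self.bits[pos] for pos in self._positions(item))
```

A short usage example: after `bf = BloomFilter()` and `bf.add("alice")`, the test `"alice" in bf` is always true, while a query for an item that was never added is usually false but may occasionally be true. The compression question is how few bits are needed for a given false-positive rate.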


Landslide Geohazard Assessment With Convolutional Neural Networks Using Sentinel-2 Imagery Data

arXiv.org Machine Learning

In this paper, the authors aim to combine the latest state-of-the-art models in image recognition with the best publicly available satellite images to create a system for landslide risk mitigation. We focus first on landslide detection and further propose a similar system to be used for prediction. Such models are valuable as they could easily be scaled up to provide data for hazard evaluation, as satellite imagery becomes increasingly available. The goal is to use satellite images and correlated data to enrich the public repository of data and guide disaster relief efforts by locating the precise areas where landslides have occurred. Different image augmentation methods are used to increase diversity in the chosen dataset and create more robust classification. The resulting outputs are then fed into variants of 3-D convolutional neural networks. A review of the current literature indicates there is no research using CNNs (Convolutional Neural Networks) and freely available satellite imagery for classifying landslide risk. The model is ultimately shown to achieve significantly better-than-baseline accuracy.