 Uncertainty


Random Feature Expansions for Deep Gaussian Processes

arXiv.org Machine Learning

The composition of multiple Gaussian Processes as a Deep Gaussian Process (DGP) enables a deep probabilistic nonparametric approach to flexibly tackle complex machine learning problems with sound quantification of uncertainty. Existing inference approaches for DGP models have limited scalability and are notoriously cumbersome to construct. In this work, we introduce a novel formulation of DGPs based on random feature expansions that we train using stochastic variational inference. This yields a practical learning framework which significantly advances the state-of-the-art in inference for DGPs, and enables accurate quantification of uncertainty. We extensively showcase the scalability and performance of our proposal on several datasets with up to 8 million observations, and various DGP architectures with up to 30 hidden layers.
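
The random feature idea behind this formulation can be illustrated for a single layer. The following is a minimal sketch, assuming an RBF kernel and the standard cosine/sine random Fourier feature construction; the function names, the lengthscale, and the feature count are illustrative choices, not taken from the paper, and no DGP stacking or variational training is shown.

```python
import math
import random


def sample_frequencies(num_features, dim, lengthscale, rng):
    """Draw spectral frequencies for an RBF kernel: each entry ~ N(0, 1/lengthscale^2)."""
    return [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
            for _ in range(num_features)]


def random_features(x, freqs):
    """Map x to [cos(w.x), sin(w.x)] features, scaled so that phi(x).phi(x) == 1."""
    scale = 1.0 / math.sqrt(len(freqs))
    proj = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in freqs]
    return [scale * math.cos(p) for p in proj] + [scale * math.sin(p) for p in proj]


def rbf_kernel(x, y, lengthscale):
    """Exact RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
freqs = sample_frequencies(num_features=2000, dim=2, lengthscale=1.0, rng=rng)
x, y = [0.3, -0.5], [1.0, 0.2]

# The inner product of the feature maps is a Monte Carlo estimate of the kernel.
approx = dot(random_features(x, freqs), random_features(y, freqs))
exact = rbf_kernel(x, y, lengthscale=1.0)
```

With features of this kind, a GP layer behaves like a linear model on the random features, which is what makes stacking layers and training with stochastic gradients practical.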


Balancing New Against Old Information: The Role of Surprise in Learning

arXiv.org Machine Learning

To guide their behavior, humans and animals rely on previously learned knowledge about the world. Since the world is complex and models of the world are never perfect, the question arises whether we should trust our internal world model that we have built from past data or whether we should readjust it when we receive a new data sample. In noisy environments, a single data sample may not be reliable and in general we need to average over several data samples. However, when a structural change occurs in the environment, the most recent data samples are the most informative ones and we should put more weight on recent data samples than on earlier ones. Indeed, both humans and animals adaptively adjust the relative contribution of old and newly acquired data during learning (Behrens et al., 2007; Nassar et al., 2012; Krugel et al., 2009; Pearce and Hall, 1980) and rapidly adapt to changing environments (Pearce and Hall, 1980; Wilson et al., 1992; Holland, 1997).


Big Learning with Bayesian Methods

arXiv.org Machine Learning

Explosive growth in data and availability of cheap computing resources have sparked increasing interest in Big learning, an emerging subfield that studies scalable machine learning algorithms, systems, and applications with Big Data. Bayesian methods represent one important class of statistical methods for machine learning, with substantial recent developments on adaptive, flexible and scalable Bayesian learning. This article provides a survey of the recent advances in Big learning with Bayesian methods, termed Big Bayesian Learning, including nonparametric Bayesian methods for adaptively inferring model complexity, regularized Bayesian inference for improving the flexibility via posterior regularization, and scalable algorithms and systems based on stochastic subsampling and distributed computing for dealing with large-scale applications.


A Hierarchical Genetic Optimization of a Fuzzy Logic System for Flow Control in Micro Grids

arXiv.org Artificial Intelligence

Bio-inspired algorithms such as Genetic Algorithms and Fuzzy Inference Systems (FIS) are now widely adopted as hybrid techniques in commercial and industrial environments. In this paper we present an interesting application of the fuzzy-GA paradigm to Smart Grids. The main aim is to perform decision making for power flow management tasks in the proposed microgrid model, which is equipped with renewable sources and an energy storage system, taking into account the economic profit from energy trading with the main grid. In particular, this study focuses on the application of a Hierarchical Genetic Algorithm (HGA) for tuning the Rule Base (RB) of a Fuzzy Inference System (FIS), with the goal of discovering a minimal fuzzy rule set in a Fuzzy Logic Controller (FLC) adopted to perform decision making in the microgrid. The HGA rationale centers on a particular encoding scheme, based on control genes and parametric genes applied to the optimization of the FIS parameters, which allows the structural complexity of the RB to be reduced. This approach is referred to in the following as fuzzy-HGA. Results are compared with a simpler approach based on a classic fuzzy-GA scheme, where both FIS parameters and rule weights are tuned, while the number of fuzzy rules is fixed in advance. Experiments show that the fuzzy-HGA approach adopted for the synthesis of the proposed controller outperforms the classic fuzzy-GA scheme, increasing the accounting profit by 67% in the considered energy trading problem while yielding a simpler RB.


Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations

arXiv.org Machine Learning

We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization, and an efficient alternating gradient descent algorithm with hard thresholding to solve it. Our algorithm is orders of magnitude faster than the convex relaxation based methods for LVGGM. In addition, we prove that our algorithm is guaranteed to linearly converge to the unknown sparse and low-rank components up to the optimal statistical precision. Experiments on both synthetic and genomic data demonstrate the superiority of our algorithm over the state-of-the-art algorithms and corroborate our theory.
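
The hard thresholding step mentioned above can be sketched in isolation. This is a minimal illustration, assuming the common definition of hard thresholding (keep the `s` largest-magnitude entries and zero the rest); the function name and the example matrix are hypothetical, and the paper's gradient updates, step sizes, and low-rank factor handling are not reproduced here.

```python
def hard_threshold(matrix, s):
    """Keep the s entries of largest absolute value; set all others to 0.0."""
    ranked = sorted(
        ((abs(v), i, j) for i, row in enumerate(matrix) for j, v in enumerate(row)),
        reverse=True,
    )
    keep = {(i, j) for _, i, j in ranked[:s]}
    return [
        [v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


# Example: after a gradient step on the sparse component, project back
# onto the set of matrices with at most s nonzero entries.
sparse_estimate = hard_threshold([[3.0, -0.1], [0.5, -4.0]], s=2)
```

In an alternating scheme, a projection like this would follow each gradient update of the sparse component, while the low-rank factor is updated by its own gradient step.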


Deep Nonparametric Estimation of Discrete Conditional Distributions via Smoothed Dyadic Partitioning

arXiv.org Machine Learning

We present an approach to deep estimation of discrete conditional probability distributions. Such models have several applications, including generative modeling of audio, image, and video data. Our approach combines two main techniques: dyadic partitioning and graph-based smoothing of the discrete space. By recursively decomposing each dimension into a series of binary splits and smoothing over the resulting distribution using graph-based trend filtering, we impose a strict structure on the model and achieve much higher sample efficiency. We demonstrate the advantages of our model through a series of benchmarks on both synthetic and real-world datasets, in some cases reducing the error by nearly half in comparison to other popular methods in the literature. All of our models are implemented in TensorFlow and publicly available at https://github.com/tansey/sdp.
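
The binary-split decomposition can be made concrete for one dimension. The sketch below assumes a balanced binary tree over `2**depth` bins whose internal nodes each hold a probability of branching left; the heap-style node ordering, the function name, and the example numbers are illustrative assumptions, and the graph-based smoothing and the neural network that would output these probabilities are omitted.

```python
def bin_probability(left_probs, bin_index, depth):
    """Probability of one bin as the product of branch probabilities on its path.

    left_probs holds P(go left) for each internal node in heap order
    (root at 0, children of node k at 2k+1 and 2k+2).
    """
    node = 0
    prob = 1.0
    for level in range(depth):
        # Read the bin index from its most significant bit: 0 = left, 1 = right.
        go_right = (bin_index >> (depth - 1 - level)) & 1
        p_left = left_probs[node]
        prob *= (1.0 - p_left) if go_right else p_left
        node = 2 * node + 1 + go_right
    return prob


# Four bins (depth 2) need three internal nodes.
left_probs = [0.5, 0.2, 0.9]
distribution = [bin_probability(left_probs, b, depth=2) for b in range(4)]
```

Because every bin is reached through `depth` binary decisions, a model only needs to predict one split probability per internal node, and the resulting bin probabilities always sum to one.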


Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach

arXiv.org Machine Learning

Tree ensembles, such as random forests and boosted trees, are renowned for their high prediction performance. However, their interpretability is critically limited due to their enormous complexity. In this study, we present a method to make a complex tree ensemble interpretable by simplifying the model. Specifically, we formalize the simplification of tree ensembles as a model selection problem. Given a complex tree ensemble, we aim at obtaining the simplest representation that is essentially equivalent to the original one. To this end, we derive a Bayesian model selection algorithm that optimizes the simplified model while maintaining the prediction performance. Our numerical experiments on several datasets showed that complicated tree ensembles were reasonably approximated as interpretable.


Dynamic Repositioning to Reduce Lost Demand in Bike Sharing Systems

Journal of Artificial Intelligence Research

Bike Sharing Systems (BSSs) are widely adopted in major cities of the world due to concerns associated with extensive private vehicle usage, namely, increased carbon emissions, traffic congestion and usage of nonrenewable resources. In a BSS, base stations are strategically placed throughout a city and each station is stocked with a pre-determined number of bikes at the beginning of the day. Customers hire the bikes from one station and return them at another station. Due to unpredictable movements of customers hiring bikes, there is either congestion (more than required) or starvation (fewer than required) of bikes at base stations. Existing data has shown that congestion/starvation is a common phenomenon that leads to a large number of unsatisfied customers resulting in a significant loss in customer demand. In order to tackle this problem, we propose an optimisation formulation to reposition bikes using vehicles while also considering the routes for vehicles and future expected demand. Furthermore, we contribute two approaches that rely on decomposability in the problem (bike repositioning and vehicle routing) and aggregation of base stations to reduce the computation time significantly. Finally, we demonstrate the utility of our approach by comparing against two benchmark approaches on two real-world data sets of bike sharing systems. These approaches are evaluated using a simulation where the movements of customers are generated from real-world data sets.


Scalable and Distributed Clustering via Lightweight Coresets

arXiv.org Machine Learning

Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of coresets called lightweight coresets that allows for both multiplicative and additive errors. We provide a single algorithm to construct lightweight coresets for k-Means clustering, Bregman clustering and maximum likelihood estimation of Gaussian mixture models. The algorithm is substantially faster than existing constructions and embarrassingly parallel, and the resulting coresets are smaller. In an extensive experimental evaluation, we demonstrate that the proposed method outperforms existing coreset constructions.
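
A coreset construction of this kind can be sketched as importance sampling. The abstract does not spell out the sampling scheme, so the following is a hedged illustration that assumes a mixture of uniform sampling and sampling proportional to squared distance from the data mean, with inverse-probability weights; the function names and the toy data are hypothetical.

```python
import random


def sampling_distribution(points):
    """Assumed scheme: q(x) = 1/2 * 1/n + 1/2 * d(x, mean)^2 / sum of squared distances."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    sq_dists = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dists)
    if total == 0.0:
        # All points coincide, so fall back to uniform sampling.
        return [1.0 / n] * n
    return [0.5 / n + 0.5 * d / total for d in sq_dists]


def lightweight_coreset(points, m, rng):
    """Sample m points with probabilities q and attach weights 1 / (m * q)."""
    q = sampling_distribution(points)
    indices = rng.choices(range(len(points)), weights=q, k=m)
    return [(points[i], 1.0 / (m * q[i])) for i in indices]


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
q = sampling_distribution(points)
coreset = lightweight_coreset(points, m=5, rng=random.Random(0))
```

The sampling distribution needs only the mean and one pass of squared distances, so it can be computed in two sweeps over the data and split across machines; the weighted sample is then passed to any weighted clustering routine.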


Learning in Implicit Generative Models

arXiv.org Machine Learning

Generative adversarial networks (GANs) provide an algorithmic framework for constructing generative models with several appealing properties: they do not require a likelihood function to be specified, only a generating procedure; they provide samples that are sharp and compelling; and they allow us to harness our knowledge of building highly accurate neural network classifiers. Here, we develop our understanding of GANs with the aim of forming a rich view of this growing area of machine learning: to build connections to the diverse set of statistical thinking on this topic, of which much can be gained by a mutual exchange of ideas. We frame GANs within the wider landscape of algorithms for learning in implicit generative models (models that only specify a stochastic procedure with which to generate data) and relate these ideas to modelling problems in related fields, such as econometrics and approximate Bayesian computation. We develop likelihood-free inference methods and highlight hypothesis testing as a principle for learning in implicit generative models, using which we are able to derive the objective function used by GANs, and many other related objectives. The testing viewpoint directs our focus to the general problem of density ratio estimation. There are four approaches for density ratio estimation, one of which is a solution using classifiers to distinguish real from generated data. Other approaches such as divergence minimisation and moment matching have also been explored in the GAN literature, and we synthesise these views to form an understanding in terms of the relationships between them and the wider literature, highlighting avenues for future exploration and cross-pollination.
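
The classifier-based route to density ratio estimation follows from Bayes' rule. As a minimal sketch, assume a probabilistic classifier whose output `d_real` is the probability that a sample is real rather than generated, and let `prior_real` be the fraction of real samples in its training data; both names are illustrative and not taken from the paper.

```python
def density_ratio(d_real, prior_real=0.5):
    """Ratio p_real(x) / p_generated(x) implied by a classifier output.

    Bayes' rule gives d / (1 - d) = (prior * p_real) / ((1 - prior) * p_generated),
    so the density ratio is the classifier odds corrected by the class prior odds.
    """
    if not 0.0 < d_real < 1.0:
        raise ValueError("d_real must lie strictly between 0 and 1")
    return (d_real / (1.0 - d_real)) * ((1.0 - prior_real) / prior_real)


# With balanced classes, an undecided classifier implies equal densities,
# and an output of 0.8 implies real data is four times as likely at x.
ratio_undecided = density_ratio(0.5)
ratio_confident = density_ratio(0.8)
```

This is why a discriminator trained to separate real from generated samples can stand in for an intractable likelihood: its odds carry the density ratio that the learning objective needs.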