Uncertainty
Deep Bayesian Neural Networks. – Stefano Cosentino – Medium
Conventional neural networks aren't well designed to model the uncertainty associated with the predictions they make. For that, one way is to go full Bayesian. What are we trying to do? Any deep network has parameters, often in the form of weights (w_1, w_2, …) and biases (b_1,b_2, …). The conventional (non-Bayesian) way is to learn only the optimal values via maximum likelihood estimation.
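The contrast between a point estimate and a full posterior can be illustrated on the smallest possible "network": a single weight w in y = w * x + noise. The sketch below is not taken from the article; it assumes Gaussian noise with known variance and a zero-mean Gaussian prior on w (both assumptions made here for illustration), in which case the posterior is available in closed form.

```python
def mle_weight(xs, ys):
    """Point estimate of w in y = w * x + noise (least squares, i.e. Gaussian MLE)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def posterior_weight(xs, ys, prior_var=1.0, noise_var=1.0):
    """Mean and variance of the Gaussian posterior over w.

    Assumes a prior w ~ N(0, prior_var) and noise ~ N(0, noise_var);
    both values are illustrative defaults, not taken from the article.
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision


# Toy data generated by y = 2 * x without noise.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_mle = mle_weight(xs, ys)               # a single number, no uncertainty
w_mean, w_var = posterior_weight(xs, ys)  # a distribution: mean and variance
```

The maximum likelihood estimate returns one value, while the Bayesian treatment returns a mean that is shrunk towards the prior together with a variance that narrows as more data arrive. A deep Bayesian network applies the same idea to all weights and biases, where the posterior usually has to be approximated.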
Learning unknown ODE models with Gaussian processes
Heinonen, Markus, Yildiz, Cagatay, Mannerström, Henrik, Intosalmi, Jukka, Lähdesmäki, Harri
In conventional ODE modelling, the coefficients of an equation driving the system state forward in time are estimated. However, for many complex systems it is practically impossible to determine the equations or interactions governing the underlying dynamics. In these settings, a parametric ODE model cannot be formulated. Here, we overcome this issue by introducing a novel paradigm of nonparametric ODE modelling that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We propose to learn non-linear, unknown differential functions from state observations using Gaussian process vector fields within the exact ODE formalism. We demonstrate the model's capabilities to infer dynamics from sparse data and to simulate the system forward into the future.
Variational Inference for Gaussian Process with Panel Count Data
Ding, Hongyi, Lee, Young, Sato, Issei, Sugiyama, Masashi
We present the first framework for Gaussian-process-modulated Poisson processes when the temporal data appear in the form of panel counts. Panel count data frequently arise when experimental subjects are observed only at discrete time points and only the numbers of occurrences of the events between subsequent observation times are available. The exact occurrence timestamps of the events are unknown. We present an efficient variational inference method based on the assumption of a Gaussian-process-modulated intensity function. We derive a tractable lower bound to alleviate the problems of the intractable evidence lower bound inherent in the variational inference framework. Our algorithm outperforms classical methods on synthetic data and on three real panel count data sets.
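To make the data format concrete: for a Poisson process, the counts in disjoint intervals are independent Poisson variables whose means are the integrals of the intensity over each interval. The sketch below evaluates that panel count log-likelihood for a fixed, known intensity function. It is only an illustration of the likelihood; the paper's Gaussian process prior and variational inference are not reproduced here, and the function names and the trapezoid integration are assumptions made for this example.

```python
import math


def integrate(intensity, a, b, steps=1000):
    """Trapezoid-rule integral of the intensity over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (intensity(a) + intensity(b))
    for i in range(1, steps):
        total += intensity(a + i * h)
    return total * h


def panel_count_log_likelihood(intensity, times, counts):
    """Log-likelihood of counts observed between consecutive observation times.

    times  : increasing observation times t_0 < t_1 < ... < t_K
    counts : number of events in each interval (t_{k-1}, t_k], length K
    Assumes the intensity is strictly positive on every interval.
    """
    log_lik = 0.0
    for a, b, n in zip(times, times[1:], counts):
        mean = integrate(intensity, a, b)
        # Poisson log-pmf: n * log(mean) - mean - log(n!)
        log_lik += n * math.log(mean) - mean - math.lgamma(n + 1)
    return log_lik


# Hypothetical subject observed at t = 0, 1, 3 with 2 and then 4 events.
ll = panel_count_log_likelihood(lambda t: 2.0, [0.0, 1.0, 3.0], [2, 4])
```

Only the counts enter the likelihood, never the event timestamps, which is exactly the restriction the abstract describes.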
Exact and approximate inference in graphical models: variable elimination and beyond
Peyrard, Nathalie, Cros, Marie-Josée, de Givry, Simon, Franc, Alain, Robin, Stéphane, Sabbadin, Régis, Schiex, Thomas, Vignes, Matthieu
Probabilistic graphical models offer a powerful framework to account for the dependence structure between variables, which is represented as a graph. However, the dependence between variables may render inference tasks intractable. In this paper we review techniques exploiting the graph structure for exact inference, borrowed from optimisation and computer science. They are built on the principle of variable elimination, whose complexity is dictated in an intricate way by the order in which variables are eliminated. The so-called treewidth of the graph characterises this algorithmic complexity: low-treewidth graphs can be processed efficiently. The first message that we illustrate is therefore that, for inference in graphical models, the number of variables is not the limiting factor, and it is worth checking the treewidth before turning to approximate methods. We show how algorithms providing an upper bound of the treewidth can be exploited to derive a 'good' elimination order that enables exact inference. The second message is that when the treewidth is too large, algorithms for approximate inference linked to the principle of variable elimination, such as loopy belief propagation and variational approaches, can lead to accurate results while being much less time-consuming than Monte Carlo approaches. We illustrate the techniques reviewed in this article on benchmarks of inference problems in genetic linkage analysis and computer vision, as well as on hidden variable restoration in coupled Hidden Markov Models.
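Variable elimination itself fits in a few lines. The following is a minimal sketch for binary variables, not the implementation reviewed in the paper: a factor is represented as a pair (variable names, table), and the chain A -> B -> C with its probability values is a made-up example. Because a chain has treewidth 1, eliminating A and then B never creates an intermediate factor over more than two variables.

```python
from itertools import product


def multiply(f, g):
    """Pointwise product of two factors over binary variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = tuple(dict.fromkeys(f_vars + g_vars))  # ordered union of scopes
    out_tab = {}
    for values in product((0, 1), repeat=len(out_vars)):
        assign = dict(zip(out_vars, values))
        f_key = tuple(assign[v] for v in f_vars)
        g_key = tuple(assign[v] for v in g_vars)
        out_tab[values] = f_tab[f_key] * g_tab[g_key]
    return out_vars, out_tab


def sum_out(f, var):
    """Marginalise one variable out of a factor."""
    f_vars, f_tab = f
    idx = f_vars.index(var)
    out_vars = f_vars[:idx] + f_vars[idx + 1:]
    out_tab = {}
    for key, value in f_tab.items():
        reduced = key[:idx] + key[idx + 1:]
        out_tab[reduced] = out_tab.get(reduced, 0.0) + value
    return out_vars, out_tab


def eliminate(factors, order):
    """Eliminate the variables in `order` and return the remaining factor."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not touching:
            continue
        joined = touching[0]
        for f in touching[1:]:
            joined = multiply(joined, f)
        factors.append(sum_out(joined, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


# Hypothetical chain A -> B -> C: P(A), P(B | A), P(C | B).
p_a = (("A",), {(0,): 0.6, (1,): 0.4})
p_b_given_a = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
p_c_given_b = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})

marginal_c = eliminate([p_a, p_b_given_a, p_c_given_b], order=["A", "B"])
```

The elimination order determines the size of the largest intermediate factor, which is where the treewidth enters: on densely connected graphs a poor order can make `multiply` build tables that are exponentially large in the number of variables involved.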
Artificial Intelligence First - Disruption Hub
Although materially beneficial corporate deployments of AI are beginning to proliferate, the AI activities of the majority still amount to a few isolated pilot projects conceived on an ad hoc basis. Organisations without a clear AI strategy – and that's most – run the risk of falling behind as other better organised industry players move forward. That said, while individual AI solutions can be transformative within the scope of their application, that's not as clear-cut an argument for front-to-back change as, say, the digital transformation of a high street retailer. Developing an AI strategy requires an exercise of careful discrimination – acknowledging the present limitations of AI as well as its strengths in order to identify where one can, cannot, or even should not exploit it. This article is about the 'what' of an AI strategy rather than the equally important 'how'.
Learning Binary Bayesian Networks in Polynomial Time and Sample Complexity
We consider the problem of structure learning for binary Bayesian networks. Our approach is to recover the true parents and children for each node first and then combine the results to recover the skeleton. We do not assume any specific probability distribution for the nodes. Rather, we show that if the probability distribution satisfies certain conditions then we can exactly recover the parents and children of a node by performing l1-regularized linear regression with a sufficient number of samples. The sample complexity of our proposed approach depends logarithmically on the number of nodes in the Bayesian network. Furthermore, our method runs in polynomial time.
Modern Approaches for Sales Predictive Analytics
Sales prediction is an important part of modern business intelligence. The first approaches one can apply to predict sales time series are conventional forecasting methods such as ARIMA and Holt-Winters. However, these methods face several challenges: multilevel daily/weekly/monthly/yearly seasonality, many exogenous factors that affect sales, and complex trends in different time periods. In such cases, it is not easy to apply conventional methods.
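For reference, the additive Holt-Winters recursion mentioned above can be sketched as follows. This is a minimal version written for illustration, not the article's code: the function name, the simple initialisation from the first two seasons, and the toy sales series are all assumptions.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast with a single seasonal period.

    alpha, beta, gamma are the smoothing weights for level, trend and
    seasonality. Initialisation (an assumption of this sketch): level is the
    mean of the first season, trend is the average per-step change between
    the first two seasons, seasonals are first-season deviations from level.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonals = [x - level for x in first]

    for t in range(season_len, len(series)):
        s = seasonals[t % season_len]
        prev_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % season_len] = gamma * (series[t] - level) + (1 - gamma) * s

    n = len(series)
    return [
        level + (h + 1) * trend + seasonals[(n + h) % season_len]
        for h in range(horizon)
    ]


# Hypothetical sales series with a period of 4 and no trend.
sales = [10.0, 20.0, 30.0, 20.0] * 3
forecast = holt_winters_additive(sales, season_len=4, alpha=0.5, beta=0.5,
                                 gamma=0.5, horizon=4)
```

The recursion carries one seasonal index array and no slot for external regressors, which shows concretely why several overlapping seasonalities and exogenous factors are awkward to handle with this method.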
Coordinating Measurements in Uncertain Participatory Sensing Settings
Zenonos, Alexandros, Stein, Sebastian, Jennings, Nicholas R.
Environmental monitoring allows authorities to understand the impact of potentially harmful phenomena, such as air pollution, excessive noise, and radiation. Recently, there has been considerable interest in participatory sensing as a paradigm for such large-scale data collection because it is cost-effective and able to capture more fine-grained data than traditional approaches that use stationary sensors scattered in cities. In this approach, ordinary citizens (non-expert contributors) collect environmental data using low-cost mobile devices. However, these participants are generally self-interested actors that have their own goals and make local decisions about when and where to take measurements. This can lead to highly inefficient outcomes, where observations are either taken redundantly or do not provide sufficient information about key areas of interest. To address these challenges, it is necessary to guide and to coordinate participants, so they take measurements when it is most informative. To this end, we develop a computationally-efficient coordination algorithm (adaptive Best-Match) that suggests to users when and where to take measurements. Our algorithm exploits probabilistic knowledge of human mobility patterns, but explicitly considers the uncertainty of these patterns and the potential unwillingness of people to take measurements when requested to do so. In particular, our algorithm uses a local search technique, clustering and random simulations to map participants to measurements that need to be taken in space and time. We empirically evaluate our algorithm on a real-world human mobility and air quality dataset and show that it outperforms the current state of the art by up to 24% in terms of utility gained.
Rough extreme learning machine: a new classification method based on uncertainty measure
Feng, Lin, Xu, Shuliang, Wang, Feilong, Liu, Shenglan
Extreme learning machine (ELM) is a new single-hidden-layer feedforward neural network. The weights of the input layer and the biases of the hidden-layer neurons are randomly generated, and the weights of the output layer can be determined analytically. ELM has achieved good results on a large number of classification tasks. In this paper, a new extreme learning machine called rough extreme learning machine (RELM) is proposed. RELM uses rough sets to divide data into an upper approximation set and a lower approximation set, and the two approximation sets are used to train upper approximation neurons and lower approximation neurons. In addition, an attribute reduction is executed in this algorithm to remove redundant attributes. The experimental results show that, compared with the baseline algorithms, RELM achieves better accuracy and repeatability in most cases. RELM not only maintains the advantage of fast speed, but also copes effectively with classification tasks on high-dimensional data.
Learning Large-Scale Bayesian Networks with the sparsebn Package
Aragam, Bryon, Gu, Jiaying, Zhou, Qing
The widespread growth of high-dimensional biological data in particular has spurred a renewed interest in the use of graphical models to aid in the discovery of novel biological mechanisms (Bühlmann, Kalisch, and Meier 2014). While the past decade has witnessed tremendous developments towards understanding undirected graphical models (Meinshausen and Bühlmann 2006; Ravikumar, Wainwright, and Lafferty 2010; Yang, Ravikumar, Allen, and Liu 2015), there has been less progress towards understanding directed graphical models--also known as Bayesian networks (BNs) or structural equation models (SEM)--for high-dimensional data with p ≫ n. A BN is represented by a directed acyclic graph (DAG), whose structure contains a richer and different set of conditional independence relations than an undirected graph. Moreover, DAGs are commonly used in causal inference where the direction of an edge encodes causality. Consequently, there have been continuing efforts in structure learning of directed graphs from data.