Bayesian Learning
Detecting and Diagnosing Incipient Building Faults Using Uncertainty Information from Deep Neural Networks
Jin, Baihong, Li, Dan, Srinivasan, Seshadhri, Ng, See-Kiong, Poolla, Kameshwar, Sangiovanni-Vincentelli, Alberto
Early detection of incipient faults is of vital importance to reducing maintenance costs, saving energy, and enhancing occupant comfort in buildings. Popular supervised learning models such as deep neural networks are considered promising due to their ability to learn directly from labeled fault data; however, the performance of supervised learning approaches is known to rely heavily on the availability and quality of labeled training data. In Fault Detection and Diagnosis (FDD) applications, the lack of labeled incipient fault data has posed a major challenge to applying these supervised learning techniques to commercial buildings. To overcome this challenge, this paper proposes using Monte Carlo dropout (MC-dropout) to enhance the supervised learning pipeline, so that the resulting neural network is able to detect and diagnose unseen incipient fault examples. We also examine the proposed MC-dropout method on the RP-1043 dataset to demonstrate its effectiveness in indicating the most likely incipient fault types.
I. INTRODUCTION
Building faults whose impact is less perceivable and/or hinder regular operations are called soft faults [21], [32]. These soft faults, especially in their incipient phase, are hard to detect because their signatures are generally not obvious (owing to their small magnitudes) and are masked by measurement/system noise or feedback control actions [10], [27]. Nevertheless, if left undetected and unattended, they adversely affect energy consumption, system performance, and maintenance costs in the long run [14].
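The MC-dropout idea described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the two-layer network, its random (untrained) weights, the dropout rate, and the function name `mc_dropout_predict` are all assumptions made for illustration. The sketch keeps dropout active at prediction time, averages the class probabilities over many stochastic forward passes, and reports the predictive entropy as an uncertainty score; a high-entropy input is a candidate for an unseen (for example, incipient) condition.

```python
import numpy as np

def mc_dropout_predict(x, W1, b1, W2, b2, n_samples=100, p_drop=0.5, seed=0):
    """Average softmax outputs over stochastic forward passes with dropout on.

    Hypothetical two-layer classifier; returns (mean_probs, entropy).
    """
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ W1 + b1)            # hidden ReLU layer
        keep = rng.random(h.shape) >= p_drop        # Bernoulli keep-mask
        h = h * keep / (1.0 - p_drop)               # inverted-dropout scaling
        logits = h @ W2 + b2
        logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(logits)
        samples.append(e / e.sum(axis=-1, keepdims=True))
    probs = np.stack(samples)                       # (n_samples, batch, classes)
    mean_probs = probs.mean(axis=0)                 # MC estimate of the predictive distribution
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return mean_probs, entropy

# Illustrative call with random, untrained weights (assumed shapes: 8 features, 3 classes).
rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)
x = rng.normal(size=(4, 8))
mean_probs, entropy = mc_dropout_predict(x, W1, b1, W2, b2)
```

In practice the weights would come from a network trained with dropout on labeled fault data; the random weights here only show the mechanics of the sampling loop.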
Nowcasting Recessions using the SVM Machine Learning Algorithm
James, Alexander, Abu-Mostafa, Yaser S., Qiao, Xiao
Recessions reflect great dislocation in the economy and are often the source of societal anxiety. During a recession, unemployment is usually higher, and output is lower. Accurately identifying turning points from expansions to recessions has broad use for policymakers, business executives, academics, and individuals. Additionally, investors with enough resources to use this information in their investment process may change their portfolios as the economy turns from growth to contraction. There have been several attempts in the literature to accurately predict the timing of recessions.
On resampling vs. adjusting probabilistic graphical models in estimation of distribution algorithms
Yafrani, Mohamed El, Martins, Marcella S. R., Delgado, Myriam R. B. S., Sung, Inkyung, Lüders, Ricardo, Wagner, Markus
The Bayesian Optimisation Algorithm (BOA) is an Estimation of Distribution Algorithm (EDA) that uses a Bayesian network as its probabilistic graphical model (PGM). Determining the optimal Bayesian network structure given a solution sample is an NP-hard problem. This step must be completed at each iteration of BOA, resulting in a very time-consuming process. For this reason, most implementations use greedy estimation algorithms such as K2. However, we show in this paper that significant changes in PGM structure do not occur that frequently, and can be particularly sparse at the end of evolution. A statistical study of BOA is thus presented to characterise a pattern of PGM adjustments that can be used as a guide to reduce the frequency of PGM updates during the evolutionary process. This is accomplished by proposing a new BOA-based optimisation approach (FBOA) whose PGM is not updated at each iteration. This new approach avoids the computational burden usually found in the standard BOA. The results compare the performance of both algorithms on an NK-landscape optimisation problem using the correlation between the ruggedness and the expected runtime over enumerated instances. The experiments show that FBOA presents competitive results while significantly saving computational time.
Translation Insensitivity for Deep Convolutional Gaussian Processes
Dutordoir, Vincent, van der Wilk, Mark, Artemev, Artem, Tomczak, Marcin, Hensman, James
Deep learning has been at the foundation of large improvements in image classification. To improve the robustness of predictions, Bayesian approximations have been used to learn parameters in deep neural networks. We follow an alternative approach, using Gaussian processes as building blocks for Bayesian deep learning models, which has recently become viable due to advances in inference for convolutional and deep structure. We investigate deep convolutional Gaussian processes, and identify a problem that holds back current performance. To remedy the issue, we introduce a translation insensitive convolutional kernel, which removes the restriction of requiring identical outputs for identical patch inputs. We show empirically that this convolutional kernel improves performance in both shallow and deep models. On MNIST, FASHION-MNIST and CIFAR-10 we improve on previous GP models in terms of accuracy, while also obtaining better-calibrated predictive probabilities than simple DNN models.
Asymptotically exact data augmentation: models, properties and algorithms
Vono, Maxime, Dobigeon, Nicolas, Chainais, Pierre
Data augmentation, by the introduction of auxiliary variables, has become a ubiquitous technique to improve mixing/convergence properties, simplify the implementation or reduce the computational time of inference methods such as Markov chain Monte Carlo. Nonetheless, introducing appropriate auxiliary variables while preserving the initial target probability distribution cannot be done in a systematic way and depends heavily on the problem considered. To deal with such issues, this paper presents a unified framework, namely asymptotically exact data augmentation (AXDA), which encompasses several well-established but also more recent approximate augmented models. Benefiting from a much more general perspective, it delivers some additional qualitative and quantitative insights concerning these schemes. In particular, general properties of AXDA are stated, along with non-asymptotic theoretical results on the approximation that is made. Close connections to existing Bayesian methods (e.g. mixture modeling, robust Bayesian models and approximate Bayesian computation) are also drawn. All the results are illustrated with examples and applied to standard statistical learning problems.
Efficient Deep Learning of GMMs
Jalali, Shirin, Nuzman, Carl, Saniee, Iraj
We show that a collection of Gaussian mixture models (GMMs) in $R^{n}$ can be optimally classified using $O(n)$ neurons in a neural network with two hidden layers (deep neural network), whereas a neural network with a single hidden layer (shallow neural network) would require at least $O(\exp(n))$ neurons or possibly exponentially large coefficients. Given the universality of the Gaussian distribution in the feature spaces of data, e.g., in speech, image and text, our result sheds light on the observed efficiency of deep neural networks in practical classification problems.
Readings in Medical Artificial Intelligence: The First Decade
A survey of early work exploring how AI can be used in medicine, with somewhat more technical expositions than in the complementary volume Artificial Intelligence in Medicine. Each chapter is preceded by a brief introduction that outlines our view of its contribution to the field, the reason it was selected for inclusion in this volume, an overview of its content, and a discussion of how the work evolved after the article appeared and how it relates to other chapters in the book.
A Probabilistic framework for Quantum Clustering
Casaña-Eslava, Raúl V., Lisboa, Paulo J. G., Ortega-Martorell, Sandra, Jarman, Ian H., Martín-Guerrero, José D.
Quantum Clustering is a powerful method to detect clusters in data with mixed density. However, it is very sensitive to a length parameter that is inherent to the Schrödinger equation. In addition, linking data points into clusters requires local estimates of covariance that are also controlled by length parameters. This raises the question of how to adjust the control parameters of the Schrödinger equation for optimal clustering. We propose a probabilistic framework that provides an objective function for the goodness-of-fit to the data, enabling the control parameters to be optimised within a Bayesian framework. This naturally yields probabilities of cluster membership and data partitions with specific numbers of clusters. The proposed framework is tested on real and synthetic data sets, assessing its validity by measuring concordance with known data structure by means of the Jaccard score (JS). This work also proposes an objective way to measure performance in unsupervised learning that correlates very well with JS.
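The Jaccard score mentioned in the abstract above compares a clustering against a known partition. The abstract does not give the exact definition used, so the sketch below assumes the common pair-counting form: the number of point pairs grouped together in both partitions, divided by the number of pairs grouped together in at least one. The function name `pairwise_jaccard` and the toy labels are hypothetical.

```python
from itertools import combinations

def pairwise_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two partitions of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]   # pair co-clustered in partition A
        same_b = labels_b[i] == labels_b[j]   # pair co-clustered in partition B
        both += int(same_a and same_b)
        either += int(same_a or same_b)
    # Convention (assumed): two all-singleton partitions agree perfectly.
    return both / either if either else 1.0

truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]                     # one point assigned to the wrong cluster
score = pairwise_jaccard(truth, pred)         # 4 shared pairs out of 9 in the union
relabeled = pairwise_jaccard(truth, [1, 1, 1, 0, 0, 0])  # label names do not matter
```

Because the score counts pairs rather than matching label values, it is invariant to how the clusters are numbered, which is what makes it usable for unsupervised output.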
Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project
Artificial intelligence, or AI, is largely an experimental science: at least as much progress has been made by building and analyzing programs as by examining theoretical questions. MYCIN is one of several well-known programs that embody some intelligence and provide data on the extent to which intelligent behavior can be programmed. As with other AI programs, its development was slow and not always in a forward direction. But we feel we learned some useful lessons in the course of nearly a decade of work on MYCIN and related programs. In this book we share the results of many experiments performed in that time, and we try to paint a coherent picture of the work. The book is intended to be a critical analysis of several pieces of related research, performed by a large number of scientists. We believe that the whole field of AI will benefit from such attempts to take a detailed retrospective look at experiments, for in this way the scientific foundations of the field will gradually be defined. It is for all these reasons that we have prepared this analysis of the MYCIN experiments.
On the Convergence of Extended Variational Inference for Non-Gaussian Statistical Models
Ma, Zhanyu, Taghia, Jalil, Guo, Jun
Variational inference (VI) is a widely used framework in Bayesian estimation. For most non-Gaussian statistical models, it is infeasible to find an analytically tractable solution for estimating the posterior distributions of the parameters. Recently, an improved framework, namely extended variational inference (EVI), has been introduced and applied to derive analytically tractable solutions by employing a lower-bound approximation to the variational objective function. Two conditions required for EVI implementation, namely the weak condition and the strong condition, are discussed and compared in this paper. In practical implementations, the convergence of EVI depends on the selection of the lower-bound approximation, regardless of whether the weak or the strong condition is used. In general, two approximation strategies, the single lower-bound (SLB) approximation and the multiple lower-bounds (MLB) approximation, can be applied to carry out the lower-bound approximation. To clarify the differences between the SLB and the MLB, we also discuss the convergence properties of these two approximations. Extensive comparisons are made based on some existing EVI-based non-Gaussian statistical models. Theoretical analyses are conducted to demonstrate the differences between the weak and the strong conditions. Qualitative and quantitative experimental results are presented to show the advantages of the SLB approximation.
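The lower-bound idea in the abstract above can be written out as a short worked inequality. The notation is assumed here rather than taken from the paper: $X$ denotes the data, $Z$ the latent variables and parameters, $q(Z)$ the variational distribution, and $\tilde{p}(X, Z)$ a hypothetical tractable surrogate for the joint density. Standard VI maximises the first bound; the EVI approach, as described, replaces the intractable expectation with a further lower bound, so the quantity actually optimised still bounds the log evidence from below.

```latex
\begin{align}
\ln p(X) &\ge \mathcal{L}(q)
  = \mathbb{E}_{q}\!\left[\ln p(X, Z)\right] - \mathbb{E}_{q}\!\left[\ln q(Z)\right], \\
\mathbb{E}_{q}\!\left[\ln p(X, Z)\right] &\ge \mathbb{E}_{q}\!\left[\ln \tilde{p}(X, Z)\right]
  \;\Longrightarrow\;
  \mathcal{L}(q) \ge \tilde{\mathcal{L}}(q)
  = \mathbb{E}_{q}\!\left[\ln \tilde{p}(X, Z)\right] - \mathbb{E}_{q}\!\left[\ln q(Z)\right].
\end{align}
```

Under this reading, the SLB and MLB strategies would differ in whether a single surrogate bound or several separate bounds are used when optimising $\tilde{\mathcal{L}}(q)$; the precise weak and strong conditions are defined in the paper itself and are not reproduced here.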