Bayesian Learning
A Comparative Study between Bayesian and Frequentist Neural Networks for Remaining Useful Life Estimation in Condition-Based Maintenance
In the last decade, deep learning (DL) has outperformed model-based and statistical approaches in predicting the remaining useful life (RUL) of machinery in the context of condition-based maintenance. One of the major drawbacks of DL is that it heavily depends on a large amount of labeled data, which are typically expensive and time-consuming to obtain, especially in industrial applications. Scarce training data lead to uncertain estimates of the model's parameters, which in turn result in poor prognostic performance. Quantifying this parameter uncertainty is important in order to determine how reliable the prediction is. Traditional DL techniques such as neural networks are incapable of capturing the uncertainty in the training data, thus they are overconfident about their estimates. On the contrary, Bayesian deep learning has recently emerged as a promising solution to account for uncertainty in the training process, achieving state-of-the-art performance in many classification and regression tasks. In this work Bayesian DL techniques such as Bayesian dense neural networks and Bayesian convolutional neural networks are applied to RUL estimation and compared to their frequentist counterparts from the literature. The effectiveness of the proposed models is verified on the popular C-MAPSS dataset. Furthermore, parameter uncertainty is quantified and used to gain additional insight into the data.
A Bayesian/Information Theoretic Model of Bias Learning
In this paper the problem of learning appropriate bias for an environment of related tasks is examined from a Bayesian perspective. The environment of related tasks is shown to be naturally modelled by the concept of an objective prior distribution. Sampling from the objective prior corresponds to sampling different learning tasks from the environment. It is argued that for many common machine learning problems, although we don't know the true (objective) prior for the problem, we do have some idea of a set of possible priors to which the true prior belongs. It is shown that under these circumstances a learner can use Bayesian inference to learn the true prior by sampling from the objective prior. Bounds are given on the amount of information required to learn a task when it is simultaneously learnt with several other tasks. The bounds show that if the learner has little knowledge of the true prior, and the dimensionality of the true prior is small, then sampling multiple tasks is highly advantageous.
SDGM: Sparse Bayesian Classifier Based on a Discriminative Gaussian Mixture Model
Hayashi, Hideaki, Uchida, Seiichi
In probabilistic classification, a discriminative model based on Gaussian mixture exhibits flexible fitting capability. Nevertheless, it is difficult to determine the number of components. We propose a sparse classifier based on a discriminative Gaussian mixture model (GMM), which is named sparse discriminative Gaussian mixture (SDGM). In the SDGM, a GMM-based discriminative model is trained by sparse Bayesian learning. This learning algorithm improves the generalization capability by obtaining a sparse solution and automatically determines the number of components by removing redundant components. The SDGM can be embedded into neural networks (NNs) such as convolutional NNs and can be trained in an end-to-end manner. Experimental results indicated that the proposed method prevented overfitting by obtaining sparsity. Furthermore, we demonstrated that the proposed method outperformed a fully connected layer with the softmax function in certain cases when it was used as the last layer of a deep NN.
1 Introduction. In supervised classification, probabilistic classification is an approach that assigns a class label c to an input sample x by estimating the posterior probability P(c | x).
Privacy and Utility Preserving Sensor-Data Transformations
Malekzadeh, Mohammad, Clegg, Richard G., Cavallaro, Andrea, Haddadi, Hamed
Queen Mary University of London, Imperial College London
Sensitive inferences and user re-identification are major threats to privacy when raw sensor data from wearable or portable devices are shared with cloud-assisted applications. To mitigate these threats, we propose mechanisms to transform sensor data before sharing them with applications running on users' devices. These transformations aim at eliminating patterns that can be used for user re-identification or for inferring potentially sensitive activities, while introducing a minor utility loss for the target application (or task). We show that, on gesture and activity recognition tasks, we can prevent inference of potentially sensitive activities while keeping the reduction in recognition accuracy of nonsensitive activities to less than 5 percentage points. We also show that we can reduce the accuracy of user re-identification and of the potential inference of gender to the level of a random guess, while keeping the accuracy of activity recognition comparable to that obtained on the original data.
1. Introduction. Sensors such as accelerometers, gyroscopes, and magnetometers embedded in personal smart devices generate data that can be used to monitor users' activities, interactions, and mood [1, 2, 3]. Applications (apps) installed on smart devices can get access to raw sensor data to make required (i.e. desired) inferences. However, sensor data can also facilitate some potentially sensitive (i.e. undesired) inferences that a user might wish to keep private, such as discovering smoking habits [4] or revealing personal attributes such as age and gender [5]. Some patterns in raw sensor data may also enable user re-identification [6]. Information privacy can be defined as "the right to select what personal information about me is known to what people" [7].
Kriging: Beyond Matérn
The Matérn covariance function is a popular choice for prediction in spatial statistics and uncertainty quantification literature. A key benefit of the Matérn class is that it is possible to get precise control over the degree of differentiability of the process realizations. However, the Matérn class possesses exponentially decaying tails, and thus may not be suitable for modeling long range dependence. This problem can be remedied using polynomial covariances; however one loses control over the degree of differentiability of the process realizations, in that the realizations using polynomial covariances are either infinitely differentiable or not differentiable at all. We construct a new family of covariance functions using a scale mixture representation of the Matérn class where one obtains the benefits of both Matérn and polynomial covariances. The resultant covariance contains two parameters: one controls the degree of differentiability near the origin and the other controls the tail heaviness, independently of each other. Using a spectral representation, we derive theoretical properties of this new covariance including equivalence measures and asymptotic behavior of the maximum likelihood estimators under infill asymptotics. The improved theoretical properties in predictive performance of this new covariance class are verified via extensive simulations. Application using NASA's Orbiting Carbon Observatory-2 satellite data confirms the advantage of this new covariance class over the Matérn class, especially in extrapolative settings.
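To make the tail-decay contrast in this abstract concrete, the following minimal sketch evaluates the standard closed forms of the Matérn correlation for smoothness values 1/2, 3/2, and 5/2 and compares them with a Cauchy correlation. The Cauchy function is used here only as a familiar example of a polynomially decaying covariance; it is an assumption for illustration and is not the new family constructed in the paper. The function names `matern` and `cauchy` and the `length_scale` parameter are likewise illustrative.

```python
import math

def matern(r, nu, length_scale=1.0):
    """Unit-variance Matern correlation at distance r, for nu in {0.5, 1.5, 2.5}.

    A larger nu gives smoother realizations: nu=0.5 is not mean-square
    differentiable, nu=1.5 is once, and nu=2.5 is twice differentiable.
    """
    s = abs(r) / length_scale
    if nu == 0.5:
        return math.exp(-s)
    if nu == 1.5:
        a = math.sqrt(3.0) * s
        return (1.0 + a) * math.exp(-a)
    if nu == 2.5:
        a = math.sqrt(5.0) * s
        return (1.0 + a + a * a / 3.0) * math.exp(-a)
    raise ValueError("closed form implemented only for nu in {0.5, 1.5, 2.5}")

def cauchy(r, length_scale=1.0):
    """Cauchy correlation: a polynomially decaying comparison (illustrative only)."""
    return 1.0 / (1.0 + (r / length_scale) ** 2)

# Print correlations at increasing distances to compare tail behavior.
for r in (0.0, 1.0, 5.0, 10.0):
    print(r, matern(r, 0.5), matern(r, 2.5), cauchy(r))
```

At large distances every Matérn column shrinks exponentially, whereas the Cauchy column shrinks only like the inverse square of the distance, which is the long-range dependence the abstract says the Matérn class cannot capture.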
Bayesian Optimization with Uncertain Preferences over Attributes
Astudillo, Raul, Frazier, Peter I.
We consider black-box global optimization of time-consuming-to-evaluate functions on behalf of a decision-maker whose preferences must be learned. Each feasible design is associated with a time-consuming-to-evaluate vector of attributes, each vector of attributes is assigned a utility by the decision-maker's utility function, and this utility function may be learned approximately using preferences expressed by the decision-maker over pairs of attribute vectors. Past work has used this estimated utility function as if it were error-free within single-objective optimization. However, errors in utility estimation may yield a poor suggested decision. Furthermore, this approach produces a single suggested "best" design, whereas decision-makers often prefer to choose among a menu of designs. We propose a novel Bayesian optimization algorithm that acknowledges the uncertainty in preference estimation and implicitly chooses designs to evaluate using the time-consuming function that are good not just for a single estimated utility function but a range of likely utility functions. Our algorithm then shows a menu of designs and evaluated attributes to the decision-maker who makes a final selection. We demonstrate the value of our algorithm in a variety of numerical experiments.
A Model of Double Descent for High-dimensional Binary Linear Classification
Deng, Zeyu, Kammoun, Abla, Thrampoulidis, Christos
We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples. The classifier is obtained by running gradient-descent (GD) on the logistic-loss. For this model, we investigate the dependence of the generalization error on the overparameterization ratio $\kappa=p/n$. First, building on known deterministic results on convergence properties of the GD, we uncover a phase-transition phenomenon for the case of Gaussian regressors: the generalization error of GD is the same as that of the maximum-likelihood (ML) solution when $\kappa<\kappa_\star$, and that of the max-margin (SVM) solution when $\kappa>\kappa_\star$. Next, using the convex Gaussian min-max theorem (CGMT), we sharply characterize the performance of both the ML and SVM solutions. Combining these results, we obtain curves that explicitly characterize the generalization error of GD for varying values of $\kappa$. The numerical results validate the theoretical predictions and unveil double-descent phenomena that complement similar recent observations in linear regression settings.
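The setup described in this abstract can be sketched in a few lines: draw Gaussian features, keep only the first $p$ of them, run gradient descent on the logistic loss, and report the overparameterization ratio $\kappa=p/n$ together with the test error. Everything concrete below (the sizes `d`, `n`, `p`, the coefficient vector `beta`, the step size, the number of steps, and the helper names) is an assumption chosen for illustration, not the authors' experimental configuration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def logistic_loss(w, X, y):
    # Mean of log(1 + exp(-y_i * <x_i, w>)), computed stably.
    total = 0.0
    for xi, yi in zip(X, y):
        m = yi * dot(xi, w)
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total / len(X)

def gd_logistic(X, y, lr=0.1, steps=200):
    # Plain full-batch gradient descent on the logistic loss, starting at w = 0.
    p = len(X[0])
    w = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            coef = -yi * sigmoid(-yi * dot(xi, w)) / len(X)
            for j in range(p):
                grad[j] += coef * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def sample(n, d, beta, rng):
    # Gaussian features with labels in {-1, +1} drawn from a logistic model.
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(dot(x, beta)) else -1 for x in X]
    return X, y

rng = random.Random(0)
d, n, p = 20, 40, 10          # hypothetical sizes: d features in total, p used
beta = [1.0] * d              # hypothetical true coefficients
X_train, y_train = sample(n, d, beta, rng)
X_test, y_test = sample(500, d, beta, rng)

# Train on the first p features only, as in the feature-subset model.
Xp_train = [x[:p] for x in X_train]
Xp_test = [x[:p] for x in X_test]
kappa = p / n
w = gd_logistic(Xp_train, y_train)
test_error = sum(1 for x, t in zip(Xp_test, y_test) if t * dot(x, w) <= 0) / len(y_test)
print("kappa =", kappa, "test error =", test_error)
```

Repeating the run over a grid of `p` values and plotting `test_error` against `kappa` is one way to look for the double-descent shape empirically; with a small `n` like this the curve will be noisy, so averaging over several seeds would be advisable.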
Streaming Bayesian Inference for Crowdsourced Classification
Manino, Edoardo, Tran-Thanh, Long, Jennings, Nicholas R.
A key challenge in crowdsourcing is inferring the ground truth from noisy and unreliable data. To do so, existing approaches rely on collecting redundant information from the crowd, and aggregating it with some probabilistic method. However, oftentimes such methods are computationally inefficient, are restricted to some specific settings, or lack theoretical guarantees. In this paper, we revisit the problem of binary classification from crowdsourced data. Specifically we propose Streaming Bayesian Inference for Crowdsourcing (SBIC), a new algorithm that does not suffer from any of these limitations. First, SBIC has low complexity and can be used in a real-time online setting. Second, SBIC has the same accuracy as the best state-of-the-art algorithms in all settings. Third, SBIC has provable asymptotic guarantees both in the online and offline settings.
Anomaly Detection in Large Scale Networks with Latent Space Models
Lee, Wesley, McCormick, Tyler H., Neil, Joshua, Sodja, Cole
We develop a real-time anomaly detection algorithm for directed activity on large, sparse networks. We model the propensity for future activity using a dynamic logistic model with interaction terms for sender- and receiver-specific latent factors in addition to sender- and receiver-specific popularity scores; deviations from this underlying model constitute potential anomalies. Latent nodal attributes are estimated via a variational Bayesian approach and may change over time, representing natural shifts in network activity. Estimation is augmented with a case-control approximation to take advantage of the sparsity of the network and reduces computational complexity from $O(N^2)$ to $O(E)$, where $N$ is the number of nodes and $E$ is the number of observed edges. We run our algorithm on network event records collected from an enterprise network of over 25,000 computers and are able to identify a red team attack with half the detection rate required of the model without latent interaction terms.
Artificial Intelligence vs. Machine Learning vs. Deep Learning
Now that we better understand what Artificial Intelligence means, we can take a closer look at Machine Learning and Deep Learning and draw a clearer distinction between the two. Machine Learning incorporates "classical" algorithms for various kinds of tasks such as clustering, regression or classification. Machine Learning algorithms must be trained on data. The more data you provide to your algorithm, the better it gets. The "training" part of a Machine Learning model means that this model tries to optimize along a certain dimension.