Bayesian Learning
Estimating Real Log Canonical Thresholds
Evaluation of the marginal likelihood plays an important role in model selection problems. The widely applicable Bayesian information criterion (WBIC) and singular Bayesian information criterion (sBIC) give approximations to the log marginal likelihood, which can be applied to both regular and singular models. When the real log canonical thresholds are known, the performance of sBIC is considered to be better than that of WBIC, but only few real log canonical thresholds are known. In this paper, we propose a new estimator of the real log canonical thresholds based on the variance of thermodynamic integration with an inverse temperature. In addition, we propose an application to make sBIC widely applicable. Finally, we investigate the performance of the estimator and model selection by simulation studies and application to real data.
Bayes Theorem: A Primer - Lavanya.ai
Imagine you're sleeping, and you hear strange noises in your front lawn. You're very sleepy, so you hypothesize that the strange noises are being generated by a hungry dinosaur. You think to yourself, 'this is exactly what I would hear if there was a dinosaur outside in my front lawn'. But then as you think more about it, you realize that the likelihood of there actually being a dinosaur in your front lawn is extremely low; whereas the likelihood of hearing strange noises from the front lawn is likely pretty high. So you exhale as you realize that the actual probability of there being a dinosaur in your front lawn, aka your original hypothesis, given the evidence is extremely low.
Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel
Qiu, Xin, Meyerson, Elliot, Miikkulainen, Risto
Neural Networks (NNs) have been extensively used for a wide spectrum of real-world regression tasks, where the goal is to predict a numerical outcome such as revenue, effectiveness, or a quantitative result. In many such tasks, the point prediction is not enough, but also the uncertainty (i.e. risk, or confidence) of that prediction must be estimated. Standard NNs, which are most often used in such tasks, do not provide any such information. Existing approaches try to solve this issue by combining Bayesian models with NNs, but these models are hard to implement, more expensive to train, and usually do not perform as well as standard NNs. In this paper, a new framework called RIO is developed that makes it possible to estimate uncertainty in any pretrained standard NN. RIO models prediction residuals using Gaussian Process with a composite input/output kernel. The residual prediction and I/O kernel are theoretically motivated and the framework is evaluated in twelve real-world datasets. It is found to provide reliable estimates of the uncertainty, reduce the error of the point predictions, and scale well to large datasets. Given that RIO can be applied to any standard NN without modifications to model architecture or training pipeline, it provides an important ingredient in building real-world applications of NNs.
MEMe: An Accurate Maximum Entropy Method for Efficient Approximations in Large-Scale Machine Learning
Granziol, Diego, Ru, Binxin, Zohren, Stefan, Doing, Xiaowen, Osborne, Michael, Roberts, Stephen
Making high quality inference on large, feature rich datasets under a constrained computational budget is arguably the primary goal of the learning community. This, however, comes with significant challenges. On the one hand, the exact computation of linear algebraic quantities may be prohibitively expensive, such as that of the log determinant. On the other hand, an analytic expression for the quantity of interest may not exist at all, such as the case for the entropy of a Gaussian mixture model, and approximate methods are often both inefficient and inaccurate.
Conditional Generative Models are not Robust
Fetaya, Ethan, Jacobsen, Jörn-Henrik, Zemel, Richard
Class-conditional generative models are an increasingly popular approach to achieve robust classification. They are a natural choice to solve discriminative tasks in a robust manner as they jointly optimize for predictive performance and accurate modeling of the input distribution. In this work, we investigate robust classification with likelihood-based conditional generative models from a theoretical and practical perspective. Our theoretical result reveals that it is impossible to guarantee detectability of adversarial examples even for near-optimal generative classifiers. Experimentally, we show that naively trained conditional generative models have poor discriminative performance, making them unsuitable for classification. This is related to overlooked issues with training conditional generative models and we show methods to improve performance. Finally, we analyze the robustness of our proposed conditional generative models on MNIST and CIFAR10. While we are able to train robust models for MNIST, robustness completely breaks down on CIFAR10. This lack of robustness is related to various undesirable model properties maximum likelihood fails to penalize. Our results indicate that likelihood may fundamentally be at odds with robust classification on challenging problems.
Bayesian Prior Networks with PAC Training
Haussmann, Manuel, Gerwinn, Sebastian, Kandemir, Melih
We propose to train Bayesian Neural Networks (BNNs) by empirical Bayes as an alternative to posterior weight inference. By approximately marginalizing out an i.i.d.\ realization of a finite number of sibling weights per data-point using the Central Limit Theorem (CLT), we attain a scalable and effective Bayesian deep predictor. This approach directly models the posterior predictive distribution, by-passing the intractable posterior weight inference step. However, it introduces a prohibitively large number of hyperparameters for stable training. As the prior weights are marginalized and hyperparameters are optimized, the model also no longer provides a means to incorporate prior knowledge. We overcome both of these drawbacks by deriving a trivial PAC bound that comprises the marginal likelihood of the predictor and a complexity penalty. The outcome integrates organically into the prior networks framework, bringing about an effective and holistic Bayesian treatment of prediction uncertainty. We observe on various regression, classification, and out-of-domain detection benchmarks that our scalable method provides an improved model fit accompanied with significantly better uncertainty estimates than the state-of-the-art.
The Computational Structure of Unintentional Meaning
Ho, Mark K., Korman, Joanna, Griffiths, Thomas L.
Speech-acts can have literal meaning as well as pragmatic meaning, but these both involve consequences typically intended by a speaker. Speech-acts can also have unintentional meaning, in which what is conveyed goes above and beyond what was intended. Here, we present a Bayesian analysis of how, to a listener, the meaning of an utterance can significantly differ from a speaker's intended meaning. Our model emphasizes how comprehending the intentional and unintentional meaning of speech-acts requires listeners to engage in sophisticated model-based perspective-taking and reasoning about the history of the state of the world, each other's actions, and each other's observations. To test our model, we have human participants make judgments about vignettes where speakers make utterances that could be interpreted as intentional insults or unintentional faux pas. In elucidating the mechanics of speech-acts with unintentional meanings, our account provides insight into how communication both functions and malfunctions.
On Privacy Protection of Latent Dirichlet Allocation Model Training
Zhao, Fangyuan, Ren, Xuebin, Yang, Shusen, Yang, Xinyu
Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovery of hidden semantic architecture of text datasets, and plays a fundamental role in many machine learning applications. However, like many other machine learning algorithms, the process of training a LDA model may leak the sensitive information of the training datasets and bring significant privacy risks. To mitigate the privacy issues in LDA, we focus on studying privacy-preserving algorithms of LDA model training in this paper. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. Then, we further propose a locally private LDA training algorithm on crowdsourced data to provide local differential privacy for individual data contributors. The experimental results on real-world datasets demonstrate the effectiveness of our proposed algorithms.
Radial-Based Undersampling for Imbalanced Data Classification
Data imbalance remains one of the most widespread problems affecting contemporary machine learning. The negative effect data imbalance can have on the traditional learning algorithms is most severe in combination with other dataset difficulty factors, such as small disjuncts, presence of outliers and insufficient number of training observations. Said difficulty factors can also limit the applicability of some of the methods of dealing with data imbalance, in particular the neighborhood-based oversampling algorithms based on SMOTE. Radial-Based Oversampling (RBO) was previously proposed to mitigate some of the limitations of the neighborhood-based methods. In this paper we examine the possibility of utilizing the concept of mutual class potential, used to guide the oversampling process in RBO, in the undersampling procedure. Conducted computational complexity analysis indicates a significantly reduced time complexity of the proposed Radial-Based Undersampling algorithm, and the results of the performed experimental study indicate its usefulness, especially on difficult datasets.
Inverse boosting pruning trees for depression detection on Twitter
Tong, Lei, Xiangrong, null, Zhang, Qianni, Sadka, Abdul, Li, Ling, Zhou, Huiyu
Depression is one of the most common mental health disorders, and a large number of depression people commit suicide each year. Potential depression sufferers do not consult psychological doctors because they feel ashamed or are unaware of any depression, which may result in severe delay of diagnosis and treatment. In the meantime, evidence shows that social media data provides valuable clues about physical and mental health conditions. In this paper, we argue that it is feasible to identify depression at an early stage by mining online social behaviours. Our approach, which is innovative to the practice of depression detection, does not rely on the extraction of numerous or complicated features to achieve accurate depression detection. Instead, we propose a novel classifier, namely, Inverse Boosting Pruning Trees (IBPT), which demonstrates a strong classification ability on a publicly accessible dataset with 7862 Twitter users. To comprehensively evaluate the classification capability of the IBPT, we use three real datasets from the UCI machine learning repository and the IBPT still obtains the best classification results against several state of the arts techniques. The results manifest that our proposed framework is promising for identifying social networks' users with depression.