Bayesian Learning
The Logical Essentials of Bayesian Reasoning
This chapter offers an accessible introduction to the channel-based approach to Bayesian probability theory. This framework rests on algebraic and logical foundations, inspired by the methodologies of programming language semantics. It offers a uniform, structured and expressive language for describing Bayesian phenomena in terms of familiar programming concepts, like channel, predicate transformation and state transformation. The introduction also covers inference in Bayesian networks, which will be modelled by a suitable calculus of string diagrams.
Auto-Detection of Safety Issues in Baby Products
Bleaney, Graham, Kuzyk, Matthew, Man, Julian, Mayanloo, Hossein, Tizhoosh, H. R.
Every year, thousands of people receive consumer product related injuries. Research indicates that online customer reviews can be processed to autonomously identify product safety issues. Early identification of safety issues can lead to earlier recalls, and thus fewer injuries and deaths. A dataset of product reviews from Amazon.com was compiled, along with \emph{SaferProducts.gov} complaints and recall descriptions from the Consumer Product Safety Commission (CPSC) and European Commission Rapid Alert system. A system was built to clean the collected text and to extract relevant features. Dimensionality reduction was performed by computing feature relevance through a Random Forest and discarding features with low information gain. Various classifiers were analyzed, including Logistic Regression, SVMs, Na{\"i}ve-Bayes, Random Forests, and an Ensemble classifier. Experimentation with various features and classifier combinations resulted in a logistic regression model with 70.2\% precision in the top 50 reviews surfaced. This classifier outperforms all benchmarks set by related literature and consumer product safety professionals.
Negative Log Likelihood Ratio Loss for Deep Neural Network Classification
Zhu, Donglai, Yao, Hengshuai, Jiang, Bei, Yu, Peng
Deep neural network (DNN) has achieved remarkable success in classification tasks such as image classification [1]. The network output can mimic the posterior probabilities of target classes for the input observation when the nonlinear activation function in the output layer is defined as a soft-max function [2]. The learning objective is to minimize the difference between the predicted distribution and the true datagenerating distribution. In information theory, the cross entropy between two probability distributions over a common event set of events measures the average number of bits needed to identify an event if coding follows a learned probability distribution rather than the true but unknow distribution [3]. Therefore, cross entropy is a reasonable loss function for the DNN-based classification. However, in practice the true data-generating probability distribution is unknown and replaced by the empirical probability distribution over a training set where each sample is drawn independently and identically distributed (i.i.d.) from the data space [4]. Under assumptions of uniform distributions of feature and label spaces, minimizing cross-entropy is equivalent to maximum likelihood, i.e., the learning problem aims to maximize likelihood of correct class for each of training samples [2]. Maximum likelihood is a generative training criterion by which the model learns the likelihood of correct class for the observation. The model makes predictions by using Bayes rules to calculate posterior probabilities of target classes for the observation and then select the most likely class.
Offline Evaluation of Ranking Policies with Click Models
Li, Shuai, Abbasi-Yadkori, Yasin, Kveton, Branislav, Muthukrishnan, S., Vinay, Vishwa, Wen, Zheng
Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing new evaluation algorithms for estimating the expected number of clicks on ranked lists from stored logs of past results. The existing algorithms are not guaranteed to be statistically efficient in our problem because the number of recommended lists can grow exponentially with their length. To overcome this challenge, we use models of user interaction with the list of items, the so-called click models, to construct estimators that learn statistically efficiently. We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds. We evaluate our estimators in a series of experiments on a real-world dataset and show that they consistently outperform prior estimators.
High-dimensional Penalty Selection via Minimum Description Length Principle
Miyaguchi, Kohei, Yamanishi, Kenji
We tackle the problem of penalty selection of regularization on the basis of the minimum description length (MDL) principle. In particular, we consider that the design space of the penalty function is high-dimensional. In this situation, the luckiness-normalized-maximum-likelihood(LNML)-minimization approach is favorable, because LNML quantifies the goodness of regularized models with any forms of penalty functions in view of the minimum description length principle, and guides us to a good penalty function through the high-dimensional space. However, the minimization of LNML entails two major challenges: 1) the computation of the normalizing factor of LNML and 2) its minimization in high-dimensional spaces. In this paper, we present a novel regularization selection method (MDL-RS), in which a tight upper bound of LNML (uLNML) is minimized with local convergence guarantee. Our main contribution is the derivation of uLNML, which is a uniform-gap upper bound of LNML in an analytic expression. This solves the above challenges in an approximate manner because it allows us to accurately approximate LNML and then efficiently minimize it. The experimental results show that MDL-RS improves the generalization performance of regularized estimates specifically when the model has redundant parameters.
Weak Labeling for Crowd Learning
Beรฑaran-Muรฑoz, Iker, Hernรกndez-Gonzรกlez, Jerรณnimo, Pรฉrez, Aritz
Crowdsourcing has become very popular among the machine learning community as a way to obtain labels that allow a ground truth to be estimated for a given dataset. In most of the approaches that use crowdsourced labels, annotators are asked to provide, for each presented instance, a single class label. Such a request could be inefficient, that is, considering that the labelers may not be experts, that way to proceed could fail to take real advantage of the knowledge of the labelers. In this paper, the use of weak labeling for crowd learning is proposed, where the annotators may provide more than a single label per instance to try not to miss the real label. The main hypothesis is that, by allowing weak labeling, knowledge can be extracted from the labelers more efficiently by than in the standard crowd learning scenario. Empirical evidence which supports that hypothesis is presented.
Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles
Klein, John, Albardan, Mahmoud, Guedj, Benjamin, Colot, Olivier
We examine a network of learners which address the same classification task but must learn from different data sets. The learners can share a limited portion of their data sets so as to preserve the network load. We introduce DELCO (standing for Decentralized Ensemble Learning with COpulas), a new approach in which the shared data and the trained models are sent to a central machine that allows to build an ensemble of classifiers. The proposed method aggregates the base classifiers using a probabilistic model relying on Gaussian copulas. Experiments on logistic regressor ensembles demonstrate competing accuracy and increased robustness as compared to gold standard approaches. A companion python implementation can be downloaded at https://github.com/john-klein/DELCO
Generative Model for Heterogeneous Inference
Zhou, Honggang, Li, Yunchun, Yang, Hailong, Li, Wei, Jia, Jie
Generative models (GMs) such as Generative Adversary Network (GAN) and Variational Auto-Encoder (VAE) have thrived these years and achieved high quality results in generating new samples. Especially in Computer Vision, GMs have been used in image inpainting, denoising and completion, which can be treated as the inference from observed pixels to corrupted pixels. However, images are hierarchically structured which are quite different from many real-world inference scenarios with non-hierarchical features. These inference scenarios contain heterogeneous stochastic variables and irregular mutual dependences. Traditionally they are modeled by Bayesian Network (BN). However, the learning and inference of BN model are NP-hard thus the number of stochastic variables in BN is highly constrained. In this paper, we adapt typical GMs to enable heterogeneous learning and inference in polynomial time.We also propose an extended autoregressive (EAR) model and an EAR with adversary loss (EARA) model and give theoretical results on their effectiveness. Experiments on several BN datasets show that our proposed EAR model achieves the best performance in most cases compared to other GMs. Except for black box analysis, we've also done a serial of experiments on Markov border inference of GMs for white box analysis and give theoretical results.
Improved Classification Based on Deep Belief Networks
For better classification generative models are used to initialize the model and model features before training a classifier. Typically it is needed to solve separate unsupervised and supervised learning problems. Generative restricted Boltzmann machines and deep belief networks are widely used for unsupervised learning. We developed several supervised models based on DBN in order to improve this two-phase strategy. Modifying the loss function to account for expectation with respect to the underlying generative model, introducing weight bounds, and multi-level programming are applied in model development. The proposed models capture both unsupervised and supervised objectives effectively. The computational study verifies that our models perform better than the two-phase training approach.
The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression
Candes, Emmanuel J., Sur, Pragya
This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp `phase transition'. We introduce an explicit boundary curve $h_{\text{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes $n$ and number of features $p$ proportioned in such a way that $p/n \rightarrow \kappa$, we show that if the problem is sufficiently high dimensional in the sense that $\kappa > h_{\text{MLE}}$, then the MLE does not exist with probability one. Conversely, if $\kappa < h_{\text{MLE}}$, the MLE asymptotically exists with probability one.