# probability

### #MLMuse -- Naivety in Naive Bayes' Classifiers

Classifying data and predicting outcomes from historical data are central tasks in machine learning. For these tasks we have a robust family of supervised learning algorithms called Naive Bayes classifiers. Naive Bayes classifiers are based entirely on Bayes' theorem, which gives us the probability of an event given that another event has already occurred. This is expressed symbolically as P(A|B), i.e. the probability that event A will occur given that event B has already occurred.
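As a minimal sketch, Bayes' theorem can be applied directly; the prior and likelihood values below are made-up numbers for illustration, not from any real data set:

```python
# Minimal sketch of Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
# All probabilities below are illustrative, made-up values.

def bayes(p_b_given_a, p_a, p_b):
    """Posterior P(A|B) from the likelihood, prior, and evidence."""
    return p_b_given_a * p_a / p_b

# Hypothetical spam-filter numbers: P(spam) = 0.2, P("free") = 0.1,
# P("free" | spam) = 0.4
posterior = bayes(0.4, 0.2, 0.1)
print(posterior)  # 0.8
```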

### Comparing AUCs of Machine Learning Models with DeLong's Test

Have you ever wondered how to demonstrate that one machine learning model's test set performance differs significantly from that of an alternative model? This post describes how to use DeLong's test to obtain a p-value for whether one model has a significantly different AUC than another, where AUC refers to the area under the receiver operating characteristic curve. The post includes a hand-calculated example illustrating all the steps of DeLong's test on a small data set, as well as an example R implementation to enable efficient calculation on large data sets. An example use case for DeLong's test: Model A predicts heart disease risk with an AUC of 0.92, Model B predicts heart disease risk with an AUC of 0.87, and we use DeLong's test to demonstrate that Model A has a significantly different AUC from Model B with p < 0.05.
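The post's implementation is in R; as a rough, self-contained Python sketch of the same computation (the score vectors below are made up, ties get the usual 0.5 credit, and the scores are assumed non-degenerate so the variance is nonzero), one might write:

```python
import math

def psi(x, y):
    # Heaviside kernel: 1 if the positive outscores the negative, 0.5 on ties
    return 1.0 if x > y else (0.5 if x == y else 0.0)

def delong_test(pos_scores, neg_scores):
    """DeLong's test for two models scored on the same test set.
    pos_scores / neg_scores: per-model score lists on positives / negatives.
    Returns (auc_model_1, auc_model_2, two-sided p-value)."""
    m, n = len(pos_scores[0]), len(neg_scores[0])
    aucs, v10, v01 = [], [], []
    for ps, ns in zip(pos_scores, neg_scores):
        # Structural components (placement values) per positive / negative
        V10 = [sum(psi(x, y) for y in ns) / n for x in ps]
        V01 = [sum(psi(x, y) for x in ps) / m for y in ns]
        aucs.append(sum(V10) / m)
        v10.append(V10)
        v01.append(V01)
    def cov(v, sz):
        # 2x2 sample covariance of the structural components
        return [[sum((v[r][i] - aucs[r]) * (v[s][i] - aucs[s])
                     for i in range(sz)) / (sz - 1)
                 for s in range(2)] for r in range(2)]
    s10, s01 = cov(v10, m), cov(v01, n)
    var = (s10[0][0] + s10[1][1] - 2 * s10[0][1]) / m \
        + (s01[0][0] + s01[1][1] - 2 * s01[0][1]) / n
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    # Two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return aucs[0], aucs[1], p

# Hypothetical scores for two models on shared positives/negatives
auc_a, auc_b, p = delong_test(
    [[0.9, 0.6, 0.8], [0.7, 0.5, 0.9]],   # per-model scores on positives
    [[0.2, 0.7, 0.4], [0.3, 0.6, 0.8]])   # per-model scores on negatives
print(auc_a, auc_b, p)  # p ≈ 0.37 here, so no significant difference
```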

### Artificial Intelligence (AI) And The Law: Helping Lawyers While Avoiding Biased Algorithms

Artificial intelligence (AI) has the potential to help every sector of the economy. There is a challenge, though, in sectors that involve fuzzier analysis and the potential to train on data that can perpetuate human biases. A couple of years ago, I described the problem of bias in an article about machine learning (ML) applied to criminal recidivism. It's worth revisiting the sector, as times have changed in how bias is addressed. One way is to look at areas of the legal profession where bias is a much smaller factor.

### A probabilistic population code based on neural samples

Sensory processing is often characterized as implementing probabilistic inference: networks of neurons compute posterior beliefs over unobserved causes given the sensory inputs. How these beliefs are computed and represented by neural responses is much-debated (Fiser et al. 2010, Pouget et al. 2013). A central debate concerns the question of whether neural responses represent samples of latent variables (Hoyer & Hyvärinen 2003) or parameters of their distributions (Ma et al. 2006) with efforts being made to distinguish between them (Grabska-Barwinska et al. 2013). A separate debate addresses the question of whether neural responses are proportionally related to the encoded probabilities (Barlow 1969), or proportional to the logarithm of those probabilities (Jazayeri & Movshon 2006, Ma et al. 2006, Beck et al. 2012). Here, we show that these alternatives -- contrary to common assumptions -- are not mutually exclusive and that the very same system can be compatible with all of them.

### Reinforcement Learning by Probability Matching

Papers published at the Neural Information Processing Systems Conference.

### Evaluating probabilities under high-dimensional latent variable models

We present a simple new Monte Carlo algorithm for evaluating probabilities of observations in complex latent variable models, such as Deep Belief Networks. While the method is based on Markov chains, estimates based on short runs are formally unbiased. In expectation, the log probability of a test set will be underestimated, and this could form the basis of a probabilistic bound. The method is much cheaper than gold-standard annealing-based methods and only slightly more expensive than the cheapest Monte Carlo methods. We give examples of the new method substantially improving simple variational bounds at modest extra cost.
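The underestimation of the log probability is an instance of Jensen's inequality: since log is concave, E[log p̂] ≤ log E[p̂] = log p(x) for any unbiased estimator p̂. A toy illustration (the two-state model below is made up and is not the paper's method):

```python
import random, math

random.seed(0)

# Toy latent variable model: z ~ Bernoulli(0.5), and for a fixed
# observation x we posit likelihoods p(x|z=0) = 0.2 and p(x|z=1) = 0.8,
# so the true marginal is p(x) = 0.5*0.2 + 0.5*0.8 = 0.5.
def estimate(k=5):
    # Unbiased simple Monte Carlo estimate of p(x) from k latent samples
    return sum(0.8 if random.random() < 0.5 else 0.2 for _ in range(k)) / k

# Averaging the log of the unbiased estimate over many trials
logs = [math.log(estimate()) for _ in range(20000)]
avg_log = sum(logs) / len(logs)
print(avg_log, math.log(0.5))  # avg_log falls below log(0.5), as Jensen predicts
```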

### Gated Softmax Classification

We describe a "log-bilinear" model that computes class probabilities by combining an input vector multiplicatively with a vector of binary latent variables. Even though the latent variables can take on exponentially many possible combinations of values, we can efficiently compute the exact probability of each class by marginalizing over the latent variables. This makes it possible to get the exact gradient of the log likelihood. The bilinear score-functions are defined using a three-dimensional weight tensor, and we show that factorizing this tensor allows the model to encode invariances inherent in a task by learning a dictionary of invariant basis functions. Experiments on a set of benchmark problems show that this fully probabilistic model can achieve classification performance that is competitive with (kernel) SVMs, backpropagation, and deep belief nets.
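The efficient marginalization rests on the sum over binary latent vectors factorizing into a product over units: sum over h in {0,1}^K of exp(sum_k h_k s_ck) equals prod_k (1 + exp(s_ck)). A minimal numeric sketch of that identity (the scores below are arbitrary; this is not the paper's factorized tensor model):

```python
import math, itertools

def class_probs(scores):
    """scores[c][k]: bilinear score for class c and binary latent unit k.
    Exact marginalization over 2^K latent states via the product identity."""
    unnorm = [math.prod(1.0 + math.exp(s) for s in row) for row in scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def brute_force(scores):
    """Same quantity by explicit enumeration of all 2^K latent vectors."""
    k = len(scores[0])
    unnorm = [sum(math.exp(sum(h[i] * row[i] for i in range(k)))
                  for h in itertools.product([0, 1], repeat=k))
              for row in scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Arbitrary scores for two classes and three latent units: both routes agree
scores = [[0.5, -1.0, 2.0], [1.0, 0.3, -0.2]]
print(class_probs(scores))
print(brute_force(scores))
```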

### Multi-Stage Dantzig Selector

We consider the following sparse signal recovery (or feature selection) problem: given a design matrix $X\in \mathbb{R}^{n\times m}$ $(m\gg n)$ and a noisy observation vector $y\in \mathbb{R}^{n}$ satisfying $y=X\beta^*+\epsilon$, where $\epsilon$ is a noise vector following the Gaussian distribution $N(0,\sigma^2 I)$, how can we recover the signal (or parameter vector) $\beta^*$ when the signal is sparse? The Dantzig selector has been proposed for sparse signal recovery with strong theoretical guarantees. In this paper, we propose a multi-stage Dantzig selector method, which iteratively refines the target signal $\beta^*$. The proposed method improves the estimation bound of the standard Dantzig selector approximately from $Cs^{1/p}\sqrt{\log m}\,\sigma$ to $C(s-N)^{1/p}\sqrt{\log m}\,\sigma$, where the value $N$ depends on the number of large entries in $\beta^*$. When $N=s$, the proposed algorithm achieves the oracle solution with a high probability.
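For reference, the standard Dantzig selector (Candès & Tao) that the multi-stage method refines is the linear program

$$
\hat{\beta} \;=\; \arg\min_{\beta}\ \|\beta\|_1
\quad \text{subject to} \quad
\|X^{\top}(y - X\beta)\|_{\infty} \;\le\; \lambda\,\sigma,
$$

where $\lambda$ is a tuning parameter, commonly taken on the order of $\sqrt{2\log m}$.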

### Joint Cascade Optimization Using A Product Of Boosted Classifiers

The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to jointly construct the classifiers of such a cascade. We interpret the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set.
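The noisy-AND model multiplies per-stage positive probabilities. A minimal sketch, assuming (as one plausible choice, not necessarily the paper's) that each classifier's real-valued response is mapped to a probability through a sigmoid:

```python
import math

def sigmoid(t):
    # Map a real-valued classifier response to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-t))

def cascade_prob(responses):
    """Noisy-AND: treat each stage's response as an independent probability
    of a positive prediction; the cascade fires only if every stage does."""
    p = 1.0
    for r in responses:
        p *= sigmoid(r)
    return p

# Hypothetical responses from a three-stage cascade
print(cascade_prob([2.0, 1.5, 3.0]))
```

Because every factor is below 1, the cascade probability is always at most the weakest stage's probability, matching the lazy-conjunction interpretation.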

### Evaluation of Rarity of Fingerprints in Forensics

A method for computing the rarity of latent fingerprints represented by minutiae is given. It allows determining the probability of finding a match for an evidence print in a database of n known prints. The probability of random correspondence between evidence and database is determined in three procedural steps. In the registration step, the latent print is aligned by finding its core point, which is done using a machine learning procedure based on Gaussian processes. In the evidence probability evaluation step, a generative model based on Bayesian networks is used to determine the probability of the evidence; it takes into account both the dependency of each minutia on nearby minutiae and the confidence of their presence in the evidence.