 class conditional probability


The Naive Bayes classifier: How it works

#artificialintelligence

Classification algorithms try to predict the class, or label, of a categorical target variable. A categorical variable typically represents qualitative data with discrete values, such as pass/fail or low/medium/high. Of the many classification algorithms, the Naïve Bayes classifier is one of the simplest. It is often used with large text datasets, among other applications. The aim of this article is to explain how the Naïve Bayes algorithm works.
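As a rough illustration of the idea (not code from the article), the sketch below trains a categorical Naïve Bayes classifier on a made-up pass/fail dataset. The function names, the toy data, and the use of Laplace smoothing with `alpha=1.0` are all assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values, alpha

def predict_proba(model, row):
    """Return P(class | row), assuming features are independent given the class."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        # Start from the log prior, then add one log likelihood per feature.
        log_score = math.log(n_label / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            n_values = len(feature_values[i])
            # Laplace smoothing keeps unseen values from zeroing the product.
            log_score += math.log((count + alpha) / (n_label + alpha * n_values))
        log_scores[label] = log_score
    # Normalize in a numerically stable way so the scores sum to one.
    top = max(log_scores.values())
    unnormalized = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(unnormalized.values())
    return {k: v / z for k, v in unnormalized.items()}

# Hypothetical toy data: (study effort, attended class) -> exam result.
rows = [("high", "yes"), ("high", "no"), ("low", "no"),
        ("low", "no"), ("medium", "yes"), ("high", "yes")]
labels = ["pass", "pass", "fail", "fail", "pass", "pass"]

model = train_naive_bayes(rows, labels)
probs = predict_proba(model, ("high", "yes"))
print(probs)
```

Working in log space avoids underflow when many features are multiplied together, which matters for the large text datasets mentioned above.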


Understanding how to explain predictions with "explanation vectors"

#artificialintelligence

In a recent post I introduced three existing approaches to explaining individual predictions of any machine learning model. After the posts on LIME and Shapley values, it is now the turn of explanation vectors, a method presented by David Baehrens, Timon Schroeter and Stefan Harmeling in 2010. As we saw in those posts, explaining a decision of a black-box model means understanding which input features made the model give its prediction for the observation being explained. Intuitively, a feature has a lot of influence on the model's decision if small variations in its value cause large variations in the model's output, while a feature has little influence if big changes in it barely affect the output. When the model's output is a scalar function of the inputs, its gradient points in the direction of the greatest rate of increase of that output, so the gradient can be used as a measure of the features' influence.
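The gradient idea can be sketched numerically. The example below is only an illustration under stated assumptions: `model` is a made-up logistic function standing in for a black box, and `explanation_vector` approximates the gradient with central finite differences rather than following the estimation procedure of the original paper.

```python
import math

def model(x):
    """Hypothetical black-box model: a logistic function of two features."""
    z = 3.0 * x[0] - 0.1 * x[1] + 0.5
    return 1.0 / (1.0 + math.exp(-z))

def explanation_vector(f, x, eps=1e-5):
    """Approximate the gradient of f at x with central finite differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

x0 = [0.2, 1.0]                      # the observation being explained
grad = explanation_vector(model, x0)
print(grad)
```

Here the first component is positive and much larger in magnitude than the second, which matches the intuition above: small changes in the first feature move the output far more than changes in the second.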


Multi-category Angle-based Classifier Refit

arXiv.org Machine Learning

Classification is an important statistical learning tool. In real applications, besides high prediction accuracy, it is often desirable to estimate class conditional probabilities for new observations. For traditional problems where the number of observations is large, there exist many well developed approaches. Recently, high dimensional low sample size problems have become increasingly popular. Margin-based classifiers, such as logistic regression, are well established methods in the literature. On the other hand, in terms of probability estimation, it is known that for binary classifiers, the commonly used methods tend to under-estimate the norm of the classification function. This can lead to biased probability estimation. Remedies have been proposed in the literature. However, for the simultaneous multicategory classification framework, much less work has been done. We fill the gap in this paper. In particular, we give theoretical insights on why heavy regularization terms are often needed in high dimensional applications, and how this can lead to bias in probability estimation. To overcome this difficulty, we propose a new refit strategy for multicategory angle-based classifiers. Our new method only adds a small computation cost to the problem, and is able to attain prediction accuracy that is as good as the regular margin-based classifiers. At the same time, the improvement of probability estimation can be very significant. Numerical results suggest that the new refit approach is highly competitive.


Active Learning with Distributional Estimates

arXiv.org Machine Learning

Active Learning (AL) is increasingly important in a broad range of applications. Two main AL principles to obtain accurate classification with few labeled data are refinement of the current decision boundary and exploration of poorly sampled regions. In this paper we derive a novel AL scheme that balances these two principles in a natural way. In contrast to many AL strategies, which are based on an estimated class conditional probability $\hat{p}(y \mid x)$, a key component of our approach is to view this quantity as a random variable, hence explicitly considering the uncertainty in its estimated value. Our main contribution is a novel mathematical framework for uncertainty-based AL, and a corresponding AL scheme, where the uncertainty in $\hat{p}(y \mid x)$ is modeled by a second-order distribution. On the practical side, we show how to approximate such second-order distributions for kernel density classification. Finally, we find that over a large number of UCI, USPS and Caltech4 datasets, our AL scheme achieves significantly better learning curves than popular AL methods such as uncertainty sampling and error reduction sampling, when all use the same kernel density classifier.


An Adaptive Metric Machine for Pattern Classification

Neural Information Processing Systems

Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along the most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real world data.
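To make the "locally constant" assumption concrete, the sketch below shows a plain k-nearest-neighbor estimate of class conditional probabilities using Euclidean distance. It is the baseline rule the abstract starts from, not the adaptive Chi-squared metric the paper proposes; the function name and the toy points are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_class_probabilities(train_points, train_labels, query, k=3):
    """Estimate P(class | query) as the class fractions among the k nearest points."""
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Treat the class distribution as constant inside the neighborhood.
    votes = Counter(label for _, label in by_distance[:k])
    return {label: votes[label] / k for label in set(train_labels)}

# Hypothetical two-class toy data in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_class_probabilities(points, labels, (0.5, 0.5), k=3)
wider = knn_class_probabilities(points, labels, (0.5, 0.5), k=4)
print(near_a, wider)
```

Every feature dimension contributes equally to the distance here, which is exactly what breaks down in high dimensions and what a locally adapted metric is meant to address.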

