Olson, Matthew A., Wyner, Abraham J.

A random forest is a popular tool for estimating probabilities in machine learning classification tasks. However, the means by which this is accomplished is unprincipled: one simply counts the fraction of trees in a forest that vote for a certain class. In this paper, we forge a connection between random forests and kernel regression. This places random forest probability estimation on a sounder statistical footing. As part of our investigation, we develop a model for the proximity kernel and relate it to the geometry and sparsity of the estimation problem. We also provide intuition and recommendations for tuning a random forest to improve its probability estimates.
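To make the contrast concrete, here is a minimal sketch, assuming each tree's leaf index for every point is already known (the leaf indices, labels and votes below are hypothetical). It compares the vote-counting estimate with a kernel-regression (Nadaraya-Watson style) estimate that weights each training label by its proximity to the query, where proximity is the fraction of trees in which the two points share a leaf. The function names `vote_fraction`, `proximity` and `kernel_estimate` are illustrative, not taken from the paper, and the exact kernel the authors analyze may differ in detail.

```python
from typing import Sequence


def vote_fraction(tree_votes: Sequence[int], target: int = 1) -> float:
    """Fraction of trees whose predicted class equals `target` (the usual RF 'probability')."""
    return sum(1 for v in tree_votes if v == target) / len(tree_votes)


def proximity(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Proximity kernel: fraction of trees in which two points fall in the same leaf."""
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)


def kernel_estimate(query_leaves: Sequence[int],
                    train_leaves: Sequence[Sequence[int]],
                    train_labels: Sequence[int]) -> float:
    """Kernel-weighted average of 0/1 labels, i.e. an estimate of P(y = 1 | x)."""
    weights = [proximity(query_leaves, t) for t in train_leaves]
    total = sum(weights)
    if total == 0:
        raise ValueError("query shares no leaf with any training point")
    return sum(w * y for w, y in zip(weights, train_labels)) / total


# Hypothetical leaf indices of 3 training points in each of 3 trees.
train_leaves = [[0, 0, 1], [0, 1, 1], [1, 1, 0]]
train_labels = [1, 1, 0]
query_leaves = [0, 1, 1]

# Hypothetical class votes from 4 separate trees for some query point.
print(vote_fraction([1, 1, 0, 1]))                               # 0.75
print(kernel_estimate(query_leaves, train_leaves, train_labels))  # about 0.8333
```

In the kernel view, the query is compared with every training point, so points that share many leaves with it dominate the estimate; vote counting instead discards how strongly each leaf supports its class.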

However, accuracy completely ignores the probability estimates produced by classifiers. Because many real-world applications require probability estimates or rankings, accuracy alone is not sufficient for measuring and comparing classifiers. Since the true ranking of training examples (Cohen et al., 1999) is often unknown, and we are given training and testing examples with only class labels, we need a better measure for classifiers that produce scores for ranking. The area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, has been shown to be a useful measure of ranking quality (Bradley, 1997; Ling et al., 2003; Hand & Till, 2001).
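The gap between accuracy and AUC can be shown with a small sketch. It uses the standard pairwise definition of AUC (the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half) and a hypothetical 0.5 threshold for accuracy. The two score vectors are made-up examples: they give identical accuracy but rank the examples differently.

```python
from typing import Sequence


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
scores_a = [0.9, 0.8, 0.7, 0.3]  # hypothetical classifier A
scores_b = [0.9, 0.6, 0.7, 0.3]  # hypothetical classifier B

print(accuracy(scores_a, labels), accuracy(scores_b, labels))  # 0.75 0.75
print(auc(scores_a, labels), auc(scores_b, labels))            # 0.75 1.0
```

Both classifiers make the same thresholded predictions, yet only B places every positive above every negative, which AUC detects and accuracy does not. The double loop is O(n²) for clarity; a rank-based (Mann-Whitney) computation gives the same value more efficiently.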

We present a unified logical framework for representing and reasoning about both probability quantitative preferences and qualitative preferences in probability answer set programming, called probability answer set optimization programs. The proposed framework is necessary for defining probability quantitative preferences over the possible outcomes of qualitative preferences. We show the application of probability answer set optimization programs to a variant of the well-known nurse rostering problem, called the nurse rostering with probability preferences problem. To the best of our knowledge, this is the first logical framework for reasoning about probability quantitative preferences in general, and about both probability quantitative and qualitative preferences in particular.

Uncertainty involves making decisions with incomplete information, and this is the way we generally operate in the world. Handling uncertainty is typically described using everyday words like chance, luck, and risk. Probability is a field of mathematics that gives us the language and tools to quantify the uncertainty of events and reason in a principled manner. In this post, you will discover a gentle introduction to probability.

Bayesian inference is a way to get sharper predictions from your data. It's particularly useful when you don't have as much data as you would like and want to juice every last bit of predictive strength from it. Although it is sometimes described with reverence, Bayesian inference isn't magic or mystical. And even though the math under the hood can get dense, the concepts behind it are completely accessible. In brief, Bayesian inference lets you draw stronger conclusions from your data by folding in what you already know about the answer. Bayesian inference is based on the ideas of Thomas Bayes, a nonconformist Presbyterian minister in London about 300 years ago. He published two works during his lifetime, one on theology and one on mathematics; his essay on probability appeared only after his death. That essay contained his now famous Bayes' Theorem in raw form, which has since been applied to the problem of inference, the technical term for educated guessing. The popularity of Bayes' ideas was aided immeasurably by another minister, Richard Price, who edited and published the essay.
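"Folding in what you already know" can be illustrated with one standard textbook case, the Beta-Binomial model: a Beta(α, β) prior over an unknown success rate, updated with observed successes and failures, gives a Beta(α + successes, β + failures) posterior by Bayes' Theorem. This is a minimal sketch of that one model, not a general recipe; the prior Beta(8, 2) and the 3-out-of-5 data are hypothetical numbers chosen for illustration.

```python
from typing import Tuple


def update_beta(alpha: float, beta: float, successes: int, failures: int) -> Tuple[float, float]:
    """Conjugate Bayesian update: Beta prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (8.0, 2.0)      # hypothetical prior belief: success rate around 0.8
successes, failures = 3, 2  # hypothetical small data set

posterior = update_beta(*prior, successes, failures)
print(posterior)                                 # (11.0, 4.0)
print(successes / (successes + failures))        # 0.6, data alone
print(round(beta_mean(*posterior), 4))           # 0.7333, data combined with prior
```

With only five observations, the raw frequency (0.6) is noisy; the posterior mean sits between the prior's 0.8 and the data, and it moves closer to the data as more observations arrive.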