"AI systems–like people–must often act despite partial and uncertain information. First, the information received may be unreliable (e.g., a patient may misremember when a disease started, or may not have noticed a symptom that is important to a diagnosis). In addition, rules connecting real-world events can never include all the factors that might determine whether their conclusions really apply (e.g., the correctness of basing a diagnosis on a lab test depends on whether there were conditions that might have caused a false positive, on the test being done correctly, on the results being associated with the right patient, etc.). Thus, in order to draw useful conclusions, AI systems must be able to reason about the probability of events given their current knowledge."
– from David Leake, Reasoning Under Uncertainty
Alan Turing (1950) was one of the founders of modern computing and AI. His "Turing test" rested on the idea that a computer behaves intelligently if it can achieve human-level performance in cognitive tasks. The 1980s and 1990s saw a surge of interest in AI, when artificial intelligence techniques such as fuzzy expert systems, Bayesian networks, artificial neural networks, and hybrid intelligent systems were applied in a variety of clinical settings in health care. By 2016, healthcare attracted a larger share of AI research investment than any other sector. AI in medicine can be dichotomized into two subtypes: virtual and physical.
This discussion on "The Abstractionism of Probability" is perhaps one of the first in the world to be held publicly. It should be understood that this discussion evolved out of earlier conversations with mathematicians, philosophers, doctors, and engineers, and with many other participants, including rappers, mainstream musicians, artists, and actors. Because it was the subject matter for a documentary, no filmmakers of any kind were interviewed, in order to preserve its serenity and purity. The film is in the making.
Resampling is a way to reuse data to generate new, hypothetical samples (called resamples) that are representative of an underlying population. Two popular tools are the bootstrap and the jackknife. Although they share many similarities (e.g., both can estimate the precision of an estimator θ), they have a few notable differences. Bootstrapping is the most popular resampling method today: it uses sampling with replacement to estimate the sampling distribution of a desired estimator.
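The bootstrap idea can be sketched in a few lines of standard-library Python: draw many same-size resamples with replacement, compute the estimator on each, and read a confidence interval off the percentiles of the resampled estimates. The function name, the sample data, and the 95% level are all illustrative choices, not anything prescribed by the text.

```python
import random
import statistics

def bootstrap_ci(data, estimator, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for an estimator.

    Each resample is drawn with replacement and has the same size as
    the original data, mimicking repeated sampling from the population.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        estimator([rng.choice(data) for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

sample = [2.1, 2.4, 2.7, 3.0, 3.1, 3.5, 3.8, 4.2, 4.4, 5.0]
low, high = bootstrap_ci(sample, statistics.mean)
print(low, high)  # an interval bracketing the sample mean of 3.42
```

The same skeleton works for any estimator (median, standard deviation, a regression coefficient) simply by swapping the `estimator` argument.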
It was the second half of the 18th century, and there was no branch of the mathematical sciences called "Probability Theory". It was known simply by the rather odd-sounding name "Doctrine of Chances" -- after a book by Abraham de Moivre. An article called "An Essay towards solving a Problem in the Doctrine of Chances", written by Bayes but edited and amended by his friend Richard Price, was read to the Royal Society and published in the Philosophical Transactions of the Royal Society of London in 1763. In this essay, Bayes described -- in a rather frequentist manner -- the simple theorem concerning joint probability that gives rise to the calculation of inverse probability, i.e., the probability of an unobserved cause given an observed effect.
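In modern notation (not Bayes' own), the theorem follows directly from the two factorizations of a joint probability, with $H$ a hypothesis and $E$ the observed evidence:

```latex
P(H \mid E)\,P(E) = P(H, E) = P(E \mid H)\,P(H)
\quad\Longrightarrow\quad
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
```

The left-hand quantity is the "inverse probability" of the essay's title: the probability of the hypothesis given what was observed.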
Monte Carlo methods are a class of techniques for randomly sampling from a probability distribution. In many problem domains, describing or estimating the probability distribution is relatively straightforward, but calculating a desired quantity is intractable. This may be for many reasons, such as the stochastic nature of the domain or an exponential number of random variables. Instead, the desired quantity can be approximated by random sampling; methods that do so are referred to as Monte Carlo methods. These methods were first used around the time the first computers were built, and they remain pervasive across all fields of science and engineering, including artificial intelligence and machine learning.
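A minimal illustration of the idea, assuming nothing beyond the standard library: the area of the quarter unit circle is awkward to "measure" directly, but easy to approximate by sampling uniform points in the unit square and counting the fraction that land inside. The function name and sample counts are illustrative.

```python
import random

def estimate_pi(n_samples=1_000_000, seed=42):
    """Monte Carlo estimate of pi.

    A point (x, y) drawn uniformly from the unit square lands inside
    the quarter circle of radius 1 with probability pi/4, so the
    observed fraction times 4 approximates pi.
    """
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / n_samples

print(estimate_pi())  # close to 3.14159
```

The error of such estimates shrinks as 1/sqrt(n): more samples buy more precision, which is why Monte Carlo methods only became practical once computers could generate samples cheaply.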
In the previous post we saw what Bayes' Theorem is and walked through an easy, intuitive example of how it works. If you don't know what Bayes' Theorem is and have not yet had the pleasure of reading that post, I recommend you do, as it will make this article much easier to follow. In this post, we will look at the uses of this theorem in Machine Learning. As mentioned in the previous post, Bayes' theorem tells us how to gradually update our knowledge of something as we gather more evidence about it.
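That "gradual updating" can be shown concretely: each observation's posterior becomes the prior for the next observation. The function below is a generic sketch (the likelihood values are made up for illustration), not code from the post it refers to.

```python
def bayes_update(prior, likelihood_h, likelihood_not_h):
    """Return P(H | E) given P(H), P(E | H), and P(E | not-H).

    The denominator expands P(E) by the law of total probability.
    """
    numerator = likelihood_h * prior
    evidence = numerator + likelihood_not_h * (1 - prior)
    return numerator / evidence

# Start from a 50/50 prior and fold in two pieces of evidence,
# each three times as likely under H as under not-H.
p = 0.5
for _ in range(2):
    p = bayes_update(p, likelihood_h=0.75, likelihood_not_h=0.25)
print(round(p, 3))  # 0.9
```

Note that applying the update twice is equivalent to conditioning on both observations at once (assuming they are independent given H), which is what makes sequential Bayesian updating so convenient.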
When explaining probabilistic models, any human-oriented framework for interpretability should take into account how humans understand and interpret probabilities. The psychological and cognitive science communities have long studied this topic (tversky1974judgment), showing, for example, that humans are notoriously bad at incorporating class priors when thinking about probabilities. The classic example of breast-cancer diagnosis, due to eddy1982probabilistic, showed that the majority of subjects (doctors) tended to give estimates of posterior probabilities roughly one order of magnitude higher than the true values. This phenomenon has been attributed to a neglect of base rates during reasoning (the base-rate fallacy (bar-hillel1980base)), or alternatively to a confusion between the inverse conditional probabilities P(A|B) and P(B|A), one of which needs to be estimated while the other is provided (the inverse fallacy (koehler1996base)). Whatever the cause, we argue here that its effect -- i.e., that humans often struggle to reason about posterior probabilities -- should be taken into account.
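The gap between intuition and the correct posterior is easy to make concrete. The numbers below are illustrative values in the spirit of Eddy's mammography problem (assumed here for the sketch, not quoted from the paper): a rare disease, a fairly sensitive test, and a modest false-positive rate.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem.

    true_pos  = P(positive | disease)   * P(disease)
    false_pos = P(positive | no disease) * P(no disease)
    """
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Assumed illustrative numbers: 1% prevalence, 80% sensitivity,
# 9.6% false-positive rate.
p = posterior(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(round(p, 3))  # 0.078
```

Because the disease is rare, false positives from the healthy 99% swamp the true positives, and the correct posterior sits near 8% rather than the much higher figures people tend to estimate when they neglect the base rate.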
Model selection is the problem of choosing one model from among a set of candidates. It is common to choose the model that performs best on a hold-out test dataset, or to estimate model performance using a resampling technique such as k-fold cross-validation. An alternative approach to model selection uses probabilistic statistical measures that attempt to quantify both the model's performance on the training dataset and the complexity of the model. Examples include the Akaike and Bayesian Information Criteria and the Minimum Description Length. The benefit of these information criteria is that they do not require a hold-out test set, although a limitation is that they do not take the uncertainty of the models into account and may end up selecting models that are too simple.
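For least-squares models with Gaussian errors, both criteria reduce to simple formulas in the residual sum of squares (RSS), the sample size n, and the parameter count k. The sketch below uses one common form of these formulas; the RSS values and parameter counts in the comparison are hypothetical.

```python
import math

def aic(n, rss, k):
    """Akaike Information Criterion for a least-squares fit
    with Gaussian errors (lower is better)."""
    return n * math.log(rss / n) + 2 * k

def bic(n, rss, k):
    """Bayesian Information Criterion: the log(n) factor penalizes
    extra parameters more heavily as the sample grows."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical fits on n=100 points: a 2-parameter model vs a
# 5-parameter model whose extra flexibility barely reduces the RSS.
n = 100
simple = aic(n, rss=52.0, k=2)
complex_ = aic(n, rss=51.0, k=5)
print(simple < complex_)  # True: the small RSS gain does not pay for 3 extra parameters
```

Note that only differences in AIC or BIC between candidate models are meaningful; the absolute values depend on constants that cancel in the comparison.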
Probability is a measure of uncertainty. It applies to machine learning because, in the real world, we need to make decisions with incomplete information, so we need a mechanism for quantifying uncertainty – which probability provides. Using probability, we can model elements of uncertainty such as risk in financial transactions and many other business processes. By contrast, in traditional programming we work with deterministic problems, i.e., problems whose solutions are not affected by uncertainty.