You will hear the term probability distribution many times when working with data and machine learning models. Distributions are especially important for models such as naive Bayes, which reason directly about the probabilities of their data. The term refers to either the probability density function or the probability mass function of our data, so let's look at the important differences. In machine learning, we often provide models with probability distributions that describe which values new data samples are likely to take. If we are working with continuous random variables, we use a probability density function to model the probability of a variable falling near a certain value (a continuous variable has zero probability of taking any exact value, as we will see below).
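The PMF/PDF distinction above can be sketched with a minimal standard-library example; the function names and parameter values here are illustrative, not from any particular API:

```python
# Minimal sketch of the PMF / PDF distinction (illustrative names only).
from math import comb, exp, pi, sqrt, erf

def binom_pmf(k, n, p):
    """P(X = k): an exact probability for a discrete outcome."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def norm_pdf(x, mu=0.0, sigma=1.0):
    """Density at x -- NOT a probability; a density can exceed 1."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def norm_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x): probabilities for a continuous variable come from
    integrating the density, here via the closed-form normal CDF."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Discrete: an exact probability, P(X = 3) in 10 fair coin flips.
p_three_heads = binom_pmf(3, 10, 0.5)

# Continuous: P(X = 0) is zero; instead we ask for an interval.
p_near_zero = norm_cdf(0.1) - norm_cdf(-0.1)  # P(-0.1 < X < 0.1)
```

Note that `norm_pdf(0)` returns a density (about 0.399), not a probability; only the interval computed through the CDF is a probability.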

Probability distributions are an important topic that every data scientist should know for data analysis. A probability distribution defines all the possible outcomes of a variable together with their probabilities. In this article you will learn about the types of probability distribution, which will help you determine the right distribution for your dataset. There are two types: discrete and continuous. In a discrete distribution, the probabilities of all the individual outcomes sum to one.
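The sum-to-one property of a discrete distribution can be checked directly; the fair die below is just an illustrative example:

```python
# A discrete distribution assigns a probability to each individual
# outcome, and those probabilities must sum to one (fair die here).
die = {face: 1 / 6 for face in range(1, 7)}

total = sum(die.values())  # 1.0 up to floating-point rounding
```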

A few months ago, I built a recommender system that employed topic modelling to display relevant tasks to employees. The algorithm used was Latent Dirichlet Allocation (LDA), a generative model that has been around since the early 2000s¹. Of course, I didn't rewrite LDA from scratch but used the implementation in Python's scikit-learn. But it started me thinking about the sequence of research that led to the creation of the LDA model. The problem with such libraries is that it's all too easy to include a few lines in your code and just move on, so I dug out my old machine learning books with the goal of knowing enough to be able to explain LDA in all its gory probabilistic detail.
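Those "few lines" look roughly like the sketch below; the tiny corpus and parameter values are illustrative, not taken from the original system:

```python
# Sketch of topic modelling with scikit-learn's LDA implementation.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for employee task descriptions.
docs = [
    "budget forecast quarterly revenue",
    "revenue forecast spreadsheet budget",
    "server deploy rollback incident",
    "incident deploy server logs",
]

# LDA works on word counts, so vectorise first.
counts = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic mixtures

# Each row of doc_topics is a probability distribution over the 2 topics,
# which is exactly what a recommender can use to match tasks to people.
```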

This paper offers a detailed analysis of the structure of this family of possibility distributions by exploiting two different orderings between them: Yager's specificity ordering and a new refinement ordering. It is shown that from a representation point of view, it is sufficient to consider the subset of linear possibility distributions which corresponds to all the possible completions of the default knowledge in agreement with the constraints. There also exists a semantics for system P in terms of infinitesimal probabilities.

Zagorecki, Adam (Cranfield University and Defence Academy of the United Kingdom) | Kozniewski, Marcin (University of Pittsburgh) | Druzdzel, Marek (University of Pittsburgh)

Probabilistic graphical models, such as Bayesian networks, are intuitive and theoretically sound tools for modeling uncertainty. A major problem with applying Bayesian networks in practice is that it is hard to judge whether a model fits a case that it is supposed to solve. One way of expressing a possible dissonance between a model and a case is the "surprise index", proposed by Habbema, which expresses the degree to which the evidence is surprising given the model. While this measure reflects the intuition that the probability of a case should be judged in the context of a model, it is computationally intractable. In this paper, we propose an efficient way of approximating the surprise index.
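The abstract does not spell out Habbema's definition. One common formalisation, assumed here purely for illustration, takes the surprise index of observed evidence to be the total probability mass of all evidence configurations no more likely than the observed one; computed exactly, this requires enumerating every configuration, which hints at why the exact measure is intractable for realistic networks:

```python
# Hedged sketch: brute-force surprise index over a toy joint distribution.
# Assumption (not from the paper): SI(e) = sum of P(e') over all evidence
# configurations e' with P(e') <= P(e). The full enumeration below is
# exponential in the number of evidence variables for a real network.

def surprise_index(joint, observed):
    """joint: dict mapping evidence tuples to their probabilities."""
    p_obs = joint[observed]
    return sum(p for p in joint.values() if p <= p_obs)

# Toy "model": joint distribution over two binary evidence variables.
joint = {
    (0, 0): 0.60,
    (0, 1): 0.25,
    (1, 0): 0.10,
    (1, 1): 0.05,
}

si_common = surprise_index(joint, (0, 0))  # 1.0: nothing is more likely
si_rare = surprise_index(joint, (1, 1))    # 0.05: highly surprising
```

An efficient approximation, as the paper proposes, would avoid this exhaustive enumeration.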