 discrete probability distribution



Reasoning Under Uncertainty: Exploring Probabilistic Reasoning Capabilities of LLMs

arXiv.org Artificial Intelligence

Despite widespread success in language understanding and generation, large language models (LLMs) exhibit unclear and often inconsistent behavior when faced with tasks that require probabilistic reasoning. In this work, we present the first comprehensive study of the reasoning capabilities of LLMs over explicit discrete probability distributions. Given observations from a probability distribution, we evaluate models on three carefully designed tasks, mode identification, maximum likelihood estimation, and sample generation, by prompting them to provide responses to queries about either the joint distribution or its conditionals. These tasks thus probe a range of probabilistic skills, including frequency analysis, marginalization, and generative behavior. Through comprehensive empirical evaluations, we demonstrate that there exists a clear performance gap between smaller and larger models, with the latter demonstrating stronger inference and surprising capabilities in sample generation. Furthermore, our investigations reveal notable limitations, including sensitivity to variations in the notation utilized to represent probabilistic outcomes and performance degradation of over 60% as context length increases. Together, our results provide a detailed understanding of the probabilistic reasoning abilities of LLMs and identify key directions for future improvement.
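As a point of reference for the three tasks named in this abstract, the following is a minimal sketch of what mode identification, maximum likelihood estimation, and sample generation look like when computed directly from observations. It is not the paper's evaluation code; the toy `data` set and all function names are assumptions made for illustration.

```python
from collections import Counter
import random


def empirical_distribution(observations):
    """MLE of a categorical distribution: the relative frequency of each outcome."""
    counts = Counter(observations)
    total = len(observations)
    return {outcome: count / total for outcome, count in counts.items()}


def mode(observations):
    """Most frequently observed outcome."""
    return Counter(observations).most_common(1)[0][0]


def condition_on(observations, index, value):
    """Keep only joint observations (tuples) whose component `index` equals `value`."""
    return [obs for obs in observations if obs[index] == value]


def sample(distribution, k, seed=0):
    """Draw k outcomes with probability proportional to the estimated masses."""
    rng = random.Random(seed)
    outcomes = list(distribution)
    weights = [distribution[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=k)


# Hypothetical joint observations over (weather, activity).
data = [
    ("sun", "walk"), ("sun", "walk"), ("sun", "walk"),
    ("rain", "read"), ("rain", "read"), ("sun", "read"),
]

joint_mle = empirical_distribution(data)      # query about the joint distribution
joint_mode = mode(data)
given_rain = empirical_distribution(condition_on(data, 0, "rain"))  # a conditional
draws = sample(joint_mle, k=10)
```

Conditioning here is done by filtering the joint observations before counting, which is one simple way to answer conditional queries from raw samples.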


Supplementary Material Table of Contents

Neural Information Processing Systems

Returning to the variational problem in Equation (A.5), we can now write D (by Lemma 2) Assume |A| < ∞ and that the MDP is ergodic. Parts of this proof are adapted from the proof given in Haarnoja et al. Convergence follows from Outcome-Driven Policy Evaluation above. We will use analogous notation for p. The result follows from Lemma 4, Equation (A.128), Equation (A.129), and the definition of f.


Explaining a probabilistic prediction on the simplex with Shapley compositions

arXiv.org Artificial Intelligence

Originating in game theory, Shapley values are widely used for explaining a machine learning model's prediction by quantifying the contribution of each feature's value to the prediction. This requires a scalar prediction as in binary classification, whereas a multiclass probabilistic prediction is a discrete probability distribution, living on a multidimensional simplex. In such a multiclass setting the Shapley values are typically computed separately on each class in a one-vs-rest manner, ignoring the compositional nature of the output distribution. In this paper, we introduce Shapley compositions as a well-founded way to properly explain a multiclass probabilistic prediction, using the Aitchison geometry from compositional data analysis. We prove that the Shapley composition is the unique quantity satisfying linearity, symmetry and efficiency on the Aitchison simplex, extending the corresponding axiomatic properties of the standard Shapley value. We demonstrate this proper multiclass treatment in a range of scenarios.
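The Aitchison geometry mentioned in this abstract treats probability vectors as compositions. The sketch below shows three standard operations from compositional data analysis (closure, perturbation, and the centered log-ratio transform). It is only an illustration of the geometry, not the paper's Shapley composition algorithm, and the function names are assumptions.

```python
from math import log


def closure(x):
    """Rescale positive components so they sum to 1 (a point on the simplex)."""
    total = sum(x)
    return [xi / total for xi in x]


def perturb(x, y):
    """Aitchison 'addition': component-wise product followed by closure."""
    return closure([xi * yi for xi, yi in zip(x, y)])


def clr(x):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [log(xi) for xi in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


prediction = [0.2, 0.3, 0.5]           # hypothetical 3-class probabilistic prediction
uniform = [1 / 3, 1 / 3, 1 / 3]        # neutral element under perturbation
unchanged = perturb(prediction, uniform)
coordinates = clr(prediction)           # components sum to zero
```

Perturbing by the uniform composition leaves a prediction unchanged, which is why it plays the role of zero in this geometry.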


Probability Distributions To Be Aware Of For Data Science (With Code)

#artificialintelligence

Probability and statistics knowledge is at the core of data science and machine learning; you'll need both to effectively gather, review, analyze, and communicate with data. This means it's essential to have a good grasp of some fundamental terms, what they mean, and how to identify them. One such term you'll hear thrown around a lot is 'distribution.' All this refers to is the properties of the data. There are several instances of phenomena in the real world that are considered to be statistical in nature. This means there are several instances in which we've been able to develop methodologies that help us model nature through mathematical functions that can describe the characteristics of the data.


Binomial Distribution Explained with Examples - Data Analytics

#artificialintelligence

The binomial distribution is a probability distribution that applies to binomial experiments. It may be imagined as the probability distribution of the number of heads that appear in an experiment consisting of a fixed number of coin flips. In this blog post, we will learn the binomial distribution with the help of examples. If you are an aspiring data scientist looking to learn or better understand the binomial distribution, this post might be very helpful. The binomial distribution is a discrete probability distribution that represents the probabilities of binomial random variables in a binomial experiment.
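To make the coin-flip picture concrete, here is a minimal sketch of the binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k), for a fixed number of flips. The values of `n` and `p` are chosen purely for illustration and do not come from the post.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # assumed example: 10 flips of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Probability of exactly 5 heads in 10 fair flips: 252 / 1024
five_heads = pmf[5]
```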


The Bernoulli and Binomial Distributions

#artificialintelligence

The probability for a discrete random variable can be summarized with a discrete probability distribution. Discrete probability distributions are used in machine learning, most notably in the modeling of binary and multi-class classification problems, but also in evaluating the performance for binary classification models, such as the calculation of confidence intervals, and in the modeling of the distribution of words in text for natural language processing. Knowledge of discrete probability distributions is also required in the choice of activation functions in the output layer of deep learning neural networks for classification tasks and selecting an appropriate loss function. Discrete probability distributions play an important role in applied machine learning and there are a few distributions that a practitioner must know about. In this tutorial, you will discover discrete probability distributions (Bernoulli and Binomial Distribution) used in machine learning.
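The link this snippet draws between discrete distributions and loss functions can be illustrated as follows: the usual binary cross-entropy loss is the average negative log-likelihood of the labels under a Bernoulli distribution. This is a sketch under that assumption, with made-up labels and predicted probabilities; it is not taken from the tutorial.

```python
from math import log


def bernoulli_pmf(y, p):
    """P(Y = y) for y in {0, 1} when P(Y = 1) = p."""
    return p if y == 1 else 1 - p


def bernoulli_nll(labels, probs):
    """Average negative log-likelihood, i.e. binary cross-entropy."""
    return -sum(log(bernoulli_pmf(y, p)) for y, p in zip(labels, probs)) / len(labels)


labels = [1, 0, 1]         # hypothetical true classes
probs = [0.9, 0.2, 0.6]    # hypothetical predicted P(Y = 1)
loss = bernoulli_nll(labels, probs)
```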


Famous Probability Distributions in Data Science

#artificialintelligence

Data scientists are modern-day statisticians who take on complex business problems and unravel them with the help of data. Probability distributions allow a data scientist or data analyst to recognize patterns in otherwise seemingly random variables. A normal distribution is generally described as a bell-shaped curve, and it depicts the frequency of whatever you are measuring, such as class scores. The center of the curve is the mean, and the width of the curve is governed by the standard deviation. The score that occurs most frequently is the mean.


The Kullback–Leibler divergence between discrete probability distributions

#artificialintelligence

If you have been learning about machine learning or mathematical statistics, you might have heard about the Kullback–Leibler divergence. The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. It measures how much one distribution differs from a reference distribution. This article explains the Kullback–Leibler divergence and shows how to compute it for discrete probability distributions. Recall that there are many statistical methods that indicate how much two distributions differ.
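For discrete distributions over the same outcomes, the divergence is the sum over outcomes of p(x) log(p(x) / q(x)). The sketch below computes it in nats with plain Python; the two example distributions are assumptions chosen for illustration, and the code is not the article's own.

```python
from math import log


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as aligned lists."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue             # terms with p(x) = 0 contribute nothing
        if qi == 0:
            return float("inf")  # p puts mass where q has none
        total += pi * log(pi / qi)
    return total


p = [0.5, 0.5]   # hypothetical distribution of interest
q = [0.9, 0.1]   # hypothetical reference distribution
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
```

Comparing `kl_pq` with `kl_qp` shows that the divergence is not symmetric, so it is not a distance in the metric sense.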


Discrete Probability Distributions for Machine Learning

#artificialintelligence

The probability for a discrete random variable can be summarized with a discrete probability distribution. Discrete probability distributions are used in machine learning, most notably in the modeling of binary and multi-class classification problems, but also in evaluating the performance for binary classification models, such as the calculation of confidence intervals, and in the modeling of the distribution of words in text for natural language processing. Knowledge of discrete probability distributions is also required in the choice of activation functions in the output layer of deep learning neural networks for classification tasks and selecting an appropriate loss function. Discrete probability distributions play an important role in applied machine learning and there are a few distributions that a practitioner must know about. In this tutorial, you will discover discrete probability distributions used in machine learning.