### Inference in Graded Bayesian Networks

Machine learning provides algorithms that learn from data and make inferences or predictions from it. Bayesian networks are a class of graphical models that represent a collection of random variables and their conditional dependencies by directed acyclic graphs. In this paper, an inference algorithm for the hidden random variables of a Bayesian network is given by using the tropicalization of the marginal distribution of the observed variables. By restricting the topological structure to graded networks, an inference algorithm for graded Bayesian networks is established that evaluates the hidden random variables rank by rank and in this way yields the most probable states of the hidden variables. This algorithm can be viewed as a generalized version of the Viterbi algorithm for graded Bayesian networks.
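To make the tropicalization step concrete, here is a minimal sketch, not the paper's algorithm: in the tropical (max, +) semiring, products of probabilities become sums of log-probabilities and marginalization (summing out) becomes maximization. On the simplest graded network, a chain whose ranks are time steps (i.e., a hidden Markov model), rank-by-rank evaluation is exactly the classical Viterbi recursion. All names and numbers below are invented for illustration.

```python
# Viterbi as tropicalized inference on a chain-shaped (graded) network:
# each rank is one time step, and each step is a (max, +) matrix-vector product.
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most probable hidden state sequence under the tropical (max, +) semiring.

    log_init[s]     : log P(X_0 = s)
    log_trans[s][t] : log P(X_k = t | X_{k-1} = s)
    log_emit[s][o]  : log P(Y_k = o | X_k = s)
    """
    # Rank 0: prior plus first emission, already in log (tropical) form.
    delta = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        ptr, new_delta = {}, {}
        for t in states:
            # Tropical "sum over predecessors" = maximum over predecessors.
            best_s = max(states, key=lambda s: delta[s] + log_trans[s][t])
            ptr[t] = best_s
            new_delta[t] = delta[best_s] + log_trans[best_s][t] + log_emit[t][o]
        back.append(ptr)
        delta = new_delta
    # Backtrack the argmax path to recover the most probable hidden states.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), delta[last]

# Tiny usage example with a made-up two-state weather model.
lg = math.log
states = ["rain", "sun"]
log_init = {"rain": lg(0.6), "sun": lg(0.4)}
log_trans = {"rain": {"rain": lg(0.7), "sun": lg(0.3)},
             "sun":  {"rain": lg(0.4), "sun": lg(0.6)}}
log_emit = {"rain": {"walk": lg(0.1), "umbrella": lg(0.9)},
            "sun":  {"walk": lg(0.8), "umbrella": lg(0.2)}}
print(viterbi(["umbrella", "walk", "umbrella"], states, log_init, log_trans, log_emit))
```

The paper's contribution is to generalize this rank-by-rank scheme from chains to arbitrary graded networks; the chain case above only illustrates the underlying tropical semantics.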

### How Bayesian Inference Works

Bayesian inference is a way to get sharper predictions from your data. It's particularly useful when you don't have as much data as you would like and want to juice every last bit of predictive strength from it. Although it is sometimes described with reverence, Bayesian inference isn't magic or mystical. And even though the math under the hood can get dense, the concepts behind it are completely accessible. In brief, Bayesian inference lets you draw stronger conclusions from your data by folding in what you already know about the answer. Bayesian inference is based on the ideas of Thomas Bayes, a nonconformist Presbyterian minister who lived in London about 300 years ago. He wrote two books, one on theology and one on probability. His work included his now-famous Bayes' Theorem in raw form, which has since been applied to the problem of inference, the technical term for educated guessing. The popularity of Bayes' ideas was aided immeasurably by another minister, Richard Price.
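To see the "folding in what you already know" step in a single computation, here is a minimal numeric sketch of Bayes' Theorem; the coin, the hypotheses, and all the numbers are made up for the example.

```python
# Bayes' Theorem in one pass: posterior is proportional to prior * likelihood.
# Hypothetical setup: is a coin fair or biased, after seeing 8 heads in 8 flips?

prior = {"fair": 0.9, "biased": 0.1}             # what we believe beforehand
likelihood = {"fair": 0.5**8, "biased": 0.9**8}  # P(8 heads | each hypothesis)

unnormalized = {h: prior[h] * likelihood[h] for h in prior}
evidence = sum(unnormalized.values())            # P(data), the normalizer
posterior = {h: p / evidence for h, p in unnormalized.items()}
print(posterior)  # "biased" now dominates despite its small prior
```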

### Computing Probability Intervals Under Independency Constraints

Many AI researchers argue that probability theory is only capable of dealing with uncertainty in situations where a full specification of a joint probability distribution is available, and conclude that it is not suitable for application in knowledge-based systems. Probability intervals, however, constitute a means for expressing incompleteness of information. We present a method for computing such probability intervals for probabilities of interest from a partial specification of a joint probability distribution. Our method improves on earlier approaches by exploiting independency relationships between statistical variables.
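A minimal sketch of the idea, not the paper's method: given only interval specifications of P(a) and P(b), the conjunction P(a and b) can always be bounded by the Fréchet bounds, but knowing that a and b are independent tightens the interval to a product of intervals. The numbers below are invented for illustration.

```python
# How an independency constraint tightens a probability interval.

def frechet_and(pa, pb):
    """Bounds on P(a & b) from intervals pa=(l,u), pb=(l,u), no assumptions."""
    (la, ua), (lb, ub) = pa, pb
    return (max(0.0, la + lb - 1.0), min(ua, ub))

def independent_and(pa, pb):
    """Bounds on P(a & b) when a and b are known to be independent:
    P(a & b) = P(a) * P(b) is monotone in both factors, so the product
    of the interval endpoints bounds it."""
    (la, ua), (lb, ub) = pa, pb
    return (la * lb, ua * ub)

pa, pb = (0.6, 0.8), (0.5, 0.7)
print(frechet_and(pa, pb))      # (0.1, 0.7)  -- wide, no assumptions
print(independent_and(pa, pb))  # (0.3, 0.56) -- tighter, independence exploited
```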

### An Approximation of Surprise Index as a Measure of Confidence

Probabilistic graphical models, such as Bayesian networks, are intuitive and theoretically sound tools for modeling uncertainty. A major problem with applying Bayesian networks in practice is that it is hard to judge whether a model fits a case that it is supposed to solve. One way of expressing a possible dissonance between a model and a case is the *surprise index*, proposed by Habbema, which expresses how surprising the evidence is given the model. While this measure reflects the intuition that the probability of a case should be judged in the context of a model, it is computationally intractable. In this paper, we propose an efficient way of approximating the surprise index.
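A minimal sketch of the definition: the surprise index is the total probability of all evidence configurations no more likely than the observed one. The brute-force version below enumerates every configuration, which is the intractable part; the Monte Carlo variant is one simple approximation shown for illustration only, and is not necessarily the approximation proposed in the paper. The toy distribution is made up.

```python
import random

def surprise_index_exact(p_evidence, observed):
    """Total mass of configurations no likelier than 'observed'.

    p_evidence maps each full evidence configuration to its probability under
    the model; enumerating all configurations is what makes this intractable.
    """
    p_obs = p_evidence[observed]
    return sum(p for p in p_evidence.values() if p <= p_obs)

def surprise_index_mc(p_evidence, observed, n_samples=10_000, seed=0):
    """Monte Carlo estimate: sample e ~ P(E), count how often P(e) <= P(e_obs)."""
    rng = random.Random(seed)
    configs = list(p_evidence)
    weights = list(p_evidence.values())
    p_obs = p_evidence[observed]
    hits = sum(p_evidence[rng.choices(configs, weights=weights)[0]] <= p_obs
               for _ in range(n_samples))
    return hits / n_samples

# Toy joint over two binary evidence variables.
p = {("a0", "b0"): 0.5, ("a0", "b1"): 0.3, ("a1", "b0"): 0.15, ("a1", "b1"): 0.05}
print(surprise_index_exact(p, ("a1", "b1")))  # 0.05: a very surprising case
print(surprise_index_mc(p, ("a1", "b1")))     # roughly 0.05
```

In a real Bayesian network one would draw the samples by forward sampling rather than enumerating the joint, but the enumeration still hiding inside `p_evidence` shows why the exact index does not scale.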

### MESA: Maximum Entropy by Simulated Annealing

Probabilistic reasoning systems combine different probabilistic rules and probabilistic facts to arrive at the desired probability values of consequences. In this paper we describe the MESA algorithm (Maximum Entropy by Simulated Annealing), which derives a joint distribution of variables or propositions. It takes into account the reliability of probability values and can resolve conflicts between contradictory statements. The joint distribution is represented in terms of marginal distributions, which makes it possible to process large inference networks and to determine desired probability values with high precision. The procedure derives a maximum entropy distribution subject to the given constraints. It can be applied to inference networks of arbitrary topology and may be extended in a number of directions.
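A minimal sketch in the spirit of MESA, not the authors' implementation: simulated annealing searches for a joint distribution of maximum entropy subject to given probabilistic constraints, here encoded as soft penalties. The toy problem, two binary propositions with constraints P(A) = 0.7 and P(B|A) = 0.4, is invented for illustration.

```python
import math, random

STATES = [(a, b) for a in (0, 1) for b in (0, 1)]  # joint states of (A, B)

def entropy(p):
    return -sum(x * math.log(x) for x in p.values() if x > 0)

def penalty(p):
    # Squared violation of the constraints P(A) = 0.7 and P(B|A) = 0.4.
    p_a = p[(1, 0)] + p[(1, 1)]
    p_b_given_a = p[(1, 1)] / p_a if p_a > 0 else 0.0
    return (p_a - 0.7) ** 2 + (p_b_given_a - 0.4) ** 2

def objective(p, weight=1000.0):
    # Maximize entropy minus a stiff penalty for violated constraints.
    return entropy(p) - weight * penalty(p)

def neighbor(p, rng, step=0.01):
    # Move a little mass between two random states; stays a distribution.
    q = dict(p)
    s, t = rng.sample(STATES, 2)
    eps = rng.uniform(0, min(step, q[s]))
    q[s] -= eps
    q[t] += eps
    return q

def anneal(steps=50_000, seed=0):
    rng = random.Random(seed)
    p = {s: 0.25 for s in STATES}        # start from the uniform distribution
    best, best_val = p, objective(p)
    for k in range(steps):
        temp = 1.0 / (1 + k)             # simple cooling schedule
        q = neighbor(p, rng)
        delta = objective(q) - objective(p)
        # Accept improvements always, worsenings with Boltzmann probability.
        if delta > 0 or rng.random() < math.exp(delta / temp):
            p = q
            if objective(p) > best_val:
                best, best_val = p, objective(p)
    return best

print(anneal())  # approaches P(A)=0.7, P(B|A)=0.4, and P(B|not A)=0.5 (max entropy)
```

This toy version stores the joint distribution explicitly, which is only feasible for a handful of propositions; MESA's representation in terms of marginal distributions is what allows it to scale to large inference networks.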