Integrating Probabilistic Rules into Neural Networks: A Stochastic EM Learning Algorithm

arXiv.org Artificial Intelligence

The EM algorithm is a general procedure for obtaining maximum likelihood estimates when some of the observations on the variables of a network are missing. In this paper a stochastic version of the algorithm is adapted to probabilistic neural networks describing the associative dependency of variables. These networks have a probability distribution that is a special case of the distribution generated by probabilistic inference networks. Hence both types of networks can be combined, which makes it possible to integrate probabilistic rules as well as unspecified associations in a sound way. The resulting network may have a number of interesting features, including cycles of probabilistic rules, hidden 'unobservable' variables, and uncertain and contradictory evidence.
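To make the idea of a stochastic EM step concrete, here is a minimal Python sketch on a toy problem that is not the paper's network model: two hidden coins with unknown head probabilities, where the coin used in each trial is unobserved. The function name `stochastic_em`, the starting values, and the +1 pseudo-counts are all assumptions made for illustration. Instead of computing expected assignments as ordinary EM does, each iteration samples the hidden variable from its current posterior and then re-estimates the parameters from the completed data.

```python
import random


def stochastic_em(heads, flips, n_iter=50, seed=0):
    """Toy stochastic EM for a two-coin mixture (illustrative assumption).

    heads: number of heads observed in each trial of `flips` tosses.
    Returns the estimated head probabilities and the weight of coin 0.
    """
    rng = random.Random(seed)
    theta = [0.6, 0.4]  # assumed starting head probabilities
    weight = 0.5        # assumed starting probability that coin 0 is used
    for _ in range(n_iter):
        # Stochastic step: sample the hidden coin for every trial
        # from its posterior under the current parameters.
        z = []
        for h in heads:
            like0 = weight * theta[0] ** h * (1 - theta[0]) ** (flips - h)
            like1 = (1 - weight) * theta[1] ** h * (1 - theta[1]) ** (flips - h)
            z.append(0 if rng.random() < like0 / (like0 + like1) else 1)
        # Maximization step: estimates from the completed data, with +1
        # pseudo-counts so that an empty component cannot divide by zero.
        for k in (0, 1):
            heads_k = sum(h for h, zi in zip(heads, z) if zi == k)
            trials_k = sum(1 for zi in z if zi == k)
            theta[k] = (heads_k + 1) / (trials_k * flips + 2)
        weight = (z.count(0) + 1) / (len(z) + 2)
    return theta, weight


observed_heads = [9, 8, 9, 10, 8, 2, 1, 3, 2, 1]
theta, weight = stochastic_em(observed_heads, flips=10)
```

On well-separated data such as `observed_heads`, the sampled assignments settle quickly and the two estimates move toward one heads-biased and one tails-biased coin.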


Second Order Probabilities for Uncertain and Conflicting Evidence

arXiv.org Artificial Intelligence

In this paper the elicitation of probabilities from human experts is considered as a measurement process, which may be disturbed by random 'measurement noise'. Using Bayesian concepts, a second order probability distribution is derived that reflects the uncertainty of the input probabilities. The algorithm is based on an approximate sample representation of the basic probabilities. This sample is continuously modified by a stochastic simulation procedure, the Metropolis algorithm, such that the sequence of successive samples corresponds to the desired posterior distribution. The procedure is able to combine inconsistent probabilities according to their reliability and is applicable to general inference networks with arbitrary structure. Dempster-Shafer probability mass functions may be included using specific measurement distributions. The properties of the approach are demonstrated by numerical experiments.
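The following Python sketch illustrates the general mechanism for a single probability rather than a full inference network. It assumes, purely for illustration, that each expert statement is the true probability plus Gaussian noise with a known standard deviation, and that the prior on the probability is uniform on (0, 1); the names `log_posterior` and `metropolis` are hypothetical. A random-walk Metropolis sampler then produces a sample whose spread represents the second order uncertainty.

```python
import math
import random


def log_posterior(p, reports):
    """Unnormalized log posterior of p under a uniform prior on (0, 1).

    reports: list of (stated_probability, noise_sd) pairs. The Gaussian
    measurement noise is an assumption of this sketch.
    """
    if not 0.0 < p < 1.0:
        return -math.inf
    return sum(-0.5 * ((r - p) / sd) ** 2 for r, sd in reports)


def metropolis(reports, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the probability p."""
    rng = random.Random(seed)
    p = 0.5
    lp = log_posterior(p, reports)
    samples = []
    for _ in range(n_samples):
        candidate = p + rng.gauss(0.0, step)
        lc = log_posterior(candidate, reports)
        # Accept with probability min(1, posterior ratio).
        if lc >= lp or rng.random() < math.exp(lc - lp):
            p, lp = candidate, lc
        samples.append(p)
    return samples


# Two conflicting statements: a reliable expert says 0.8,
# a less reliable one says 0.4.
reports = [(0.8, 0.05), (0.4, 0.2)]
samples = metropolis(reports)
kept = samples[2000:]  # discard an assumed burn-in period
posterior_mean = sum(kept) / len(kept)
```

Because the first statement carries a much smaller noise level, the sampled values concentrate near it, which is one simple way in which inconsistent inputs can be weighted by reliability.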


Kutato: An Entropy-Driven System for Construction of Probabilistic Expert Systems from Databases

arXiv.org Artificial Intelligence

Kutato is a system that takes as input a database of cases and produces a belief network that captures many of the dependence relations represented by those data. This system incorporates a module for determining the entropy of a belief network and a module for constructing belief networks based on entropy calculations. Kutato constructs an initial belief network in which all variables in the database are assumed to be marginally independent. The entropy of this belief network is calculated, and the arc that minimizes the entropy of the resulting belief network is added. Conditional probabilities for an arc are obtained directly from the database. This process continues until an entropy-based threshold is reached. We have tested the system by generating databases from networks using the probabilistic logic-sampling method, and then using those databases as input to Kutato. The system consistently reproduces the original belief networks with high fidelity.
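The greedy loop described above can be sketched in Python as follows. This is not Kutato's implementation: the network entropy is taken here to be the sum of empirical conditional entropies of each variable given its parents, and the stopping rule (`min_gain`, a minimum entropy reduction in bits) is an assumed stand-in for the entropy-based threshold, whose exact form the abstract does not give. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def cond_entropy(data, child, parents):
    """Empirical conditional entropy H(child | parents) in bits."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    return -sum(c / n * math.log2(c / marginal[pa]) for (pa, _), c in joint.items())


def network_entropy(data, parents):
    """Entropy of the network: sum of H(variable | its parents)."""
    return sum(cond_entropy(data, v, ps) for v, ps in parents.items())


def creates_cycle(parents, src, dst):
    """True if adding the arc src -> dst would close a directed cycle."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:  # dst is already an ancestor of src
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def greedy_structure(data, variables, min_gain=0.05):
    """Start with no arcs; repeatedly add the arc that lowers entropy most."""
    parents = {v: [] for v in variables}
    current = network_entropy(data, parents)
    while True:
        best = None
        for src, dst in permutations(variables, 2):
            if src in parents[dst] or creates_cycle(parents, src, dst):
                continue
            trial = dict(parents)
            trial[dst] = parents[dst] + [src]
            h = network_entropy(data, trial)
            if best is None or h < best[0]:
                best = (h, src, dst)
        if best is None or current - best[0] < min_gain:
            return parents, current
        current = best[0]
        parents[best[2]].append(best[1])


# Toy database: B always equals A, and C is unrelated to both.
cases = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 1},
]
learned_parents, final_entropy = greedy_structure(cases, ["A", "B", "C"])
```

A threshold of some kind is needed because the empirical conditional entropy never increases when a parent is added, so an unconstrained greedy search would keep adding arcs.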


Boltzmann Machine Learning with the Latent Maximum Entropy Principle

arXiv.org Machine Learning

We present a new statistical learning paradigm for Boltzmann machines based on a new inference principle we have proposed: the latent maximum entropy principle (LME). LME is different both from Jaynes' maximum entropy principle and from standard maximum likelihood estimation. We demonstrate the LME principle by deriving new algorithms for Boltzmann machine parameter estimation, and show how a robust and fast new variant of the EM algorithm can be developed. Our experiments show that estimation based on LME generally yields better results than maximum likelihood estimation, particularly when inferring hidden units from small amounts of data.


Learning Bayesian Networks: The Combination of Knowledge and Statistical Data

AAAI Conferences

We describe algorithms for learning Bayesian networks from a combination of user knowledge and statistical data. The algorithms have two components: a scoring metric and a search procedure. The scoring metric takes a network structure, statistical data, and a user's prior knowledge, and returns a score proportional to the posterior probability of the network structure given the data. The search procedure generates networks for evaluation by the scoring metric.
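The split into a scoring metric and a search procedure can be illustrated with a small Python sketch. The specific choices here are assumptions, not taken from the abstract: the data term is the log marginal likelihood under uniform Dirichlet priors (the K2 form), the user's knowledge enters as a prior network whose arcs each cost `log(kappa)` to add or remove, and the "search" simply picks the best of a supplied list of candidate structures. All function names are hypothetical.

```python
import math
from collections import Counter


def log_marginal_likelihood(data, child, parents, arity=2):
    """Log marginal likelihood of one variable given its parents,
    assuming uniform Dirichlet priors on each conditional distribution."""
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    totals = Counter(tuple(row[p] for p in parents) for row in data)
    score = sum(math.lgamma(arity) - math.lgamma(n + arity) for n in totals.values())
    score += sum(math.lgamma(c + 1) for c in counts.values())
    return score


def log_structure_prior(parents, prior_parents, kappa=0.5):
    """Assumed prior: each arc that differs from the user's network costs log(kappa)."""
    differing = sum(len(set(parents[v]) ^ set(prior_parents[v])) for v in parents)
    return differing * math.log(kappa)


def log_score(data, parents, prior_parents):
    """Log of a quantity proportional to the posterior of the structure."""
    data_term = sum(log_marginal_likelihood(data, v, ps) for v, ps in parents.items())
    return log_structure_prior(parents, prior_parents) + data_term


def best_structure(data, candidates, prior_parents):
    """Trivial search procedure: evaluate every candidate, keep the best."""
    return max(candidates, key=lambda s: log_score(data, s, prior_parents))


no_arc = {"A": [], "B": []}
a_to_b = {"A": [], "B": ["A"]}
user_prior = no_arc  # the user believes A and B are independent

small = [{"A": 0, "B": 0}, {"A": 1, "B": 1}]
large = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4

choice_small = best_structure(small, [no_arc, a_to_b], user_prior)
choice_large = best_structure(large, [no_arc, a_to_b], user_prior)
```

With only two cases the user's prior network wins, while with eight consistent cases the data outweigh the prior and the arc from A to B is selected. A practical search would generate candidates incrementally instead of enumerating them.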