Jeffrey's update rule as a minimizer of Kullback-Leibler divergence

Pinzón, Carlos, Palamidessi, Catuscia

arXiv.org Machine Learning

In this paper, we give a more concise and higher-level proof than Bart Jacobs's original one for the following theorem: in the context of Bayesian update rules for learning or updating internal states that produce predictions, the relative entropy between the observations and the predictions decreases when Jeffrey's update rule is applied to the internal state.
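The theorem can be checked numerically on a toy model. The sketch below (plain NumPy; the two-state channel, the prior, and the observed frequencies are all illustrative numbers, not taken from the paper) encodes the internal state as a distribution over hidden states, each of which predicts a distribution over observations; a single Jeffrey update moves the prediction closer, in KL divergence, to the observed frequencies.

```python
import numpy as np

def kl(p, q):
    """Relative entropy KL(p || q) for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical toy model: 2 internal states, 3 possible observations.
# channel[w, y] = probability of observing y given internal state w.
channel = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6]])
prior = np.array([0.5, 0.5])        # current internal state
target = np.array([0.2, 0.3, 0.5])  # observed frequencies

def predict(pi):
    # marginal prediction over observations
    return pi @ channel

def jeffrey_update(pi):
    pred = predict(pi)
    # Bayesian posterior over states for each observation y: pi(w | y)
    post = (pi[:, None] * channel) / pred[None, :]
    # Jeffrey's rule: average the posteriors, weighted by observed frequencies
    return post @ target

before = kl(target, predict(prior))
after = kl(target, predict(jeffrey_update(prior)))
assert after < before  # the theorem: the update reduces relative entropy
```

The final assertion is exactly the theorem's claim for this instance: one Jeffrey step strictly reduces KL(observations || predictions).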


Reviews: Bayesian Compression for Deep Learning

Neural Information Processing Systems

This paper approaches model compression using a group sparsity prior, which allows entire columns rather than just individual weights to be dropped out. The authors also use the variance of the posterior distribution over weights to automatically set the precision for fixed-point weight quantization. The underlying ideas seem good, and the experimental results seem promising. However, the paper supports the core idea with a great deal of mathematical complexity. The math was presented in a way that I often found confusing, and in several places seems either wrong or poorly motivated (e.g., KL divergences that come out negative, equations whose left and right sides are not equal, and a primary motivation for model compression framed in terms of minimum description length).
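The two ideas the review credits can be sketched independently of the paper's machinery. The snippet below is not the paper's algorithm, just a minimal illustration under assumed posterior statistics: prune whole columns (groups) whose posterior signal-to-noise ratio is uniformly low, then quantize each surviving weight with a step size set by its posterior standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior over an 8x4 weight matrix: means and std deviations.
W_mean = rng.normal(size=(8, 4))
W_std = rng.uniform(0.01, 1.0, size=(8, 4))
W_mean[:, 2] = 0.0  # pretend the group-sparsity prior drove column 2 to zero

# Group pruning: drop entire columns whose signal-to-noise ratio is low.
snr = np.abs(W_mean) / W_std
keep = snr.max(axis=0) > 0.05
W_pruned = W_mean[:, keep]

# Variance-driven quantization: there is little point storing a weight more
# precisely than its posterior uncertainty, so use the std as the step size.
step = W_std[:, keep]
W_quant = np.round(W_pruned / step) * step
```

The threshold 0.05 is an arbitrary illustrative choice; the paper's point is that both the pruning decision and the quantization precision fall out of the posterior rather than being hand-tuned.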


Benferhat

AAAI Conferences

Graphical belief models are compact and powerful tools for representing and reasoning under uncertainty. Possibilistic networks are graphical belief models based on possibility theory. In this paper, we address reasoning under uncertain inputs in both quantitative and qualitative possibilistic networks. More precisely, we first provide possibilistic counterparts of Pearl's methods of virtual evidence, then compare them with the possibilistic counterparts of Jeffrey's rule of conditioning. As in the probabilistic setting, the two methods are shown to be equivalent in the quantitative setting regarding the existence and uniqueness of the solution. However, in the qualitative setting, Pearl's method of virtual evidence applied directly on graphical models disagrees with Jeffrey's rule and the virtual evidence method. The paper provides the precise situations where the methods are not equivalent. Finally, the paper addresses related issues such as transformations from one method to another and commutativity.


Bayesian inference for bivariate ranks

Guillotte, Simon, Perron, François, Segers, Johan

arXiv.org Machine Learning

A recommender system based on ranks is proposed, where an expert's ranking of a set of objects and a user's ranking of a subset of those objects are combined to make a prediction of the user's ranking of all objects. The rankings are assumed to be induced by latent continuous variables corresponding to the grades assigned by the expert and the user to the objects. The dependence between the expert and user grades is modelled by a copula in some parametric family. Given a prior distribution on the copula parameter, the user's complete ranking is predicted by the mode of the posterior predictive distribution of the user's complete ranking conditional on the expert's complete and the user's incomplete rankings. Various Markov chain Monte-Carlo algorithms are proposed to approximate the predictive distribution or only its mode. The predictive distribution can be obtained exactly for the Farlie-Gumbel-Morgenstern copula family, providing a benchmark for the approximation accuracy of the algorithms. The method is applied to the MovieLens 100k dataset with a Gaussian copula modelling dependence between the expert's and user's grades.
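The Farlie-Gumbel-Morgenstern family mentioned as the exactly-solvable benchmark is simple enough to write down directly: its copula density is c(u, v) = 1 + θ(1 − 2u)(1 − 2v) with θ ∈ [−1, 1], so θ controls the (weak) dependence between the expert's and user's latent grades. The sketch below (illustrative, not the paper's code) evaluates the density and numerically confirms it integrates to 1 over the unit square.

```python
import numpy as np

def fgm_density(u, v, theta):
    # FGM copula density: c(u, v) = 1 + theta * (1 - 2u) * (1 - 2v)
    # Nonnegative on [0, 1]^2 whenever -1 <= theta <= 1.
    return 1.0 + theta * (1 - 2 * u) * (1 - 2 * v)

# Midpoint-rule check that the density has total mass 1 on the unit square.
grid = (np.arange(200) + 0.5) / 200
uu, vv = np.meshgrid(grid, grid)
mass = float(fgm_density(uu, vv, 0.8).mean())
assert abs(mass - 1.0) < 1e-9
```

Because the FGM density is a polynomial, posterior quantities under it reduce to low-order moments, which is why the paper can use this family as an exact benchmark for its MCMC approximations.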


On Bayesian Exponentially Embedded Family for Model Order Selection

Zhu, Zhenghan, Kay, Steven

arXiv.org Machine Learning

In this paper, we derive a Bayesian model order selection rule by using the exponentially embedded family method, termed Bayesian EEF. Unlike many other Bayesian model selection methods, the Bayesian EEF can use vague proper priors and improper noninformative priors to be objective in the elicitation of parameter priors. Moreover, the penalty term of the rule is shown to be the sum of half of the parameter dimension and the estimated mutual information between parameter and observed data. This helps to reveal the EEF mechanism in selecting model orders and may provide new insights into the open problems of choosing an optimal penalty term for model order selection and choosing a good prior from information theoretic viewpoints. The important example of linear model order selection is given to illustrate the algorithms and arguments. Lastly, the Bayesian EEF that uses Jeffreys prior coincides with the EEF rule derived by frequentist strategies. This shows another interesting relationship between the frequentist and Bayesian philosophies for model selection.


Directed Cycles in Belief Networks

Wen, Wilson X.

arXiv.org Artificial Intelligence

The most difficult task in probabilistic reasoning may be handling directed cycles in belief networks. To the best knowledge of this author, there is no serious discussion of this problem at all in the literature of probabilistic reasoning so far.


Experimentally Comparing Uncertain Inference Systems to Probability

Wise, Ben P.

arXiv.org Artificial Intelligence

This paper examines the biases and performance of several uncertain inference systems: Mycin, a variant of Mycin, and a simplified version of probability using conditional independence assumptions. We present axiomatic arguments for using Minimum Cross Entropy inference as the best way to do uncertain inference. For Mycin and its variant we found special situations where performance was very good, but also situations where performance was worse than random guessing, or where data was interpreted as having the opposite of its true import. We found that all three of these systems usually gave accurate results, and that the conditional independence assumptions gave the most robust results. We illustrate how the importance of biases may be quantitatively assessed and ranked. Considerations of robustness might be a critical factor in selecting UIS's for a given application.
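Minimum Cross Entropy inference, the benchmark the paper argues for, picks the distribution q that satisfies the observed constraints while staying as close as possible to the prior p in KL(q || p). For a moment constraint the solution is an exponential tilt of the prior; the sketch below (a toy three-outcome example with made-up numbers) solves for the tilting parameter by bisection.

```python
import numpy as np

# Prior over three outcomes, plus a constraint E_q[x] = 1.5 learned from data.
x = np.arange(3)                    # outcomes 0, 1, 2
p = np.array([0.5, 0.3, 0.2])       # prior distribution
target_mean = 1.5

def tilt(lam):
    # Minimum cross entropy solution: q(x) proportional to p(x) * exp(lam * x)
    w = p * np.exp(lam * x)
    return w / w.sum()

# The tilted mean is increasing in lam, so bisection finds the multiplier.
lo, hi = -20.0, 20.0
for _ in range(100):
    mid = (lo + hi) / 2
    if tilt(mid) @ x < target_mean:
        lo = mid
    else:
        hi = mid
q = tilt((lo + hi) / 2)
assert abs(q @ x - target_mean) < 1e-6  # constraint met
```

Among all distributions meeting the constraint, this q distorts the prior the least, which is the axiomatic property the paper appeals to.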


Leading strategies in competitive on-line prediction

Vovk, Vladimir

arXiv.org Artificial Intelligence

Suppose F is a normed function class of prediction strategies (the "benchmark class"). It is well known that, under some restrictions on F, there exists a "master prediction strategy" (sometimes also called a "universal strategy") that performs almost as well as the best strategies in F whose norm is not too large (see, e.g., [9, 5]). The "leading prediction strategies" constructed in this paper satisfy a stronger property: the loss of any prediction strategy in F whose norm is not too large exceeds the loss of a leading strategy by the divergence between the predictions output by the two prediction strategies. Therefore, the leading strategy implicitly serves as a standard for prediction strategies F in F whose norm is not too large: such a prediction strategy F suffers a small loss to the degree that its predictions resemble the leading strategy's predictions, and the only way to compete with the leading strategy is to imitate it. We start the formal exposition with a simple asymptotic result (Proposition 1 in Section 2) asserting the existence of leading strategies in the problem of on-line regression with the quadratic loss function for the class of continuous limited-memory prediction strategies.


Algebraic Information Geometry for Learning Machines with Singularities

Watanabe, Sumio

Neural Information Processing Systems

Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, the asymptotic normality does not hold, since Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied.