Jeffrey's update rule as a minimizer of Kullback-Leibler divergence
Pinzón, Carlos, Palamidessi, Catuscia
In this paper, we present a proof that is more concise and higher-level than the original one by Bart Jacobs of the following theorem: in the context of Bayesian update rules for learning or updating internal states that produce predictions, applying Jeffrey's update rule to the internal state reduces the relative entropy between the observations and the predictions.
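The theorem stated above can be checked numerically on a small discrete example. The following is a minimal sketch, not the paper's proof or code. It assumes a finite state space, a channel given as a row-stochastic table, and the relative entropy taken as D_KL(observed || predicted). The names `predict`, `jeffrey_update`, `kl`, and all numeric values are illustrative assumptions.

```python
import math


def predict(state, channel):
    """Push the internal state through the channel to get the predicted
    distribution over observations: pred[y] = sum_x state[x] * channel[x][y]."""
    n_obs = len(channel[0])
    return [
        sum(state[x] * channel[x][y] for x in range(len(state)))
        for y in range(n_obs)
    ]


def jeffrey_update(state, channel, observed):
    """Jeffrey's rule: mix the Bayesian posteriors P(x | y), weighting each
    by the observed probability of y."""
    pred = predict(state, channel)
    return [
        sum(
            observed[y] * state[x] * channel[x][y] / pred[y]
            for y in range(len(observed))
            if pred[y] > 0  # skip observations the state cannot produce
        )
        for x in range(len(state))
    ]


def kl(p, q):
    """Relative entropy D_KL(p || q) for discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Illustrative values (assumptions, not taken from the paper).
prior = [0.5, 0.5]                   # internal state over two hidden states
channel = [[0.9, 0.1], [0.2, 0.8]]   # channel[x][y] = P(y | x)
observed = [0.3, 0.7]                # empirical distribution of observations

posterior = jeffrey_update(prior, channel, observed)
before = kl(observed, predict(prior, channel))
after = kl(observed, predict(posterior, channel))
print(f"divergence before update: {before:.4f}")
print(f"divergence after update:  {after:.4f}")
```

In this sketch the divergence after the update is no larger than before, which is what the theorem asserts for a single application of the rule.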
Reviews: Bayesian Compression for Deep Learning
This paper approaches model compression using a group sparsity prior, which allows entire columns, rather than just individual weights, to be dropped out. The authors also use the variance of the posterior distribution over weights to automatically set the precision for fixed-point weight quantization. The underlying ideas seem good, and the experimental results seem promising. However, the paper supports the core idea with a great deal of mathematical complexity. The math was presented in a way that I often found confusing, and in several places it seems either wrong or poorly motivated (e.g., KL divergences are negative, the right and left sides of equations are not equal, and the primary motivation for model compression is given in terms of minimum description length).
Benferhat
Graphical belief models are compact and powerful tools for representing and reasoning under uncertainty. Possibilistic networks are graphical belief models based on possibility theory. In this paper, we address reasoning under uncertain inputs in both quantitative and qualitative possibilistic networks. More precisely, we first provide possibilistic counterparts of Pearl's methods of virtual evidence, then compare them with the possibilistic counterparts of Jeffrey's rule of conditioning. As in the probabilistic setting, the two methods are shown to be equivalent in the quantitative setting regarding the existence and uniqueness of the solution. However, in the qualitative setting, Pearl's method of virtual evidence, which applies directly to graphical models, disagrees with Jeffrey's rule and the virtual evidence method. The paper identifies the precise situations in which the methods are not equivalent. Finally, the paper addresses related issues such as transformations from one method to another and commutativity.
Algebraic Information Geometry for Learning Machines with Singularities
Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since the Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied.