Maximum Entropy
Maximum Entropy Kernels for System Identification
Carli, Francesca Paola, Chen, Tianshi, Ljung, Lennart
A new nonparametric approach for system identification has been recently proposed where the impulse response is modeled as the realization of a zero-mean Gaussian process whose covariance (kernel) has to be estimated from data. In this scheme, quality of the estimates crucially depends on the parametrization of the covariance of the Gaussian process. A family of kernels that have been shown to be particularly effective in the system identification framework is the family of Diagonal/Correlated (DC) kernels. Maximum entropy properties of a related family of kernels, the Tuned/Correlated (TC) kernels, have been recently pointed out in the literature. In this paper we show that maximum entropy properties indeed extend to the whole family of DC kernels. The maximum entropy interpretation can be exploited in conjunction with results on matrix completion problems in the graphical models literature to shed light on the structure of the DC kernel. In particular, we prove that the DC kernel admits a closed-form factorization, inverse and determinant. These results can be exploited both to improve the numerical stability and to reduce the computational complexity associated with the computation of the DC estimator.
Unification of field theory and maximum entropy methods for learning probability densities
The need to estimate smooth probability distributions (a.k.a. probability densities) from finite sampled data is ubiquitous in science. Many approaches to this problem have been described, but none is yet regarded as providing a definitive solution. Maximum entropy estimation and Bayesian field theory are two such approaches. Both have origins in statistical physics, but the relationship between them has remained unclear. Here I unify these two methods by showing that every maximum entropy density estimate can be recovered in the infinite smoothness limit of an appropriate Bayesian field theory. I also show that Bayesian field theory estimation can be performed without imposing any boundary conditions on candidate densities, and that the infinite smoothness limit of these theories recovers the most common types of maximum entropy estimates. Bayesian field theory is thus seen to provide a natural test of the validity of the maximum entropy null hypothesis. Bayesian field theory also returns a lower entropy density estimate when the maximum entropy hypothesis is falsified. The computations necessary for this approach can be performed rapidly for one-dimensional data, and software for doing this is provided. Based on these results, I argue that Bayesian field theory is poised to provide a definitive solution to the density estimation problem in one dimension.
Maximum Entropy Semi-Supervised Inverse Reinforcement Learning
Audiffren, Julien (CMLA UMR) | Valko, Michal (INRIA) | Lazaric, Alessandro (INRIA) | Ghavamzadeh, Mohammad (Adobe Research)
A popular approach to apprenticeship learning (AL) is to formulate it as an inverse reinforcement learning (IRL) problem. The MaxEnt-IRL algorithm successfully integrates the maximum entropy principle into IRL and unlike its predecessors, it resolves the ambiguity arising from the fact that a possibly large number of policies could match the expert's behavior. In this paper, we study an AL setting in which in addition to the expert's trajectories,a number of unsupervised trajectories is available. We introduce MESSI,a novel algorithm that combines MaxEnt-IRL with principles coming from semisupervised learning. In particular, MESSI integrates the unsupervised data into the MaxEnt-IRL framework using a pairwise penalty on trajectories. Empirical results in a highway driving and grid-world problems indicate that MESSI is able to take advantage of the unsupervised trajectories and improve the performance of MaxEnt-IRL.
On the Maximum Entropy Property of the First-Order Stable Spline Kernel and its Implications
A new nonparametric approach for system identification has been recently proposed where the impulse response is seen as the realization of a zero--mean Gaussian process whose covariance, the so--called stable spline kernel, guarantees that the impulse response is almost surely stable. Maximum entropy properties of the stable spline kernel have been pointed out in the literature. In this paper we provide an independent proof that relies on the theory of matrix extension problems in the graphical model literature and leads to a closed form expression for the inverse of the first order stable spline kernel as well as to a new factorization in the form $UWU^\top$ with $U$ upper triangular and $W$ diagonal. Interestingly, all first--order stable spline kernels share the same factor $U$ and $W$ admits a closed form representation in terms of the kernel hyperparameter, making the factorization computationally inexpensive. Maximum likelihood properties of the stable spline kernel are also highlighted. These results can be applied both to improve the stability and to reduce the computational complexity associated with the computation of stable spline estimators.
A Novel Methodology for Processing Probabilistic Knowledge Bases Under Maximum Entropy
Kern-Isberner, Gabriele (TU Dortmund University) | Wilhelm, Marco (TU Dortmund University) | Beierle, Christoph (University of Hagen)
Probabilistic reasoning under the so-called principle of maximum entropy is a viable and convenient alternative to Bayesian networks, relieving the user from providing complete (local) probabilistic information and observing rigorous conditional independence assumptions. In this paper, we present a novel approach to performing computational MaxEnt reasoning that makes use of symbolic computations instead of graph-based techniques. Given a probabilistic knowledge base, we encode the MaxEnt optimization problem into a system of polynomial equations, and then apply Gröbner basis theory to find MaxEnt inferences as solutions to the polynomials. We illustrate our approach with an example of a knowledge base that represents findings on fraud detection in enterprises.
Implementation of a Transformation System for Relational Probabilistic Knowledge Bases Simplifying the Maximum Entropy Model Computation
Beierle, Christoph (University of Hagen) | Höhnerbach, Markus (University of Hagen) | Marto, Marcus (University of Hagen)
The maximum entropy (ME) model of a knowledge base R consisting of relational probabilistic conditionals can be defined referring to the set of all ground instances of the conditionals. The logic FO-PCL employs the notion of parametric uniformity for avoiding the full grounding of R. We present an implementation of a rule system transforming R into a knowledge base that is parametrically uniform and has the same ME model, simplifying the ME model computation. The implementation provides different execution and evaluation modes, including the generation of all possible solutions.
The Maximum Entropy Relaxation Path
Dubiner, Moshe, Gavish, Matan, Singer, Yoram
The relaxed maximum entropy problem is concerned with finding a probability distribution on a finite set that minimizes the relative entropy to a given prior distribution, while satisfying relaxed max-norm constraints with respect to a third observed multinomial distribution. We study the entire relaxation path for this problem in detail. We show existence and a geometric description of the relaxation path. Specifically, we show that the maximum entropy relaxation path admits a planar geometric description as an increasing, piecewise linear function in the inverse relaxation parameter. We derive fast algorithms for tracking the path. In various realistic settings, our algorithms require $O(n\log(n))$ operations for probability distributions on $n$ points, making it possible to handle large problems. Once the path has been recovered, we show that given a validation set, the family of admissible models is reduced from an infinite family to a small, discrete set. We demonstrate the merits of our approach in experiments with synthetic data and discuss its potential for the estimation of compact n-gram language models.
Multi-View Maximum Entropy Discrimination
Sun, Shiliang (East China Normal University) | Chao, Guoqing (East China Normal University)
Maximum entropy discrimination (MED) is a general framework for discriminative estimation based on the well known maximum entropy principle, which embodies the Bayesian integration of prior information with large margin constraints on observations. It is a successful combination of maximum entropy learning and maximum margin learning, and can subsume support vector machines (SVMs) as a special case. In this paper, we present a multi-view maximum entropy discrimination framework that is an extension of MED to the scenario of learning with multiple feature sets. Different from existing approaches to exploiting multiple views, such as co-training style algorithms and co-regularization style algorithms, we propose a new method to make use of the distinct views where classification margins from these views are required to be identical. We give the general form of the solution to the multi-view maximum entropy discrimination, and provide an instantiation under a specific prior formulation which is analogical to a multi-view version of SVMs. Experimental results on real-world data sets show the effectiveness of the proposed multi-view maximum entropy discrimination approach.
Some Notes on the Factorization of Probabilistic Logical Models under Maximum Entropy Semantics
Potyka, Nico (FernUniversität Hagen)
Probabilistic conditional logics offer a rich and well-founded framework for designing expert systems. The factorization of their maximum entropy models has several interesting applications. In this paper a general factorization is derived providing a more rigorous proof than in previous work. It yields an approach to extend Iterative Scaling variants to deterministic knowledge bases. Subsequently the connection to Markov Random Fields is revisited.
Uncertain Reasoning Using Maximum Entropy Inference
The use of maximum entropy inference in reasoning with uncertain information is commonly justified by an information-theoretic argument. This paper discusses a possible objection to this information-theoretic justification and shows how it can be met. I then compare maximum entropy inference with certain other currently popular methods for uncertain reasoning. In making such a comparison, one must distinguish between static and dynamic theories of degrees of belief: a static theory concerns the consistency conditions for degrees of belief at a given time; whereas a dynamic theory concerns how one's degrees of belief should change in the light of new information. It is argued that maximum entropy is a dynamic theory and that a complete theory of uncertain reasoning can be gotten by combining maximum entropy inference with probability theory, which is a static theory. This total theory, I argue, is much better grounded than are other theories of uncertain reasoning.