Maximum Entropy


The Maximum Entropy Relaxation Path

arXiv.org Machine Learning

The relaxed maximum entropy problem is concerned with finding a probability distribution on a finite set that minimizes the relative entropy to a given prior distribution, while satisfying relaxed max-norm constraints with respect to a third observed multinomial distribution. We study the entire relaxation path for this problem in detail. We show existence and a geometric description of the relaxation path. Specifically, we show that the maximum entropy relaxation path admits a planar geometric description as an increasing, piecewise linear function in the inverse relaxation parameter. We derive fast algorithms for tracking the path. In various realistic settings, our algorithms require $O(n\log(n))$ operations for probability distributions on $n$ points, making it possible to handle large problems. Once the path has been recovered, we show that given a validation set, the family of admissible models is reduced from an infinite family to a small, discrete set. We demonstrate the merits of our approach in experiments with synthetic data and discuss its potential for the estimation of compact n-gram language models.
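
To make the problem statement concrete, here is a hedged sketch in notation of my own (the paper's symbols may differ): with prior $q$, observed multinomial $\hat{p}$, and relaxation parameter $\nu \ge 0$, the relaxed maximum entropy problem can be written as

$$ \min_{p \in \Delta_n} \; \mathrm{D}\!\left(p \,\middle\|\, q\right) \quad \text{subject to} \quad \left\| p - \hat{p} \right\|_\infty \le \nu , $$

so $\nu \to \infty$ recovers the prior and $\nu \to 0$ forces agreement with the observations; the path result above describes how the optimizer moves, piecewise linearly, as a function of the inverse of this relaxation parameter.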


Multi-View Maximum Entropy Discrimination

AAAI Conferences

Maximum entropy discrimination (MED) is a general framework for discriminative estimation based on the well-known maximum entropy principle, which embodies the Bayesian integration of prior information with large-margin constraints on observations. It is a successful combination of maximum entropy learning and maximum margin learning, and can subsume support vector machines (SVMs) as a special case. In this paper, we present a multi-view maximum entropy discrimination framework that extends MED to the scenario of learning with multiple feature sets. Different from existing approaches to exploiting multiple views, such as co-training style algorithms and co-regularization style algorithms, we propose a new method to make use of the distinct views where classification margins from these views are required to be identical. We give the general form of the solution to multi-view maximum entropy discrimination, and provide an instantiation under a specific prior formulation which is analogous to a multi-view version of SVMs. Experimental results on real-world data sets show the effectiveness of the proposed multi-view maximum entropy discrimination approach.
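
As background, one standard way to write the single-view MED problem (this is the classical formulation, not the paper's multi-view extension; $F(x;\Theta)$ is a discriminant function, $\gamma_t$ are margin variables, and the notation is mine):

$$ \min_{p(\Theta,\gamma)} \; \mathrm{KL}\big(p(\Theta,\gamma)\,\|\,p_0(\Theta,\gamma)\big) \quad \text{s.t.} \quad \int p(\Theta,\gamma)\,\big[\,y_t F(x_t;\Theta) - \gamma_t\,\big]\, d\Theta\, d\gamma \;\ge\; 0, \qquad t = 1,\dots,T, $$

with predictions taken from the sign of $\int p(\Theta) F(x;\Theta)\, d\Theta$. The multi-view framework described above additionally requires the classification margins induced by the different feature sets to coincide.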


Uncertain Reasoning Using Maximum Entropy Inference

arXiv.org Artificial Intelligence

The use of maximum entropy inference in reasoning with uncertain information is commonly justified by an information-theoretic argument. This paper discusses a possible objection to this information-theoretic justification and shows how it can be met. I then compare maximum entropy inference with certain other currently popular methods for uncertain reasoning. In making such a comparison, one must distinguish between static and dynamic theories of degrees of belief: a static theory concerns the consistency conditions for degrees of belief at a given time, whereas a dynamic theory concerns how one's degrees of belief should change in the light of new information. It is argued that maximum entropy is a dynamic theory and that a complete theory of uncertain reasoning can be obtained by combining maximum entropy inference with probability theory, which is a static theory. This total theory, I argue, is much better grounded than are other theories of uncertain reasoning.


MESA: Maximum Entropy by Simulated Annealing

arXiv.org Artificial Intelligence

Probabilistic reasoning systems combine different probabilistic rules and probabilistic facts to arrive at the desired probability values of consequences. In this paper we describe the MESA algorithm (Maximum Entropy by Simulated Annealing), which derives a joint distribution of variables or propositions. It takes into account the reliability of probability values and can resolve conflicts between contradictory statements. The joint distribution is represented in terms of marginal distributions and therefore makes it possible to process large inference networks and to determine the desired probability values with high precision. The procedure derives a maximum entropy distribution subject to the given constraints. It can be applied to inference networks of arbitrary topology and may be extended in a number of directions.
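
The abstract does not spell out the algorithm, so the following is only a toy sketch, not the MESA procedure itself (it ignores the marginal representation, reliability weights, and conflict resolution): it simply anneals towards a maximum entropy distribution under a single moment constraint with a Metropolis-style acceptance rule. All names and numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (illustrative only): a distribution over 4 states with one
# constraint E[f] = 0.6, where f is the indicator of states 2 and 3.
f = np.array([0.0, 0.0, 1.0, 1.0])
target = 0.6

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def score(p, penalty=100.0):
    # Entropy minus a quadratic penalty for violating the moment constraint.
    return entropy(p) - penalty * (p @ f - target) ** 2

def project(p):
    # Keep the iterate a valid probability vector.
    p = np.abs(p)
    return p / p.sum()

p = project(rng.random(4))
T = 1.0
for _ in range(20000):
    q = project(p + rng.normal(scale=0.01, size=4))    # propose a neighbour
    delta = score(q) - score(p)
    if delta > 0 or rng.random() < np.exp(delta / T):  # Metropolis acceptance
        p = q
    T *= 0.9997                                        # geometric cooling

print(p, p @ f)  # p should approach the constrained maxent solution, E[f] near 0.6
```

The point of the sketch is only that simulated annealing can optimize the entropy objective directly; everything that makes MESA practical for large networks (the marginal representation and the handling of unreliable, contradictory statements) is deliberately left out.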


Efficient high dimensional maximum entropy modeling via symmetric partition functions

Neural Information Processing Systems

The application of the maximum entropy principle to sequence modeling has been popularized by methods such as Conditional Random Fields (CRFs). However, these approaches are generally limited to modeling paths in discrete spaces of low dimensionality. We consider the problem of modeling distributions over paths in continuous spaces of high dimensionality---a problem for which inference is generally intractable. Our main contribution is to show that maximum entropy modeling of high-dimensional, continuous paths is tractable as long as the constrained features possess a certain kind of low-dimensional structure. In this case, we show that the associated partition function is symmetric and that this symmetry can be exploited to compute the partition function efficiently in a compressed form. Empirical results are given showing an application of our method to maximum entropy modeling of high-dimensional human motion capture data.
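
For readers unfamiliar with the term, the partition function in question is the normalizer of the Gibbs-form distribution that the maximum entropy principle yields under moment constraints (a standard fact, written in my notation rather than the paper's):

$$ p_\lambda(x) \;=\; \frac{1}{Z(\lambda)} \exp\!\big(\lambda^\top f(x)\big), \qquad Z(\lambda) \;=\; \int \exp\!\big(\lambda^\top f(x)\big)\, dx . $$

Evaluating $Z(\lambda)$ over high-dimensional continuous paths $x$ is the step that is normally intractable; the claim above is that low-dimensional structure in the features $f$ makes $Z$ symmetric and therefore computable in compressed form.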


Boltzmann Machine Learning with the Latent Maximum Entropy Principle

arXiv.org Machine Learning

We present a new statistical learning paradigm for Boltzmann machines based on a new inference principle we have proposed: the latent maximum entropy principle (LME). LME differs both from Jaynes' maximum entropy principle and from standard maximum likelihood estimation. We demonstrate the LME principle by deriving new algorithms for Boltzmann machine parameter estimation, and show how a robust and fast new variant of the EM algorithm can be developed. Our experiments show that estimation based on LME generally yields better results than maximum likelihood estimation, particularly when inferring hidden units from small amounts of data.
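
As a reminder of the model class (standard Boltzmann machine notation, not anything specific to LME), a Boltzmann machine over binary units $x \in \{0,1\}^n$, some visible and some hidden, defines

$$ p(x) \;=\; \frac{1}{Z} \exp\!\Big(\tfrac{1}{2}\, x^\top W x + b^\top x\Big), \qquad Z \;=\; \sum_{x \in \{0,1\}^n} \exp\!\Big(\tfrac{1}{2}\, x^\top W x + b^\top x\Big), $$

and the estimation problem discussed above is how to fit $W$ and $b$ when the hidden units are unobserved, replacing the usual maximum likelihood criterion with the latent maximum entropy principle.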


Mixture-of-Parents Maximum Entropy Markov Models

arXiv.org Artificial Intelligence

We present the mixture-of-parents maximum entropy Markov model (MoP-MEMM), a class of directed graphical models extending MEMMs. The MoP-MEMM allows tractable incorporation of long-range dependencies between nodes by restricting the conditional distribution of each node to be a mixture of distributions given the parents. We show how to efficiently compute the exact marginal posterior node distributions, regardless of the range of the dependencies. This enables us to model non-sequential correlations present within text documents, as well as between interconnected documents, such as hyperlinked web pages. We apply the MoP-MEMM to a named entity recognition task and a web page classification task. In each, our model shows significant improvement over the basic MEMM, and is competitive with other long-range sequence models that use approximate inference.
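
A hedged reading of the mixture-of-parents restriction (my notation; the paper may parameterize it differently): for node $i$ with parent set $\pi(i)$ and observations $x$, the conditional is taken to be

$$ p\big(y_i \mid y_{\pi(i)}, x\big) \;=\; \sum_{j \in \pi(i)} \lambda_{ij}\, p_j\big(y_i \mid y_j, x\big), \qquad \lambda_{ij} \ge 0, \quad \sum_{j \in \pi(i)} \lambda_{ij} = 1, $$

so that each mixture component conditions on a single, possibly long-range, parent; it is this restriction that keeps exact computation of the marginal posterior node distributions tractable.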


A Graphical Model Formulation of Collaborative Filtering Neighbourhood Methods with Fast Maximum Entropy Training

arXiv.org Machine Learning

Item neighbourhood methods for collaborative filtering learn a weighted graph over the set of items, where each item is connected to those it is most similar to. The prediction of a user's rating on an item is then given by that user's ratings of neighbouring items, weighted by their similarity. This paper presents a new neighbourhood approach, which we call item fields, whereby an undirected graphical model is formed over the item graph. The resulting prediction rule is a simple generalization of the classical approaches, which takes into account non-local information in the graph, allowing its best results to be obtained when using drastically fewer edges than other neighbourhood approaches. A fast approximate maximum entropy training method based on the Bethe approximation is presented, which uses a simple gradient ascent procedure. When using precomputed sufficient statistics on the Movielens datasets, our method is faster than maximum likelihood approaches by two orders of magnitude.
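
For context, here is a minimal sketch of the classical item-neighbourhood prediction rule that the item-field model generalizes; the function and variable names are mine and the weighting is just one common choice, not necessarily the one used in the paper.

```python
def predict_rating(user_ratings, sim_to_item, neighbours):
    """Similarity-weighted average of the user's ratings on the items
    adjacent to the target item: the classical neighbourhood rule.
    Illustrative only; names and weighting are assumptions."""
    num = sum(sim_to_item[j] * user_ratings[j] for j in neighbours)
    den = sum(abs(sim_to_item[j]) for j in neighbours)
    return num / den if den > 0 else None

# Toy usage: the target item's neighbours were rated 4, 5, and 2 by this user.
user_ratings = {"a": 4.0, "b": 5.0, "c": 2.0}
sim_to_item = {"a": 0.8, "b": 0.5, "c": 0.1}
print(predict_rating(user_ratings, sim_to_item, ["a", "b", "c"]))
```

The item-field model replaces this purely local average with inference in an undirected graphical model over the item graph, which is how non-local information enters the prediction.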


Asymptotic Maximum Entropy Principle for Utility Elicitation under High Uncertainty and Partial Information

AAAI Conferences

Research on decision making has proposed multiple methods to help the decision maker in their analysis, suggesting ways of formalizing preferences as well as assessing the uncertainties. Although these techniques are established and proven to be mathematically sound, experience has shown that in certain situations we tend to avoid the formal approach and act intuitively, especially when the decision involves a large number of attributes and outcomes and we must resort to pragmatic, heuristic simplifications such as considering only the most important attributes and omitting the others. In this paper, we provide a model for decision making in situations subject to a large predictive uncertainty with a small learning sample. The high predictive uncertainty is concretized by a countably infinite number of prospects, which makes preference assessment more difficult. Our main result is an extension of the Maximum Entropy Utility (MEU) principle into an asymptotic maximum entropy utility principle for preference elicitation. This allows us to overcome the limits of the existing MEU method, in that we focus on utility assessment when the set of available discrete prospects is countably infinite. Furthermore, our proposed model can be used to analyze situations of high cognitive load as well as to understand how humans handle these problems under the ceteris paribus assumption.
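
To give a rough sense of how entropy enters utility elicitation (a heavily hedged sketch; the symbols are mine and the paper's asymptotic, countably infinite setting is not captured here): order the prospects, treat the normalized utility increments $\Delta u_k$ between consecutive prospects as a probability vector, and choose the increments of maximum entropy consistent with whatever preference information is available,

$$ \max_{\Delta u} \; -\sum_k \Delta u_k \log \Delta u_k \quad \text{s.t.} \quad \Delta u_k \ge 0, \quad \sum_k \Delta u_k = 1, \quad \text{plus any elicited preference constraints.} $$

The contribution described above is an asymptotic version of this kind of principle for the case where the set of discrete prospects is countably infinite.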


How biased are maximum entropy models?

Neural Information Processing Systems

Maximum entropy models have become popular statistical models in neuroscience and other areas in biology, and can be useful tools for obtaining estimates of mutual information in biological systems. However, maximum entropy models fit to small data sets can be subject to sampling bias; i.e. the true entropy of the data can be severely underestimated. Here we study the sampling properties of estimates of the entropy obtained from maximum entropy models. We show that if the data is generated by a distribution that lies in the model class, the bias is equal to the number of parameters divided by twice the number of observations. However, in practice, the true distribution is usually outside the model class, and we show here that this misspecification can lead to much larger bias. We provide a perturbative approximation of the maximally expected bias when the true model is out of model class, and we illustrate our results using numerical simulations of an Ising model; i.e. the second-order maximum entropy distribution on binary data.
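
Written out, the well-specified bias result quoted above reads (sign chosen to reflect that the entropy is underestimated; $K$ and $N$ are my labels for the parameter and sample counts)

$$ \mathbb{E}\big[\hat{H}\big] - H \;\approx\; -\,\frac{K}{2N}, $$

with $K$ the number of model parameters and $N$ the number of observations; the warning in the abstract is that this formula can badly understate the bias once the true distribution lies outside the model class.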