AITopics | Maximum Entropy

Collaborating Authors

Maximum Entropy

News Overviews Instructional Materials AI-Alerts Classics

Neural computation from first principles: Using the maximum entropy method to obtain an optimal bits-per-joule neuron

Levy, William B, Berger, Toby, Sungkar, Mustafa

arXiv.org Machine LearningDec-19-2017

Optimization results are one method for understanding neural computation from Nature's perspective and for defining the physical limits on neuron-like engineering. Earlier work looks at individual properties or performance criteria and occasionally a combination of two, such as energy and information. Here we make use of Jaynes' maximum entropy method and combine a larger set of constraints, possibly dimensionally distinct, each expressible as an expectation. The method identifies a likelihood-function and a sufficient statistic arising from each such optimization. This likelihood is a first-hitting time distribution in the exponential class. Particular constraint sets are identified that, from an optimal inference perspective, justify earlier neurocomputational models. Interactions between constraints, mediated through the inferred likelihood, restrict constraint-set parameterizations, e.g., the energy-budget limits estimation performance which, in turn, matches an axonal communication constraint. Such linkages are, for biologists, experimental predictions of the method. In addition to the related likelihood, at least one type of constraint set implies marginal distributions, and in this case, a Shannon bits/joule statement arises.

bayesian inference, constraint, neurology, (20 more...)

arXiv.org Machine Learning

1606.03063

Country: North America > United States > Virginia > Albemarle County > Charlottesville (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Latent Laplacian Maximum Entropy Discrimination for Detection of High-Utility Anomalies

Hou, Elizabeth, Sricharan, Kumar, Hero, Alfred O.

arXiv.org Machine LearningDec-13-2017

Data-driven anomaly detection methods suffer from the drawback of detecting all instances that are statistically rare, irrespective of whether the detected instances have real-world significance or not. In this paper, we are interested in the problem of specifically detecting anomalous instances that are known to have high real-world utility, while ignoring the low-utility statistically anomalous instances. To this end, we propose a novel method called Latent Laplacian Maximum Entropy Discrimination (LatLapMED) as a potential solution. This method uses the EM algorithm to simultaneously incorporate the Geometric Entropy Minimization principle for identifying statistical anomalies, and the Maximum Entropy Discrimination principle to incorporate utility labels, in order to detect high-utility anomalies. We apply our method in both simulated and real datasets to demonstrate that it has superior performance over existing alternatives that independently pre-process with unsupervised anomaly detection algorithms before classifying.

artificial intelligence, data mining, latlapmed, (19 more...)

arXiv.org Machine Learning

1702.05148

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report > Promising Solution (0.54)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.81)

Add feedback

Computing Maximum Entropy Distributions Everywhere

Straszak, Damian, Vishnoi, Nisheeth K.

arXiv.org Machine LearningNov-6-2017

We study the problem of computing the maximum entropy distribution with a specified expectation over a large discrete domain. Maximum entropy distributions arise and have found numerous applications in economics, machine learning and various sub-disciplines of mathematics and computer science. The key computational questions related to maximum entropy distributions are whether they have succinct descriptions and whether they can be efficiently computed. Here we provide positive answers to both of these questions for very general domains and, importantly, with no restriction on the expectation. This completes the picture left open by the prior work on this problem which requires that the expectation vector is polynomially far in the interior of the convex hull of the domain. As a consequence we obtain a general algorithmic tool and show how it can be applied to derive several old and new results in a unified manner. In particular, our results imply that certain recent continuous optimization formulations, for instance, for discrete counting and optimization problems, the matrix scaling problem, and the worst case Brascamp-Lieb constants in the rank-1 regime, are efficiently computable. Attaining these implications requires reformulating the underlying problem as a version of maximum entropy computation where optimization also involves the expectation vector and, hence, cannot be assumed to be sufficiently deep in the interior. The key new technical ingredient in our work is a polynomial bound on the bit complexity of near-optimal dual solutions to the maximum entropy convex program. This result is obtained by a geometrical reasoning that involves convex analysis and polyhedral geometry, avoiding combinatorial arguments based on the specific structure of the domain. We also provide a lower bound on the bit complexity of near-optimal solutions showing the tightness of our results.

artificial intelligence, optimization problem, polynomial, (18 more...)

arXiv.org Machine Learning

1711.02036

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)

Add feedback

Quantifying multivariate redundancy with maximum entropy decompositions of mutual information

Chicharro, Daniel

arXiv.org Machine LearningAug-12-2017

Williams and Beer (2010) proposed a nonnegative mutual information decomposition, based on the construction of redundancy lattices, which allows separating the information that a set of variables contains about a target variable into nonnegative components interpretable as the unique information of some variables not contained in others as well as redundant and synergistic components. However, the definition of multivariate measures of redundancy that comply with nonnegativity and conform to certain axioms that capture conceptually desirable properties of redundancy has proven to be elusive. We here present a procedure to determine multivariate redundancy measures, within the framework of maximum entropy models. In particular, we generalize existing bivariate maximum entropy-based measures of redundancy and unique information, defining measures of the redundant information that a group of variables has about a target, and of the unique redundant information that a group of variables has about a target that is not redundant with information from another group. The two key ingredients for this approach are: First, the identification of a type of constraints on entropy maximization that allows isolating components of redundancy and unique redundancy by mirroring them to synergy components. Second, the construction of rooted tree-based decompositions to breakdown mutual information, ensuring nonnegativity by the local implementation of maximum entropy information projections at each binary unfolding of the tree nodes. Altogether, the proposed measures are nonnegative and conform to the desirable axioms for redundancy measures.

health & medicine, information, neurology, (20 more...)

arXiv.org Machine Learning

1708.03845

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)

Add feedback

Regression, Logistic Regression and Maximum Entropy

@machinelearnbotAug-6-2017, 05:20:08 GMT

One of the most important tasks in Machine Learning are the Classification tasks (a.k.a. Classification is used to make an accurate prediction of the class of entries in the test set (a dataset of which the entries have not been labelled yet) with the model which was constructed from a training set. You could think of classifying crime in the field of Pre-Policing, classifying patients in the Health sector, classifying houses in the Real-Estate sector. Another field in which classification is big, is Natural Lanuage Processing (NLP). This is the field of science with the goal to makes machines (computers) understand (written) human language.

artificial intelligence, logistic regression, machine learning, (15 more...)

@machinelearnbot

Industry: Banking & Finance > Real Estate (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.44)

Add feedback

Maximum Entropy Flow Networks

Loaiza-Ganem, Gabriel, Gao, Yuanjun, Cunningham, John P.

arXiv.org Machine LearningApr-28-2017

Maximum entropy modeling is a flexible and popular framework for formulating statistical models given partial knowledge. In this paper, rather than the traditional method of optimizing over the continuous density directly, we learn a smooth and invertible transformation that maps a simple distribution to the desired maximum entropy distribution. Doing so is nontrivial in that the objective being maximized (entropy) is a function of the density itself. By exploiting recent developments in normalizing flow networks, we cast the maximum entropy problem into a finite-dimensional constrained optimization, and solve the problem by combining stochastic optimization with the augmented Lagrangian method. Simulation results demonstrate the effectiveness of our method, and applications to finance and computer vision show the flexibility and accuracy of using maximum entropy flow networks.

artificial intelligence, banking & finance, mefn, (16 more...)

arXiv.org Machine Learning

1701.03504

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Banking & Finance > Trading (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)

Add feedback

Dual-Clustering Maximum Entropy with Application to Classification and Word Embedding

Wang, Xiaolong (University of Illinois ) | Wang, Jingjing (University of Illinois) | Zhai, Chengxiang (University of Illinois)

AAAI ConferencesFeb-14-2017

Maximum Entropy (ME), as a general-purpose machine learning model, has been successfully applied to various fields such as text mining and natural language processing. It has been used as a classification technique and recently also applied to learn word embedding. ME establishes a distribution of the exponential form over items (classes/words). When training such a model, learning efficiency is guaranteed by globally updating the entire set of model parameters associated with all items at each training instance. This creates a significant computational challenge when the number of items is large. To achieve learning efficiency with affordable computational cost, we propose an approach named Dual-Clustering Maximum Entropy (DCME). Exploiting the primal-dual form of ME, it conducts clustering in the dual space and approximates each dual distribution by the corresponding cluster center. This naturally enables a hybrid online-offline optimization algorithm whose time complexity per instance only scales as the product of the feature/word vector dimensionality and the cluster number. Experimental studies on text classification and word embedding learning demonstrate that DCME effectively strikes a balance between training speed and model quality, substantially outperforming state-of-the-art methods.

algorithm, artificial intelligence, natural language, (19 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > Illinois > Champaign County > Urbana (0.14)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)

Add feedback

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods

Risteski, Andrej, Li, Yuanzhi

Neural Information Processing SystemsDec-31-2016

The well known maximum-entropy principle due to Jaynes, which states that given mean parameters, the maximum entropy distribution matching them is in an exponential family has been very popular in machine learning due to its “Occam’s razor” interpretation. Unfortunately, calculating the potentials in the maximum entropy distribution is intractable [BGS14]. We provide computationally efficient versions of this principle when the mean parameters are pairwise moments: we design distributions that approximately match given pairwise moments, while having entropy which is comparable to the maximum entropy distribution matching those moments. We additionally provide surprising applications of the approximate maximum entropy principle to designing provable variational methods for partition function calculations for Ising models without any assumptions on the potentials of the model. More precisely, we show that we can get approximation guarantees for the log-partition function comparable to those in the low-temperature limit, which is the setting of optimization of quadratic forms over the hypercube. ([AN06])

artificial intelligence, machine learning, variational method, (18 more...)

Neural Information Processing Systems

Country:

Europe > Spain (0.14)
Asia (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)

Add feedback

Maximum entropy models capture melodic styles

Sakellariou, Jason, Tria, Francesca, Loreto, Vittorio, Pachet, François

arXiv.org Machine LearningOct-11-2016

We introduce a Maximum Entropy model able to capture the statistics of melodies in music. The model can be used to generate new melodies that emulate the style of the musical corpus which was used to train it. Instead of using the $n-$body interactions of $(n-1)-$order Markov models, traditionally used in automatic music generation, we use a $k-$nearest neighbour model with pairwise interactions only. In that way, we keep the number of parameters low and avoid over-fitting problems typical of Markov models. We show that long-range musical phrases don't need to be explicitly enforced using high-order Markov interactions, but can instead emerge from multiple, competing, pairwise interactions. We validate our Maximum Entropy model by contrasting how much the generated sequences capture the style of the original corpus without plagiarizing it. To this end we use a data-compression approach to discriminate the levels of borrowing and innovation featured by the artificial sequences. The results show that our modelling scheme outperforms both fixed-order and variable-order Markov models. This shows that, despite being based only on pairwise interactions, this Maximum Entropy scheme opens the possibility to generate musically sensible alterations of the original phrases, providing a way to generate innovation.

artificial intelligence, machine learning, sequence, (18 more...)

arXiv.org Machine Learning

1610.03414

Country: Europe > France (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.76)

Add feedback

Maximum Entropy Vector Kernels for MIMO system identification

Prando, Giulia, Pillonetto, Gianluigi, Chiuso, Alessandro

arXiv.org Machine LearningSep-29-2016

Recent contributions have framed linear system identification as a nonparametric regularized inverse problem. Relying on $\ell_2$-type regularization which accounts for the stability and smoothness of the impulse response to be estimated, these approaches have been shown to be competitive w.r.t classical parametric methods. In this paper, adopting Maximum Entropy arguments, we derive a new $\ell_2$ penalty deriving from a vector-valued kernel; to do so we exploit the structure of the Hankel matrix, thus controlling at the same time complexity, measured by the McMillan degree, stability and smoothness of the identified models. As a special case we recover the nuclear norm penalty on the squared block Hankel matrix. In contrast with previous literature on reweighted nuclear norm penalties, our kernel is described by a small number of hyper-parameters, which are iteratively updated through marginal likelihood maximization; constraining the structure of the kernel acts as a (hyper)regularizer which helps controlling the effective degrees of freedom of our estimator. To optimize the marginal likelihood we adapt a Scaled Gradient Projection (SGP) algorithm which is proved to be significantly computationally cheaper than other first and second order off-the-shelf optimization methods. The paper also contains an extensive comparison with many state-of-the-art methods on several Monte-Carlo studies, which confirms the effectiveness of our procedure.

algorithm, bayesian inference, survey article, (21 more...)

arXiv.org Machine Learning

1508.02865

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.61)

Add feedback