Know Your Limits: Entropy Estimation Modeling for Compression and Generalization

Badger, Benjamin L., Neligeorge, Matthew

arXiv.org Artificial Intelligence

Language prediction is constrained by informational entropy intrinsic to language, such that there exists a limit to how accurate any language model can become and equivalently a lower bound to language compression. The most efficient language compression algorithms today are causal (next token prediction) large language models, but the use of these models to form accurate estimates of language entropy is currently computationally infeasible. We introduce encoder-augmented causal decoder model architectures that exhibit superior training efficiency characteristics and achieve higher compression than causal transformers even when trained on modest hardware. We demonstrate how entropy estimates can be obtained on a per-token basis, and show that the generalization of models trained to approach the entropy of their training data necessarily exceeds the generalization of models trained to minimize loss beyond this value. We show empirically that causal models trained to approach but not exceed estimated per-token entropies exhibit greater generalization than models trained without taking entropy into account.
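The per-token entropy estimates the abstract describes come down to the ideal code length $-\log_2 p(\text{token} \mid \text{context})$ under a predictive model. A minimal sketch of this idea, using a hypothetical add-one-smoothed bigram model in place of a large causal transformer:

```python
import math
from collections import defaultdict

def bigram_model(tokens):
    """Fit a smoothed bigram next-token model from a token list."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set(tokens)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    def prob(prev, nxt):
        # add-one smoothing over the observed vocabulary
        c = counts[prev]
        return (c[nxt] + 1) / (sum(c.values()) + len(vocab))
    return prob

def per_token_bits(tokens, prob):
    """Ideal code length -log2 p(token | context) at each position."""
    return [-math.log2(prob(p, n)) for p, n in zip(tokens, tokens[1:])]

text = "the cat sat on the mat the cat ate".split()
prob = bigram_model(text)
bits = per_token_bits(text, prob)
print(sum(bits) / len(bits))  # average bits per token = cross-entropy estimate
```

Summing these per-token code lengths gives the compressed size achievable by arithmetic coding under the model; the lower this falls, the closer the model is to the (unknown) entropy of the source.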



Measuring Grammatical Diversity from Small Corpora: Derivational Entropy Rates, Mean Length of Utterances, and Annotation Invariance

Martin, Fermin Moscoso del Prado

arXiv.org Artificial Intelligence

In many fields, such as language acquisition, neuropsychology of language, the study of aging, and historical linguistics, corpora are used for estimating the diversity of grammatical structures that are produced during a period by an individual, community, or type of speakers. In these cases, treebanks are taken as representative samples of the syntactic structures that might be encountered. Generalizing the potential syntactic diversity from the structures documented in a small corpus requires careful extrapolation whose accuracy is constrained by the limited size of representative sub-corpora. In this article, I demonstrate -- theoretically, and empirically -- that a grammar's derivational entropy and the mean length of the utterances (MLU) it generates are fundamentally linked, giving rise to a new measure, the derivational entropy rate. The mean length of utterances becomes the most practical index of syntactic complexity; I demonstrate that MLU is not a mere proxy, but a fundamental measure of syntactic diversity. In combination with the new derivational entropy rate measure, it provides a theory-free assessment of grammatical complexity. The derivational entropy rate indexes the rate at which different grammatical annotation frameworks determine the grammatical complexity of treebanks. I introduce the Smoothed Induced Treebank Entropy (SITE) as a tool for estimating these measures accurately, even from very small treebanks. I conclude by discussing important implications of these results for both NLP and human language processing.
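The link between derivational entropy and MLU can be illustrated with a crude plug-in sketch (this is not SITE; the toy treebank, utterances, and derivation labels below are hypothetical stand-ins for real parses):

```python
import math
from collections import Counter

def mlu(utterances):
    """Mean length of utterance, in tokens."""
    return sum(len(u) for u in utterances) / len(utterances)

def plugin_entropy(labels):
    """Naive plug-in (maximum-likelihood) entropy of a label sample, in bits."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# toy treebank: each utterance is a token list, paired with a
# (hypothetical) derivation label standing in for its parse structure
utterances = [["the", "dog", "barks"], ["dogs", "bark"],
              ["the", "dog", "barks"], ["a", "cat", "sleeps"]]
derivations = ["S->NP VP; NP->Det N", "S->NP VP; NP->N",
               "S->NP VP; NP->Det N", "S->NP VP; NP->Det N"]

H = plugin_entropy(derivations)  # derivational entropy (bits per utterance)
rate = H / mlu(utterances)       # crude per-token derivational entropy rate
```

Dividing the per-utterance entropy by MLU yields a per-token rate, which is the normalization that makes corpora with different utterance lengths comparable; the naive plug-in estimator above is badly biased on small treebanks, which is the problem SITE is designed to address.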


Reviews: Entropy Rate Estimation for Markov Chains with Large State Space

Neural Information Processing Systems

The paper proposes an entropy estimator for Markov chains by reduction to optimal entropy estimation for i.i.d. samples. A sample complexity analysis is provided for different mixing scenarios, with a minimax rate established in a particular regime. The estimator is used to assess the capacity of language models. This is a very clear and well-written paper. I appreciate the authors' efforts to summarize the results.
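The quantity being estimated is the entropy rate $\bar{H} = \sum_i \pi_i H(P_{i\cdot})$, a mixture of per-state conditional entropies. A naive plug-in sketch (the paper's contribution is to replace the inner plug-in entropies with a minimax-optimal i.i.d. entropy estimator, which this sketch does not do):

```python
import math
from collections import Counter, defaultdict

def entropy_rate_plugin(path):
    """Plug-in entropy rate of a Markov chain from one sample path:
    H = sum_i pi_i * H(P_i.), with pi and P estimated empirically."""
    trans = defaultdict(Counter)
    for a, b in zip(path, path[1:]):
        trans[a][b] += 1
    total = len(path) - 1
    H = 0.0
    for a, row in trans.items():
        n_a = sum(row.values())
        pi_a = n_a / total  # empirical visit frequency of state a
        H_row = -sum(c / n_a * math.log2(c / n_a) for c in row.values())
        H += pi_a * H_row
    return H

# a deterministic cycle has zero entropy rate
print(entropy_rate_plugin([0, 1, 2] * 100))  # → 0.0
```

The plug-in version needs roughly one sample per transition-matrix entry to be accurate, which is exactly what becomes infeasible for large state spaces and motivates the reduction studied in the paper.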


Ensemble weighted kernel estimators for multivariate entropy estimation

Neural Information Processing Systems

The problem of estimating entropy functionals of probability densities has received much attention in the information theory, machine learning, and statistics communities. Kernel density plug-in estimators are simple, easy to implement, and widely used for entropy estimation.
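The basic kernel plug-in recipe substitutes a kernel density estimate into $\widehat{H} = -\frac{1}{n}\sum_i \log \widehat{f}_{-i}(x_i)$. A minimal one-dimensional sketch with a Gaussian kernel and leave-one-out evaluation (a single fixed bandwidth, not the weighted ensemble the paper studies):

```python
import math, random

def kde_entropy(sample, bandwidth):
    """Leave-one-out kernel plug-in entropy estimate (nats, 1-D),
    H_hat = -(1/n) * sum_i log f_hat_{-i}(x_i), Gaussian kernel."""
    n = len(sample)
    norm = 1.0 / (math.sqrt(2 * math.pi) * bandwidth)
    H = 0.0
    for i, x in enumerate(sample):
        f = sum(norm * math.exp(-0.5 * ((x - y) / bandwidth) ** 2)
                for j, y in enumerate(sample) if j != i) / (n - 1)
        H -= math.log(f)
    return H / n

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(500)]
# true differential entropy of N(0,1) is 0.5*ln(2*pi*e) ≈ 1.419 nats
print(kde_entropy(data, bandwidth=0.4))
```

A single bandwidth trades bias against variance; the ensemble-weighting idea is to combine estimates at several bandwidths so that the leading bias terms cancel, improving the convergence rate in higher dimensions.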


k-Means Maximum Entropy Exploration

Nedergaard, Alexander, Cook, Matthew

arXiv.org Artificial Intelligence

Exploration in high-dimensional, continuous spaces with sparse rewards is an open problem in reinforcement learning. Artificial curiosity algorithms address this by creating rewards that lead to exploration. Given a reinforcement learning algorithm capable of maximizing rewards, the problem reduces to finding an optimization objective consistent with exploration. Maximum entropy exploration uses the entropy of the state visitation distribution as such an objective. However, efficiently estimating the entropy of the state visitation distribution is challenging in high-dimensional, continuous spaces. We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution. The bound relies on a result we prove for non-parametric density estimation in arbitrary dimensions using k-means. We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces, especially on tasks where reinforcement learning algorithms are unable to find rewards.
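A rough feel for the k-means-based bound (this is a simplified stand-in, not the paper's bound): for a discrete state distribution, the entropy of any deterministic coarse-graining, such as nearest-centroid assignment, never exceeds the true entropy, so it gives a loose lower bound that a curiosity reward could maximize.

```python
import math, random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, centroids[i]))].append(p)
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

def assignment_entropy(points, centroids):
    """Entropy (bits) of the nearest-centroid assignment distribution:
    a deterministic coarse-graining, hence a (loose) lower bound on the
    entropy of a discrete state-visitation distribution."""
    counts = [0] * len(centroids)
    for p in points:
        counts[min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))] += 1
    n = len(points)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

rng = random.Random(1)
# two well-separated "visitation" blobs; with k=2 the bound approaches 1 bit
states = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)] + \
         [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
H = assignment_entropy(states, kmeans(states, k=2))
```

Because the centroids are fit by k-means, the coarse-graining adapts to where the agent actually visits, which is what makes the bound usable as an intrinsic reward in continuous spaces; the paper develops a sharper density-based bound on top of this clustering.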


Joint Entropy Search for Multi-objective Bayesian Optimization

Tu, Ben, Gandy, Axel, Kantas, Nikolas, Shafei, Behrang

arXiv.org Artificial Intelligence

Many real-world problems can be phrased as multi-objective optimization problems, where the goal is to identify the best set of compromises between the competing objectives. Multi-objective Bayesian optimization (BO) is a sample-efficient strategy that can be deployed to solve these vector-valued optimization problems when access is limited to a number of noisy objective function evaluations. In this paper, we propose a novel information-theoretic acquisition function for BO called Joint Entropy Search (JES), which considers the joint information gain for the optimal set of inputs and outputs. We present several analytical approximations to the JES acquisition function and also introduce an extension to the batch setting.


Confidence intervals for nonparametric regression

Barrera, David

arXiv.org Machine Learning

We demonstrate and discuss nonasymptotic bounds in probability for the cost of a regression scheme with a general loss function from the perspective of the Rademacher theory, and for the optimality with respect to the average $L^{2}$-distance to the underlying conditional expectations of least squares regression outcomes from the perspective of the Vapnik-Chervonenkis theory. The results follow from an analysis involving independent but possibly nonstationary training samples and can be extended, in a manner that we explain and illustrate, to relevant cases in which the training sample exhibits dependence.
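For orientation, Rademacher-theoretic cost bounds of the kind referred to here typically take the following textbook form (a standard statement for a loss bounded in $[0,1]$ over i.i.d. samples, not the paper's sharper nonstationary version): with probability at least $1-\delta$ over a sample of size $n$,

```latex
R(h) \;\le\; \widehat{R}_n(h) \;+\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}}
\qquad \text{uniformly over } h \in \mathcal{H},
```

where $R$ is the risk, $\widehat{R}_n$ the empirical risk, and $\mathfrak{R}_n(\ell \circ \mathcal{H})$ the Rademacher complexity of the loss class. The paper's contribution is to establish analogues of such bounds without stationarity of the training sample.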


Generalization bounds for nonparametric regression with $\beta-$mixing samples

Barrera, David, Gobet, Emmanuel

arXiv.org Machine Learning

In this paper we present a series of results that allow uniform deviation inequalities for the empirical process to be extended directly from the independent to the dependent case, characterizing the additional error in terms of the $\beta$-mixing coefficients associated with the training sample. We then apply these results to previously obtained inequalities for independent samples, associated with the deviation of the least-squares error in nonparametric regression, to derive corresponding generalization bounds for regression schemes in which the training sample may not be independent. These results provide a framework for analyzing the error of regression schemes whose training sample comes from a large class of $\beta$-mixing sequences, including geometrically ergodic Markov samples, using only the independent case. More generally, they permit a meaningful extension of the Vapnik-Chervonenkis and similar theories from independent training samples to this class of $\beta$-mixing samples.


Entropy from Machine Learning

Janik, Romuald A.

arXiv.org Machine Learning

One can use virtually any machine learning classification algorithm for computing entropy. This procedure can be used to compute the entropy, and consequently the free energy, directly from a set of Monte Carlo configurations at a given temperature. As a test of the proposed method, using an off-the-shelf machine learning classifier we reproduce the entropy and free energy of the 2D Ising model from Monte Carlo configurations at various temperatures throughout its phase diagram. Other potential applications include computing the entropy of spiking neurons or any other multidimensional binary signals. The problem of estimating the entropy of high-dimensional binary configurations or signals is ubiquitous in many disciplines. In physics, we very often have at our disposal a set of configurations of some physical system generated by a Monte Carlo simulation at a given temperature $T_0$. Such data is well suited to computing expectation values of various operators or their correlation functions; obtaining the entropy or free energy of the system, however, is far from trivial. Indeed, to the best of our knowledge, there is no known way to compute the entropy directly from these configurations even for a system of quite moderate size (e.g. a 20×20 lattice). The goal of the present paper is to propose a machine learning based approach to this problem.
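One way such classifier-based entropy estimates can work is through the chain rule, $H(s_1,\dots,s_d) = \sum_i H(s_i \mid s_{<i})$, with each conditional approximated by a predictive model. A simplified sketch (using empirical conditional frequencies over a short context as a stand-in "classifier"; this is an illustration of the chain-rule idea, not the paper's implementation):

```python
import math, random
from collections import Counter

def chain_rule_entropy(configs, context=3):
    """Estimate H(s_1..s_d) = sum_i H(s_i | s_{i-context}..s_{i-1}), in
    bits per configuration, using empirical conditional frequencies as a
    stand-in for a trained classifier predicting each spin."""
    d = len(configs[0])
    n = len(configs)
    H = 0.0
    for i in range(d):
        joint, ctx = Counter(), Counter()
        for c in configs:
            prefix = tuple(c[max(0, i - context):i])
            joint[(prefix, c[i])] += 1
            ctx[prefix] += 1
        H += -sum(cnt / n * math.log2(cnt / ctx[pref])
                  for (pref, _), cnt in joint.items())
    return H

random.seed(0)
# i.i.d. fair spins: entropy should be close to d = 8 bits per configuration
configs = [tuple(random.choice((0, 1)) for _ in range(8)) for _ in range(4000)]
print(chain_rule_entropy(configs))
```

Replacing the frequency tables with a trained classifier is what makes the scheme scale: the classifier can generalize across contexts, so each conditional can be estimated even when the full configuration space is far too large to enumerate.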