AITopics | entropy function

Collaborating Authors

entropy function

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Correlated Mutations for Integer Programming

Shir, Ofer M., Emmerich, Michael

arXiv.org Artificial IntelligenceSep-5-2025

Even with the recent theoretical advancements that dramatically reduced the complexity of Integer Programming (IP), heuristics remain the dominant problem-solvers for this difficult category. This study seeks to establish the groundwork for Integer Evolution Strategies (IESs), a class of randomized search heuristics inherently designed for continuous spaces. IESs already excel in treating IP in practice, but accomplish it via discretization and by applying sophisticated patches to their continuous operators, while persistently using the $\ell_2$-norm as their operation pillar. We lay foundations for discrete search, by adopting the $\ell_1$-norm, accounting for the suitable step-size, and questioning alternative measures to quantify correlations over the integer lattice. We focus on mutation distributions for unbounded integer decision variables. We briefly discuss a couple of candidate discrete probabilities induced by the uniform and binomial distributions, which we show to possess less appealing theoretical properties, and then narrow down to the Truncated Normal (TN) and Double Geometric (DG) distributions. We explore their theoretical properties, including entropy functions, and propose a procedure to generate scalable correlated mutation distributions. Our investigations are accompanied by extensive numerical simulations, which consistently support the claim that the DG distribution is better suited for unbounded integer search. We link our theoretical perspective to empirical evidence indicating that an IES with correlated DG mutations outperformed other strategies over non-separable quadratic IP. We conclude that while the replacement of the default TN distribution by the DG is theoretically justified and practically beneficial, the truly crucial change lies in adopting the $\ell_1$-norm over the $\ell_2$-norm.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3729878.3746698

2506.22526

Country:

Europe (0.46)
North America > United States (0.15)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)

Add feedback

Provable Uncertainty Decomposition via Higher-Order Calibration

Ahdritz, Gustaf, Gollakota, Aravind, Gopalan, Parikshit, Peale, Charlotte, Wieder, Udi

arXiv.org Machine LearningDec-25-2024

We give a principled method for decomposing the predictive uncertainty of a model into aleatoric and epistemic components with explicit semantics relating them to the real-world data distribution. While many works in the literature have proposed such decompositions, they lack the type of formal guarantees we provide. Our method is based on the new notion of higher-order calibration, which generalizes ordinary calibration to the setting of higher-order predictors that predict mixtures over label distributions at every point. We show how to measure as well as achieve higher-order calibration using access to $k$-snapshots, namely examples where each point has $k$ independent conditional labels. Under higher-order calibration, the estimated aleatoric uncertainty at a point is guaranteed to match the real-world aleatoric uncertainty averaged over all points where the prediction is made. To our knowledge, this is the first formal guarantee of this type that places no assumptions whatsoever on the real-world data distribution. Importantly, higher-order calibration is also applicable to existing higher-order predictors such as Bayesian and ensemble models and provides a natural evaluation metric for such models. We demonstrate through experiments that our method produces meaningful uncertainty decompositions for image classification.

artificial intelligence, calibration, machine learning, (19 more...)

arXiv.org Machine Learning

2412.18808

Country: Europe (0.45)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

On Calibration in Multi-Distribution Learning

Verma, Rajeev, Fischer, Volker, Nalisnick, Eric

arXiv.org Artificial IntelligenceDec-18-2024

Modern challenges of robustness, fairness, and decision-making in machine learning have led to the formulation of multi-distribution learning (MDL) frameworks in which a predictor is optimized across multiple distributions. We study the calibration properties of MDL to better understand how the predictor performs uniformly across the multiple distributions. Through classical results on decomposing proper scoring losses, we first derive the Bayes optimal rule for MDL, demonstrating that it maximizes the generalized entropy of the associated loss function. Our analysis reveals that while this approach ensures minimal worst-case loss, it can lead to non-uniform calibration errors across the multiple distributions and there is an inherent calibration-refinement trade-off, even at Bayes optimality. Our results highlight a critical limitation: despite the promise of MDL, one must use caution when designing predictors tailored to multiple distributions so as to minimize disparity.

artificial intelligence, machine learning, predictor, (18 more...)

arXiv.org Artificial Intelligence

2412.14142

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New York (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

MEP-Net: Generating Solutions to Scientific Problems with Limited Knowledge by Maximum Entropy Principle

Yang, Wuyue, Peng, Liangrong, Li, Guojie, Hong, Liu

arXiv.org Machine LearningDec-2-2024

Maximum entropy principle (MEP) offers an effective and unbiased approach to inferring unknown probability distributions when faced with incomplete information, while neural networks provide the flexibility to learn complex distributions from data. This paper proposes a novel neural network architecture, the MEP-Net, which combines the MEP with neural networks to generate probability distributions from moment constraints. We also provide a comprehensive overview of the fundamentals of the maximum entropy principle, its mathematical formulations, and a rigorous justification for its applicability for non-equilibrium systems based on the large deviations principle. Through fruitful numerical experiments, we demonstrate that the MEP-Net can be particularly useful in modeling the evolution of probability distributions in biochemical reaction networks and in generating complex distributions from data.

constraint, mep-net, probability distribution, (13 more...)

arXiv.org Machine Learning

2412.0209

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Fujian Province > Fuzhou (0.04)

Genre:

Overview (0.86)
Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Entropy, Thermodynamics and the Geometrization of the Language Model

Yang, Wenzhe

arXiv.org Artificial IntelligenceJul-30-2024

In this paper, we discuss how pure mathematics and theoretical physics can be applied to the study of language models. Using set theory and analysis, we formulate mathematically rigorous definitions of language models, and introduce the concept of the moduli space of distributions for a language model. We formulate a generalized distributional hypothesis using functional analysis and topology. We define the entropy function associated with a language model and show how it allows us to understand many interesting phenomena in languages. We argue that the zero points of the entropy function and the points where the entropy is close to 0 are the key obstacles for an LLM to approximate an intelligent language model, which explains why good LLMs need billions of parameters. Using the entropy function, we formulate a conjecture about AGI. Then, we show how thermodynamics gives us an immediate interpretation to language models. In particular we will define the concepts of partition function, internal energy and free energy for a language model, which offer insights into how language models work. Based on these results, we introduce a general concept of the geometrization of language models and define what is called the Boltzmann manifold. While the current LLMs are the special cases of the Boltzmann manifold.

entropy function, geometrization, language model, (13 more...)

arXiv.org Artificial Intelligence

2407.21092

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces

Neumayer, Sebastian, Stein, Viktor, Steidl, Gabriele

arXiv.org Artificial IntelligenceFeb-7-2024

Most commonly used $f$-divergences of measures, e.g., the Kullback-Leibler divergence, are subject to limitations regarding the support of the involved measures. A remedy consists of regularizing the $f$-divergence by a squared maximum mean discrepancy (MMD) associated with a characteristic kernel $K$. In this paper, we use the so-called kernel mean embedding to show that the corresponding regularization can be rewritten as the Moreau envelope of some function in the reproducing kernel Hilbert space associated with $K$. Then, we exploit well-known results on Moreau envelopes in Hilbert spaces to prove properties of the MMD-regularized $f$-divergences and, in particular, their gradients. Subsequently, we use our findings to analyze Wasserstein gradient flows of MMD-regularized $f$-divergences. Finally, we consider Wasserstein gradient flows starting from empirical measures and provide proof-of-the-concept numerical examples with Tsallis-$\alpha$ divergences.

artificial intelligence, divergence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2402.04613

Country:

Europe > United Kingdom (0.14)
Europe > Germany (0.14)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Energy > Oil & Gas (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Add feedback

The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs

Amid, Ehsan, Nielsen, Frank, Nock, Richard, Warmuth, Manfred K.

arXiv.org Machine LearningNov-22-2023

Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities. Calculus on TEMs relies on a deformed algebra of arithmetic operators induced by the deformed logarithms used to define the tempered entropy. In this work, we introduce three different parameterizations of finite discrete TEMs via Legendre functions of the negative tempered entropy function. In particular, we establish an isometry between such parameterizations in terms of a generalization of the Hilbert log cross-ratio simplex distance to a tempered Hilbert co-simplex distance. Similar to the Hilbert geometry, the tempered Hilbert distance is characterized as a $t$-symmetrization of the oriented tempered Funk distance. We motivate our construction by introducing the notion of $t$-lengths of smooth curves in a tautological Finsler manifold. We then demonstrate the properties of our generalized structure in different settings and numerically examine the quality of its differentiable approximations for optimization in machine learning settings.

artificial intelligence, geometry, machine learning, (16 more...)

arXiv.org Machine Learning

2311.13459

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Entropy is a measure of uncertainty

#artificialintelligenceJun-19-2022, 09:05:15 GMT

Suppose you are talking with three patients in the waiting room of a doctor's office. All three of them have just completed a medical test which, after some processing, yields one of two possible results: the disease is either present or absent. Let's assume we are dealing with curious and data-oriented individuals. They've researched the probabilities for their specific risk profiles in advance and are now eager to find out the result. Patient A knows that, statistically, there is a 95% chance that he has the disease in question.

entropy, probability, shannon entropy, (16 more...)

#artificialintelligence

Country: Europe > Greece (0.05)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence (0.70)

Add feedback

On the algebraic structures of the space of interval-valued intuitionistic fuzzy numbers

Wu, Xinxing, Tan, Chaoyue, Cayli, Gul Deniz, Liu, Peide

arXiv.org Artificial IntelligenceDec-1-2021

This study is inspired by those of Huang et al. (Soft Comput. 25, 2513--2520, 2021) and Wang et al. (Inf. Sci. 179, 3026--3040, 2009) in which some ranking techniques for interval-valued intuitionistic fuzzy numbers (IVIFNs) were introduced. In this study, we prove that the space of all IVIFNs with the relation in the method for comparing any two IVIFNs based on a score function and three types of entropy functions is a complete chain and obtain that this relation is an admissible order. Moreover, we demonstrate that IVIFNs are complete chains to the relation in the comparison method for IVIFNs on the basis of score, accuracy, membership uncertainty index, and hesitation uncertainty index functions.

accuracy function, definition 3, ivifn, (15 more...)

arXiv.org Artificial Intelligence

2112.03026

Country:

Asia > China > Guangdong Province > Zhuhai (0.04)
North America > United States > Rhode Island (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.33)

Add feedback

Conditional independence structures over four discrete random variables revisited: conditional Ingleton inequalities

Studeny, Milan

arXiv.org Artificial IntelligenceDec-9-2020

The paper deals with conditional linear information inequalities valid for entropy functions induced by discrete random variables. Specifically, the so-called conditional Ingleton inequalities are in the center of interest: these are valid under conditional independence assumptions on the inducing random variables. We discuss five inequalities of this particular type, four of which has appeared earlier in the literature. Besides the proof of the new fifth inequality, simpler proofs of (some of) former inequalities are presented. These five information inequalities are used to characterize all conditional independence structures induced by four discrete random variables.

ci structure, implication, inequality, (15 more...)

arXiv.org Artificial Intelligence

2012.04092

Country:

Europe > Czechia > Prague (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)

Add feedback