AITopics | entropy

The Topological Stability Index: A Variance-Based Measure for Persistence Barcodes

Kirchner, Joris, Diamantis, Ioannis

arXiv.org Machine Learning, May-29-2026

We introduce the \emph{Topological Stability Index} (TSI), a variance-based scalar measure for persistence barcodes that quantifies the dispersion of persistence lifetimes. Unlike persistent entropy, which depends only on normalized weights, the TSI captures absolute variability and is sensitive to heterogeneous feature scales. We establish fundamental properties of the TSI, including its scaling behavior, invariance under lifetime translation and explicit update formulas under insertion and deletion of bars. We also consider a complementary first-moment-type quantity, the Topological Signal Index (TSigI), which captures the typical scale of persistence lifetimes and provides additional interpretability alongside the TSI. We further introduce a normalized version, $cv\text{TSI}$, which is scale invariant and admits an explicit algebraic relation to the Rényi entropy of order two. In particular, $cv\text{TSI}$ is an affine function of the collision probability $\sum_i p_i^2$, and therefore a monotone reparametrization of the Rényi entropy, providing a direct link between variance-based and entropy-based summaries in topological data analysis. Numerical experiments on synthetic data and stochastic time series demonstrate that the TSI captures structural variability complementary to entropy: it is relatively insensitive to deterministic trends, while responding strongly to stochastic fluctuations and variations in persistence magnitude.
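The link between a variance-based summary and the collision probability can be checked numerically. The sketch below is not the authors' code: it assumes the TSI is the population variance of the lifetimes and that the normalized version is the squared coefficient of variation. The abstract does not give the exact definitions, so the function names and normalizations here are illustrative. Under these assumptions, with $p_i = \ell_i / \sum_j \ell_j$, the squared coefficient of variation equals $n \sum_i p_i^2 - 1$, which is affine in the collision probability.

```python
import math

def lifetimes(bars):
    """Persistence lifetimes (death - birth) of finite bars."""
    return [death - birth for birth, death in bars]

def tsi(ls):
    """Assumed TSI: population variance of the lifetimes."""
    mean = sum(ls) / len(ls)
    return sum((x - mean) ** 2 for x in ls) / len(ls)

def collision_probability(ls):
    """sum_i p_i^2 with p_i = l_i / sum_j l_j."""
    total = sum(ls)
    return sum((x / total) ** 2 for x in ls)

def cv_squared(ls):
    """Assumed normalized index: variance divided by squared mean."""
    mean = sum(ls) / len(ls)
    return tsi(ls) / mean ** 2

def renyi2_entropy(ls):
    """Renyi entropy of order two of the normalized lifetimes (nats)."""
    return -math.log(collision_probability(ls))

bars = [(0.0, 1.0), (0.0, 2.0), (1.0, 4.0)]  # hypothetical barcode
ls = lifetimes(bars)
print(tsi(ls), cv_squared(ls), len(ls) * collision_probability(ls) - 1)
```

With this definition the variance is unchanged when a constant is added to every lifetime and scales quadratically when lifetimes are multiplied by a constant, which matches the translation and scaling properties the abstract mentions.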

artificial intelligence, barcode, tsi, (15 more...)

arXiv.org Machine Learning

2605.29839

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination

Agnimo, Yedidia, Korba, Anna, Blangero, Annabelle, Chesneau, Nicolas, Alahari, Karteek

arXiv.org Machine Learning, May-27-2026

Large language models (LLMs) are prone to hallucinations, i.e., statements unsupported by the input or training data, hindering reliable deployment. In parallel, numerous uncertainty estimation (UE) methods have been proposed to quantify model confidence and are often implicitly treated as proxies for model failure. However, the relationship between uncertainty and hallucinations remains insufficiently characterized. We present a systematic empirical study of the association between uncertainty estimators and hallucinations in LLMs. Rather than assuming this association, we evaluate directly when and to what extent it holds. We consider a diverse set of uncertainty estimators, including information-theoretic, sampling-based, and reflexive estimators, and examine their behavior across hallucination settings. Our experiments cover both intrinsic hallucinations (violations of input faithfulness) and extrinsic hallucinations (unsupported claims relative to training data), using four complementary benchmarks, including RAGTruth and HalluLens. We find that the association is highly variable and often weak, depending on the hallucination type and the LLM under evaluation. These results challenge the use of uncertainty as a direct signal of hallucination and clarify when it provides actionable information.

large language model, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2605.27016

Country: Europe > France (0.28)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Human-Centered Learning Mechanics: A Dynamical Framework for Entropy-Regulated Representation Learning

Tran, Kim Phuc

arXiv.org Machine Learning, May-25-2026

Deep learning is increasingly viewed as a dynamical process in parameter space, yet many existing theories still treat training as a closed optimization system. This view is limited for real-world AI, where models operate under uncertainty, resource constraints, distribution shift, downstream decision risks, and human feedback. We propose Human-Centered Learning Mechanics (HCLM), a dynamical and information-theoretic framework for open and controlled learning systems. The central idea is that entropy regularization is useful only when the chosen entropy surrogate generates a non-degenerate information force along the optimization trajectory. Otherwise, entropy terms may produce weak, unstable, or misaligned gradients, causing the dynamics to collapse toward ordinary loss minimization. We introduce the notion of effective entropy and study tractable geometric entropy surrogates, including variance-based and log-determinant covariance proxies. The paper makes three contributions. First, it formalizes entropy regularization through effective information force and characterizes degenerate entropy regimes. Second, it derives convergence, entropy-flow, Wasserstein-gradient-flow, and noisy-representation generalization results under explicit assumptions. Third, it offers a conditional dynamical interpretation of scaling-law-like behavior as a balance between information injection, entropy dissipation, and residual risk, without claiming an unconditional derivation of empirical neural scaling laws. Controlled representation-learning experiments support the hypothesis that geometric entropy surrogates, especially log-determinant covariance entropy, induce stronger and more stable information forces than softmax-normalized entropy.
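To make the two geometric surrogates named in the abstract concrete, here is a minimal sketch of a variance-based proxy (trace of the covariance) and a log-determinant covariance proxy (the differential entropy of a Gaussian with the same covariance). The paper's exact definitions are not given in the abstract, so the formulas, the ridge term `eps`, and all function names are assumptions made for illustration.

```python
import math

def covariance(rows):
    """Population covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]

def logdet_spd(m):
    """log det of a symmetric positive-definite matrix via Cholesky factorization."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(d))

def variance_proxy(rows):
    """Variance-based surrogate: trace of the covariance."""
    cov = covariance(rows)
    return sum(cov[i][i] for i in range(len(cov)))

def logdet_entropy(rows, eps=1e-6):
    """Gaussian entropy 0.5 * log det(2*pi*e*(Sigma + eps*I)); eps keeps it finite."""
    cov = covariance(rows)
    d = len(cov)
    for i in range(d):
        cov[i][i] += eps
    return 0.5 * (d * math.log(2 * math.pi * math.e) + logdet_spd(cov))

spread = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical embeddings
collapsed = [[1.0, 1.0]] * 4                                # fully collapsed embeddings
print(logdet_entropy(spread), logdet_entropy(collapsed))
```

The log-determinant proxy drops sharply when representations collapse along any direction, whereas the trace only tracks total variance; this is one plausible reading of why the two surrogates could behave differently as regularizers.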

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2605.2294

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

The Thermodynamic Costs of Simple Linear Regression

D'Ambrosia, Samuel H., Daniels, Sultan M., DeWeese, Michael R., Sahai, Anant

arXiv.org Machine Learning, May-20-2026

The construction of models from data is a significant contributor to the energetic costs of computation. Because of this, understanding how foundational thermodynamic bounds apply to modeling algorithms will be increasingly important. Here, we study the thermodynamic costs of a basic and fundamental modeling algorithm: simple linear regression. Following Landauer, we approximate the thermodynamic lower bound on irreversibly performing both exact linear regression and linear regression via stochastic gradient descent as implemented on floating-point numbers. From this, we derive energy-cost-aware scaling laws for the optimal dataset size for training a linear regression model given a generalization-error-dependent demand for inference. Additionally, we discuss a method to lower bound the entropy production from the mismatch cost for algorithms with continuous input variables.
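For orientation, the Landauer limit itself is easy to evaluate: erasing one bit at temperature $T$ dissipates at least $k_B T \ln 2$. The sketch below only computes that textbook bound for a given number of erased bits. How the paper counts erased bits for floating-point regression is not stated in the abstract, so the bit count in the usage line is a hypothetical input, not a result from the paper.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact value in the 2019 SI definition

def landauer_bound_joules(bits_erased, temperature_kelvin=300.0):
    """Minimum heat dissipated when irreversibly erasing `bits_erased` bits."""
    if bits_erased < 0 or temperature_kelvin <= 0:
        raise ValueError("bits_erased must be >= 0 and temperature must be > 0")
    return bits_erased * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)

# Hypothetical example: overwriting one thousand 64-bit floats at room temperature.
print(landauer_bound_joules(1000 * 64))
```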

artificial intelligence, entropy, machine learning, (17 more...)

arXiv.org Machine Learning

2605.19195

Country: North America > United States > California (0.28)

Genre:

Research Report (0.82)
Workflow (0.67)

Industry: Energy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Understanding Self-Supervised Learning via Latent Distribution Matching

Mikulasch, Fabian A, Zenke, Friedemann

arXiv.org Machine Learning, May-19-2026

Self-supervised learning (SSL) excels at finding general-purpose latent representations from complex data, yet lacks a unifying theoretical framework that explains the diverse existing methods and guides the design of new ones. We cast SSL as latent distribution matching (LDM): learning representations that maximize their log-probability under an assumed latent model (alignment), while maximizing latent entropy to prevent collapse (uniformity). This view unifies independent component analysis with contrastive, non-contrastive, and predictive SSL methods, including stop gradient approaches. Leveraging LDM, we derive a nonlinear, sampling-free Bayesian filtering model with a Kalman-based predictor for high-dimensional timeseries. We further prove that predictive LDM yields identifiable latent representations under mild assumptions, even with nonlinear predictors. Overall, LDM clarifies the assumptions behind established SSL methods and provides principled guidance for developing new approaches.

artificial intelligence, inductive learning, machine learning, (19 more...)

arXiv.org Machine Learning

2605.03517

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Inducing Spatial Locality in Vision Transformers through the Training Protocol

Toledo, Eduardo Santiago, Martínez, Asael Fabian

arXiv.org Machine Learning, May-19-2026

We investigate whether the training protocol can induce spatial locality in the early layers of a Vision Transformer (ViT) trained from scratch, without large-scale pretraining. Keeping the architecture and optimization procedure fixed, we compare a Baseline protocol with a Modern protocol (AutoAugment/ColorJitter, CutMix, and Label Smoothing) on CIFAR-10, CIFAR-100, and Tiny-ImageNet, characterizing each attention head via Mean Attention Distance (MAD) and normalized entropy. Across all three datasets, the Modern protocol produces more local and more concentrated attention in early layers; on CIFAR-100, the minimum MAD drops from 0.316 (Baseline) to 0.008 (Modern). To identify the source of this effect, we conduct an ablation study on CIFAR-100 by adding or removing each component individually. The results identify CutMix as the determining component within our experiments: all conditions with CutMix exhibit MAD 0.024, while all conditions without CutMix remain at MAD 0.210. AutoAugment and Label Smoothing show no independent effect on locality. Taken together, these findings suggest that the pressure to classify from partial image regions, induced by CutMix, can promote the emergence of local attention in Vision Transformers.
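The two per-head statistics can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes Mean Attention Distance is the attention-weighted Euclidean distance between query and key patches on the patch grid, averaged over queries and divided by the grid diagonal, and that normalized entropy is the mean row entropy divided by the log of the number of keys. The paper's exact normalization and its handling of the class token are not given in the abstract.

```python
import math

def patch_coords(grid):
    """(row, col) position of each patch in a grid x grid layout, row-major."""
    return [(r, c) for r in range(grid) for c in range(grid)]

def mean_attention_distance(attn, grid):
    """Attention-weighted query-key distance, normalized by the grid diagonal.

    attn[q][k] is the attention weight from query patch q to key patch k;
    each row is assumed to sum to 1. Requires grid >= 2.
    """
    coords = patch_coords(grid)
    diagonal = math.hypot(grid - 1, grid - 1)
    total = 0.0
    for q, row in enumerate(attn):
        for k, weight in enumerate(row):
            total += weight * math.dist(coords[q], coords[k])
    return total / (len(attn) * diagonal)

def normalized_entropy(attn):
    """Mean Shannon entropy of the attention rows, divided by log(num_keys)."""
    num_keys = len(attn[0])
    total = 0.0
    for row in attn:
        total -= sum(w * math.log(w) for w in row if w > 0)
    return total / (len(attn) * math.log(num_keys))

# Hypothetical 2x2 patch grid: a perfectly local head and a uniform head.
local_head = [[1.0 if q == k else 0.0 for k in range(4)] for q in range(4)]
uniform_head = [[0.25] * 4 for _ in range(4)]
print(mean_attention_distance(local_head, 2), normalized_entropy(local_head))
print(mean_attention_distance(uniform_head, 2), normalized_entropy(uniform_head))
```

Under these definitions a head that attends only to its own patch scores zero on both measures, while a uniform head has normalized entropy one.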

artificial intelligence, machine learning, protocol, (16 more...)

arXiv.org Machine Learning

2605.1639

Country: South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.76)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Breaking the Finite-Sample Barrier in Entropy Coupling

Asoodeh, Shahab, Chen, Jun

arXiv.org Machine Learning, May-18-2026

Dependence among marginally constrained observations can break a finite-sample barrier. To formalize this phenomenon, we introduce the \emph{minimum list entropy coupling} $H(P\|Q_1,\dots,Q_m)$, the minimum conditional entropy $H(X|Y_1,\dots,Y_m)$ over all joint distributions with prescribed discrete marginals $X\sim P$ and $Y_i\sim Q_i$. Unlike classical formulations based on independent observations, our model allows $Y_1,\dots,Y_m$ to be arbitrarily dependent while keeping each marginal fixed. This enlarged coupling space reveals a sharp dichotomy: independent observations reduce residual uncertainty exponentially, whereas dependent observations can eliminate it exactly after finitely many samples. We characterize this zero-entropy regime through necessary and sufficient conditions and give concrete structural criteria under which it occurs. In particular, under mild support assumptions, zero entropy is achieved with $O(\log(1/P_{\min}))$ observations, where $P_{\min}$ is the minimum nonzero mass of $P$. We also develop a greedy algorithm with monotone approximation guarantees for computing $H(P\|Q_1,\dots,Q_m)$. Finally, we show that the same framework formalizes finite-sample limits in distribution-matching representation learning and randomness extraction, where zero entropy corresponds to exact recovery and exact extraction.

artificial intelligence, coupling, machine learning, (19 more...)

arXiv.org Machine Learning

2605.16229

Country: North America (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.45)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

FRESH: Information-Geometric Calibration of Patient-Level Models to Aggregate Evidence

Fuller, Franklin, Bertolini, Daniele, Liang, Samantha, Christopher, Jason, Smith, Aaron M.

arXiv.org Machine Learning, May-18-2026

Many decisions in clinical science and epidemiology -- estimating probability of technical success for a clinical trial, assessing comparative effectiveness of two therapies, imputing a placebo effect onto natural history data -- rely on combining sources of information about a clinical cohort that come from different kinds of studies. Specifically, we contrast patient-level sources that provide granular pictures of individual disease course (clinical trials, registries, or electronic health records) with aggregate sources such as published clinical trial results and the TFLs (tables, figures, and listings). One strategy for combining aggregate with patient-level data sources is to bring each into a common format for a unified analysis. If one wants to maintain the analytic flexibility of patient-level data, then a natural solution is to convert the aggregate information into a simulated patient-level dataset that recapitulates those aggregate statistics. This is an under-determined inverse problem in that there are many such datasets, and it cannot be well specified without further constraints. FRESH (Fusion of Recent Evidence with Subject Histories) provides a well-defined method for solving this problem, and therefore provides maximal analytic flexibility.

artificial intelligence, constraint, machine learning, (18 more...)

arXiv.org Machine Learning

2605.16246

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

A Mutual Information Lower Bound for Multimodal Regression Active Learning

Guilhoto, Leonardo Ferreira, Kaushal, Akshat, Perdikaris, Paris

arXiv.org Machine Learning, May-15-2026

Active learning for continuous regression has lacked an acquisition function that targets epistemic uncertainty when the predictive distribution is multimodal: variance misses modal disagreement, and information-theoretic targets like BALD are designed for discrete outputs. We introduce a Two-Index framework that makes this separation explicit: one stochastic index selects among competing model hypotheses (epistemic source), while a second governs within-hypothesis randomness (aleatoric source). An entropy decomposition within the framework identifies the mutual information between the output and the epistemic index as a principled acquisition objective, and we prove this quantity vanishes as the model is trained on growing datasets, confirming that it captures exactly the uncertainty data can resolve. Because this mutual information is intractable for continuous outputs, we derive the Mutual Information Lower Bound (MI-LB) acquisition function, a closed-form approximation for Mixture Density Network ensembles. On benchmarks featuring multimodal systems, MI-LB matches or beats every baseline evaluated and is the only method to do so consistently -- geometric and Fisher-based baselines compete only when the input space already encodes the multimodality, and collapse otherwise.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Machine Learning

2605.14917

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information

van Buuren, Stef

arXiv.org Machine Learning, May-14-2026

Large language models (LLMs) are increasingly deployed in settings where the available context is incomplete or degraded. We argue that an LLM generating answers under incomplete context can be viewed as an implicit imputer, and evaluated against a criterion from the multiple imputation (MI) literature: uncertainty should scale with the amount of missing information. We assess this criterion on SQuAD, using a controlled framework in which context availability is varied across five levels. We evaluate two answer-level uncertainty measures that can be estimated from repeated sampling: sampling-based confidence (empirical mode frequency) and response entropy. Confidence fails to reflect increasing missingness: it remains high even as accuracy collapses. Entropy, by contrast, increases with context removal, consistent with the MI analogy, and explains substantially more variance in accuracy than confidence across all evidence levels (quadratic $R^2$ gap up to 0.057). We further introduce a black-box diagnostic $ρ_R(α)$ that estimates the proportion of baseline uncertainty resolved by context level $α$, requiring only repeated sampling with and without context. These results suggest that entropy is a more responsive black-box uncertainty measure than confidence under incomplete context.
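The two answer-level measures described above can be estimated from a list of repeated samples for one question. The sketch below is a minimal illustration, assuming answers have already been normalized to comparable strings (the normalization step and the function names are assumptions, not taken from the paper): confidence is the empirical frequency of the most common answer, and response entropy is the Shannon entropy of the empirical answer distribution, in nats.

```python
import math
from collections import Counter

def mode_confidence(answers):
    """Sampling-based confidence: relative frequency of the most common answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)

def response_entropy(answers):
    """Shannon entropy (nats) of the empirical distribution over sampled answers."""
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in Counter(answers).values())

# Hypothetical samples for one question under two context levels.
full_context = ["paris"] * 10
degraded_context = ["paris"] * 6 + ["lyon", "nice", "rome", "oslo"]
print(mode_confidence(full_context), response_entropy(full_context))
print(mode_confidence(degraded_context), response_entropy(degraded_context))
```

In the second sample set the mode still holds a majority, so confidence stays fairly high, while entropy also registers how the remaining mass is spread over alternatives. This is the kind of difference in responsiveness the abstract reports.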

information, large language model, natural language, (20 more...)

arXiv.org Machine Learning

2605.13188

Country:

North America > United States (0.30)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports > Football (0.49)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
