AITopics


Neural Information Processing Systems, Feb-13-2026, 21:02:34 GMT

Entropy Rate Estimation for Markov Chains with Large State Space

Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, Tiancheng Yu

Entropy estimation is one of the prototypical problems in distribution property testing.
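As a point of reference for the quantity named in the title, the entropy rate of a stationary Markov chain with transition matrix P and stationary distribution pi is H = -sum_i pi_i sum_j P_ij log P_ij. The sketch below computes it for a known chain and compares it with a naive plug-in estimate from one sample path. The two-state matrix, the path length, and all function names are illustrative assumptions; the plug-in estimator is a textbook baseline and is not presented here as the paper's method.

```python
import math
import random

def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of a row-stochastic P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_rate(P):
    """Entropy rate in nats per step: H = -sum_i pi_i sum_j P_ij log P_ij."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * p * math.log(p)
        for i, row in enumerate(P)
        for p in row
        if p > 0
    )

def plugin_entropy_rate(path, n_states):
    """Plug-in estimate: substitute empirical transition frequencies for pi and P."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    total = len(path) - 1
    h = 0.0
    for row in counts:
        row_sum = sum(row)
        for c in row:
            if c > 0:
                # empirical pair frequency times log of empirical conditional
                h -= (c / total) * math.log(c / row_sum)
    return h

def sample_path(P, length, seed=0):
    """Draw one trajectory of the chain, starting from state 0."""
    rng = random.Random(seed)
    state, path = 0, [0]
    for _ in range(length - 1):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path

P = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical two-state chain, for illustration only
true_h = entropy_rate(P)
est_h = plugin_entropy_rate(sample_path(P, 50_000), n_states=2)
```

With two states and a long path the plug-in estimate lands close to the true value. With a large state space most transitions are seen rarely or never in a path of comparable length, which is the regime the title refers to.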

artificial intelligence, machine learning, markov chain, (17 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.47)

Neural Information Processing Systems, Nov-20-2025, 18:37:29 GMT

Entropy Rate Estimation for Markov Chains with Large State Space

Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, Tiancheng Yu

Entropy estimation is one of the prototypical problems in distribution property testing.

artificial intelligence, machine learning, markov chain, (15 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

arXiv.org Artificial Intelligence, Nov-4-2025

Reversal Invariance in Autoregressive Language Models

Sahasrabudhe, Mihir

We formalize a structural property of the causal (autoregressive) language modeling (CLM) objective: reversal invariance. Formally, the next-token prediction loss assigns identical likelihood to a corpus and its reversal, implying that standard CLM pretraining is direction-blind. This symmetry explains why models trained on reversed text can achieve comparable performance to those trained on forward text, despite the inherently time-asymmetric nature of human language and reasoning. We argue that this invariance represents a limitation of current pretraining objectives rather than a benign artifact. If natural language encodes directional dependencies - phonological, morphological, or causal - a symmetric objective may fail to capture them. We therefore propose viewing pretraining through the lens of temporal asymmetry, motivating future work on loss functions and architectures that explicitly model the arrow of language while retaining standard language modeling capacity.
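One way to read the invariance claim is through the chain rule: the same joint probability is recovered whether a sequence is factorized token by token from the left or from the right. The sketch below checks this numerically for an arbitrary joint distribution over short sequences. It uses exact conditionals in place of a trained model, so it illustrates the symmetry of the objective at its optimum and says nothing about what a finite model learns in practice. The vocabulary size, sequence length, and function names are assumptions made for illustration.

```python
import itertools
import math
import random

def random_joint(vocab, length, seed=0):
    """Arbitrary joint distribution over all sequences of the given length."""
    rng = random.Random(seed)
    seqs = list(itertools.product(range(vocab), repeat=length))
    weights = [rng.random() for _ in seqs]
    z = sum(weights)
    return {s: w / z for s, w in zip(seqs, weights)}

def prefix_prob(joint, prefix):
    """Marginal probability that a sequence starts with `prefix`."""
    k = len(prefix)
    return sum(p for s, p in joint.items() if s[:k] == prefix)

def autoregressive_nll(joint, seq):
    """-log p(seq), accumulated left to right from exact next-token conditionals."""
    nll = 0.0
    for t in range(len(seq)):
        cond = prefix_prob(joint, seq[: t + 1]) / prefix_prob(joint, seq[:t])
        nll -= math.log(cond)
    return nll

joint = random_joint(vocab=3, length=4)
# A reversed corpus follows q(s) = p(s reversed).
reversed_joint = {s[::-1]: p for s, p in joint.items()}

seq = (2, 0, 1, 1)
forward = autoregressive_nll(joint, seq)                   # left-to-right on the original
backward = autoregressive_nll(reversed_joint, seq[::-1])   # left-to-right on the reversal
```

Here `forward` and `backward` agree up to floating-point error even though the individual conditionals differ, because both products telescope to the same joint probability.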

artificial intelligence, large language model, natural language, (16 more...)

2511.00341

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

arXiv.org Artificial Intelligence, Oct-8-2025

Thermodynamic Performance Limits for Score-Based Diffusion Models

Kodama, Nathan X., Hinczewski, Michael

We establish a fundamental connection between score-based diffusion models and non-equilibrium thermodynamics by deriving performance limits based on entropy rates. Our main theoretical contribution is a lower bound on the negative log-likelihood of the data that relates model performance to entropy rates of diffusion processes. We numerically validate this bound on a synthetic dataset and investigate its tightness. By building a bridge to entropy rates - system, intrinsic, and exchange entropy - we provide new insights into the thermodynamic operation of these models, drawing parallels to Maxwell's demon and implications for thermodynamic computing hardware. Our framework connects generative modeling performance to fundamental physical principles through stochastic thermodynamics.

artificial intelligence, entropy rate, machine learning, (15 more...)

2510.06174

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Machine Learning, Aug-28-2025

The Information Dynamics of Generative Diffusion

Ambrogioni, Luca

Generative diffusion models have emerged as a powerful class of models in machine learning, yet a unified theoretical understanding of their operation is still developing. This perspective paper offers an integrated view of generative diffusion by connecting its dynamic, information-theoretic, and thermodynamic properties under a unified mathematical framework. We demonstrate that the rate of conditional entropy production during generation (i.e., the generative bandwidth) is directly governed by the expected divergence of the score function's vector field. This divergence, in turn, is linked to the branching of trajectories and generative bifurcations, which we characterize as symmetry-breaking phase transitions in the energy landscape. This synthesis offers a powerful insight: the process of generation is fundamentally driven by the controlled, noise-induced breaking of (approximate) symmetries, where peaks in information transfer correspond to critical transitions between possible outcomes. The score function acts as a dynamic non-linear filter that regulates the bandwidth of the noise by suppressing fluctuations that are incompatible with the data.

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

2508.19897

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Jul-15-2025

Next-token pretraining implies in-context learning

Riechers, Paul M., Bigelow, Henry R., Alt, Eric A., Shai, Adam

We argue that in-context learning (ICL) predictably arises from standard self-supervised next-token pretraining, rather than being an exotic emergent property. This work establishes the foundational principles of this emergence by focusing on in-distribution ICL, demonstrating how models necessarily adapt to context when trained on token sequences, especially from non-ergodic sources. Our information-theoretic framework precisely predicts these in-distribution ICL dynamics (i.e., context-dependent loss reduction). We verify this with experiments using synthetic datasets of differing types of correlational structure, reproducing characteristic phenomena like phase transitions in training loss for induction head formation and power-law scaling of in-context loss. We further show that a model's in-context performance on any task is mathematically coupled to the ensemble of tasks seen in pretraining, offering a fundamental explanation, grounded in architecture- and modality-independent principles, for such inference-time learning.
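A toy case makes the "context-dependent loss reduction" concrete. Suppose each sequence is produced by one of two biased coins, chosen once at the start; this mixture is a standard example of a non-ergodic source. The expected next-token loss of the Bayes-optimal predictor at position t equals H(x_1..x_t) - H(x_1..x_{t-1}), and it falls as context accumulates because the context reveals which coin is in play. The sketch below computes these losses exactly. The coin biases, the prior, and the function names are assumptions chosen for illustration, not the paper's experimental setup.

```python
import math

def sequence_entropy(t, biases, prior):
    """Entropy (nats) of the first t tokens from a mixture of i.i.d. coin sources."""
    h = 0.0
    for k in range(t + 1):  # k ones; all sequences with the same k are equiprobable
        p_seq = sum(w * b**k * (1 - b) ** (t - k) for w, b in zip(prior, biases))
        if p_seq > 0:
            h -= math.comb(t, k) * p_seq * math.log(p_seq)
    return h

def in_context_losses(max_len, biases=(0.1, 0.9), prior=(0.5, 0.5)):
    """Expected next-token loss of the Bayes-optimal predictor at positions 1..max_len."""
    # Chain rule: E[-log p(x_t | x_<t)] = H(x_1..x_t) - H(x_1..x_{t-1}).
    entropies = [sequence_entropy(t, biases, prior) for t in range(max_len + 1)]
    return [entropies[t] - entropies[t - 1] for t in range(1, max_len + 1)]

losses = in_context_losses(max_len=20)
# Loss floor: entropy of a single coin whose bias is already known.
floor = -(0.1 * math.log(0.1) + 0.9 * math.log(0.9))
```

The first loss equals log 2 because the marginal over the two symmetric coins is uniform, and later losses decay toward `floor`. A predictor that ignored context would stay at log 2 at every position.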

large language model, machine learning, natural language, (19 more...)

2505.18373

Country: North America > United States (0.67)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Neural Information Processing Systems, Feb-9-2025, 16:46:07 GMT

Submodular Attribute Selection for Action Recognition in Video

Jingjing Zheng, Zhuolin Jiang, Rama Chellappa, Jonathon P. Phillips

In real-world action recognition problems, low-level features cannot adequately characterize the rich spatial-temporal structures in action videos. In this work, we encode actions based on attributes that describe actions as high-level concepts, e.g., jump forward or motion in the air. We base our analysis on two types of action attributes. The first type is generated by humans. The second type is data-driven attributes, which are learned from data using dictionary learning methods.

artificial intelligence, machine learning, representation, (19 more...)

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)

Industry: Leisure & Entertainment > Sports > Track & Field (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence, Aug-10-2024

An Information-Theoretic Analysis of Temporal GNNs

Farzaneh, Amirmohammad

Temporal Graph Neural Networks, a new and trending area of machine learning, suffer from a lack of formal analysis. In this paper, information theory is used as the primary tool to provide a framework for the analysis of temporal GNNs. For this reason, the concept of information bottleneck is used and adjusted to be suitable for a temporal analysis of such networks. To this end, a new definition for Mutual Information Rate is provided, and the potential use of this new metric in the analysis of temporal GNNs is studied.

amir, information, mutual information, (14 more...)

2408.05624

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Asia > Singapore (0.04)
North America > United States (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Neural Information Processing Systems, Mar-13-2024, 23:17:21 GMT

Spike train entropy-rate estimation using hierarchical Dirichlet process priors

For spiking neurons, the entropy rate places an upper bound on the rate at which the spike train can convey stimulus information, and a large literature has focused on the problem of estimating entropy rate from spike train data.

estimator, probability, transition probability, (13 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)