AITopics | information theory

Collaborating Authors

information theory

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Information from coincidences

Balsubramani, Akshay

arXiv.org Machine LearningJun-25-2026

We prove a single algebraic mixed coincidence identity that unifies a broad swath of information-theoretic variational results. For any family of priors $\{π_i\}$ and real exponents $\{ α_i \}$, the log of the mixed count $E_{x\simν}\!\left[\prod_{i=1}^W π_i^{α_i}(x)\right]$ is simultaneously a Boltzmann coincidence weight, an exponential-family normalizer, a maximum-entropy value, and a KL-barycenter optimum. The identity yields a unified derivation of classical cornerstones of information theory: concentration of empirical distributions (Sanov-type decompositions and Gibbs conditioning), hypothesis-testing error exponents (Chernoff information and its multi-way analogue), change-of-measure inequalities (Donsker-Varadhan and PAC-Bayes), and laws governing rare-pattern coincidences (Erdos-Renyi run-length, iterative guesswork, rate-distortion, and birthday thresholds). Each is recovered as a specialization of the same algebraic equality. It strictly generalizes the classical Renyi entropy and divergence variational formulas (one and two priors respectively) to a $W$-prior simplex, and holds for unnormalized and continuum-indexed priors. Among its consequences are an exact multi-prior PAC-Bayes penalty that subtracts an explicit "coincidence bonus" from the usual single-prior posterior penalty, and the asymptotic MAP error exponent for $W$-ary hypothesis testing as an edge-restricted simplex optimum. We demonstrate the calculus at scale on two large alphabets encoding richly modeled sequential languages: on language-model next-token predictives where we recover contrastive decoding, and on human genomic regulatory sequence where it separates correlated from diverse prior families along a sliding-window trace.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2606.25042

Country: Europe (0.45)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Leisure & Entertainment (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)

Add feedback

Want to beat Wordle? Try a 1940s mathematical theory.

Technology Want to beat Wordle? Try a 1940s mathematical theory. A new strategy found the correct word 99 percent of the time. More information Adding us as a Preferred Source in Google by using this link indicates that you would like to see more of our content in Google News results. Wordle is currently celebrating its fifth anniversary and a team from Binghamton University has a new way to solve the fun word game.

artificial intelligence, physics popular science video space, wordle, (9 more...)

Popular Science

Country: North America > United States > New York > Broome County > Binghamton (0.26)

Genre: Research Report > New Finding (0.30)

Industry: Leisure & Entertainment > Games > Computer Games (0.90)

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

Optimal Neural Compressors for the Rate-Distortion-Perception Tradeoff

Neural Information Processing SystemsJun-15-2026, 18:56:58 GMT

Recent efforts in neural compression have focused on the rate-distortion-perception (RDP) tradeoff, where the perception constraint ensures the source and reconstruction distributions are close in terms of a statistical divergence. Theoretical work on RDP describes properties of RDP-optimal compressors without providing constructive and low complexity solutions. While classical rate-distortion theory shows that optimal compressors should efficiently pack space, RDP theory additionally shows that infinite randomness shared between the encoder and decoder may be necessary for RDP optimality. In this paper, we propose neural compressors that are low complexity and benefit from high packing efficiency through lattice coding and shared randomness through shared dithering over the lattice cells. For two important settings, namely infinite shared and zero shared randomness, we analyze the RDP tradeoff achieved by our proposed neural compressors and show optimality in both cases. Experimentally, we investigate the roles that these two components of our design, lattice coding and randomness, play in the performance of neural compressors on synthetic and real-world data. We observe that performance improves with more shared randomness and better lattice packing.

artificial intelligence, latexit sha1, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry: Banking & Finance (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDSChannel

Neural Information Processing SystemsJun-14-2026, 23:01:37 GMT

With the emergence of new storage and communication methods, the insertion, deletion, and substitution (IDS) channel has attracted considerable attention. However, many topics on the IDS channel and the associated Levenshtein distance remain open, making the invention of a novel IDS-correcting code a hard task.

codeword, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.64)

Add feedback

Information-Theoretic Bounds for Sparse Covariance Estimation in the Vertical-Split Distributed Model

Tan, Jing Yee, Han, Guangyue

arXiv.org Machine LearningJun-8-2026

We study the minimax estimation error for distributed covariance matrix estimation in the vertical-split (feature-split) setting, where two agents each observe different coordinates of $m$ i.i.d. sub-Gaussian samples and communicate a limited number of bits to a central server. While Rahmani et al. [2025] established nearly tight bounds for dense (unstructured) cross-covariance matrices, we investigate whether imposing elementwise $s$-sparsity on the cross-covariance $C_{21}$ can reduce the required communication and sample complexity. In contrast to the horizontal-split setting, where Braverman et al. [2016] showed that sparsity does not reduce communication cost for mean estimation, we prove that sparsity does help for cross-covariance estimation in the vertical split. Specifically, we establish minimax lower bounds showing that the communication budget per agent scales as $B_k = Ω(σ^4 d_k\, s' \log(d_1 d_2/s')/\varepsilon^2)$ and the sample complexity for cross-covariance estimation as $m = Ω(σ^4\, s' \log(d_1 d_2/s')/\varepsilon^2)$, where $s' = s \wedge d_{\min}$. For the $1$-sparse case, this yields an exponential improvement from $d_1 d_2$ to $\log(d_1 d_2)$ compared to the dense rate. Our lower bounds are established via Fano's method with an explicit sparse packing using a Varshamov--Gilbert-type argument for signed partial permutation matrices combined with the Conditional Strong Data Processing Inequality of Rahmani et al. [2025]. We show the bounds are tight with a matching achievable scheme, based on covering-net quantization and entry-wise hard thresholding, that attains the $s$-sparse lower bound up to polylogarithmic factors.

artificial intelligence, estimation, machine learning, (18 more...)

arXiv.org Machine Learning

2606.07124

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Breaking the Finite-Sample Barrier in Entropy Coupling

Asoodeh, Shahab, Chen, Jun

arXiv.org Machine LearningMay-18-2026

Dependence among marginally constrained observations can break a finite-sample barrier. To formalize this phenomenon, we introduce the \emph{minimum list entropy coupling} $H(P\|Q_1,\dots,Q_m)$, the minimum conditional entropy $H(X|Y_1,\dots,Y_m)$ over all joint distributions with prescribed discrete marginals $X\sim P$ and $Y_i\sim Q_i$. Unlike classical formulations based on independent observations, our model allows $Y_1,\dots,Y_m$ to be arbitrarily dependent while keeping each marginal fixed. This enlarged coupling space reveals a sharp dichotomy: independent observations reduce residual uncertainty exponentially, whereas dependent observations can eliminate it exactly after finitely many samples. We characterize this zero-entropy regime through necessary and sufficient conditions and give concrete structural criteria under which it occurs. In particular, under mild support assumptions, zero entropy is achieved with $O(\log(1/P_{\min}))$ observations, where $P_{\min}$ is the minimum nonzero mass of $P$. We also develop a greedy algorithm with monotone approximation guarantees for computing $H(P\|Q_1,\dots,Q_m)$. Finally, we show that the same framework formalizes finite-sample limits in distribution-matching representation learning and randomness extraction, where zero entropy corresponds to exact recovery and exact extraction.

artificial intelligence, coupling, machine learning, (19 more...)

arXiv.org Machine Learning

2605.16229

Country: North America (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.45)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Add feedback

Stabilizing Private LASSO under Heterogeneous Covariates via Anisotropic Objective Perturbation

Tanzawa, Haruka, Sakata, Ayaka

arXiv.org Machine LearningMay-5-2026

We study high-dimensional LASSO under differential privacy via objective perturbation with heterogeneous covariate scales. In practical scenarios, covariates often exhibit diverse scales; however, standard preprocessing is problematic under privacy constraints, as it consumes additional privacy budget. This heterogeneity induces effective anisotropy in the objective perturbation via the inverse Gram matrix of covariates, which can degrade the stability and accuracy of algorithms. To address this, we propose a Gram-based anisotropic objective perturbation, a ``pre-distortion" strategy that counteracts the distortion from the covariate structure to restore isotropy in the estimation process. Using an Approximate Message Passing (AMP) framework and state evolution analysis, we demonstrate that our proposed perturbation significantly stabilizes convergence and improves both statistical efficiency and privacy performance compared to standard uniform noise injection. Our results provide theoretical insights into designing stable and efficient private estimators without relying on data-dependent preprocessing.

artificial intelligence, machine learning, perturbation, (15 more...)

arXiv.org Machine Learning

2605.01492

Country:

North America > United States (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

MIM4DD: Mutual Information Maximization for Dataset Distillation

Neural Information Processing SystemsApr-25-2026, 23:44:20 GMT

Dataset distillation (DD) aims to synthesize a small dataset whose test performance is comparable to a full dataset using the same model. State-of-the-art (SoTA) methods optimize synthetic datasets primarily by matching heuristic indicators extracted from two networks: one from real data and one from synthetic data (see Figure 1, Left), such as gradients and training trajectories. DD is essentially a compression problem that emphasizes maximizing the preservation of information contained in the data. We argue that well-defined metrics which measure the amount of shared information between variables in information theory are necessary for success measurement but are never considered by previous works. Thus, we introduce mutual information (MI) as the metric to quantify the shared information between the synthetic and the real datasets, and devise MIM4DD numerically maximizing the MI via a newly designed optimizable objective within a contrastive learning framework to update the synthetic dataset. Specifically, we designate the samples in different datasets that share the same labels as positive pairs and vice versa negative pairs. Then we respectively pull and push those samples in positive and negative pairs into contrastive space via minimizing NCE loss. As a result, the targeted MI can be transformed into a lower bound represented by feature maps of samples, which is numerically feasible. Experiment results show that MIM4DD can be implemented as an add-on module to existing SoTADD methods.

artificial intelligence, deep learning, machine learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.34)

Technology: