Appendix A: Some Useful Lemmas
This appendix collects the equivalent forms of the generalization error studied in the paper (e.g., Eq. (2), which follows from Lemma 2.1 together with symmetry properties, without changing the definition of any random variable), the lemmas used to derive the hypotheses-conditioned and supersample-conditioned CMI bounds of Section 4 (via Eqs. (5) and (14)), the motivation for introducing four types of SCH stability in Definition 2.1, and the proofs of Theorems 3.1 and 4.1, which combine Lemmas A.1–A.3 with Jensen's inequality applied to the absolute-value function; the final argument is similar to [18, Theorem 2.1].
Separation-Utility Pareto Frontier: An Information-Theoretic Characterization
We study the Pareto frontier (optimal trade-off) between utility and separation, a fairness criterion requiring predictions to be independent of sensitive attributes conditional on the true outcome. Through an information-theoretic lens, we prove a characterization of the utility-separation Pareto frontier, establish its concavity, and thereby show that the marginal utility cost of separation is increasing. In addition, we characterize the conditions under which this trade-off becomes strict, providing a guide for trade-off selection in practice. Based on this characterization, we develop an empirical regularizer built on the conditional mutual information (CMI) between predictions and sensitive attributes given the true outcome. The CMI regularizer is compatible with any deep model trained via gradient-based optimization and serves as a scalar monitor of residual separation violations, offering tractable guarantees during training. Finally, numerical experiments support our theoretical findings: across COMPAS, UCI Adult, UCI Bank, and CelebA, the proposed method substantially reduces separation violations while matching or exceeding the utility of established baselines. This study thus offers a provable, stable, and flexible approach to enforcing separation in deep learning.
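To make the regularizer concrete, here is a minimal sketch, assuming binary labels and a binary sensitive attribute: a plug-in mini-batch estimate of I(Ŷ; A | Y) computed from soft predictions and added to the task loss. The name `cmi_penalty` and the weight `lambda_cmi` are our own hypothetical choices, and the estimator is an illustrative plug-in relaxation, not the paper's exact implementation.

```python
# Sketch (our assumptions, not the paper's implementation): a plug-in
# mini-batch estimate of the separation CMI  I(Yhat; A | Y)  for binary A, Y,
# treating the soft prediction as the Bernoulli parameter of Yhat.
import torch

def cmi_penalty(probs: torch.Tensor, a: torch.Tensor, y: torch.Tensor,
                eps: float = 1e-8) -> torch.Tensor:
    """Plug-in estimate of I(Yhat; A | Y) from one mini-batch.

    probs: (B,) model outputs P(Yhat = 1 | x); a, y: (B,) binary tensors.
    I(Yhat; A | Y) = sum_y P(y) sum_a P(a|y) KL( P(Yhat|a,y) || P(Yhat|y) ).
    """
    penalty = probs.new_zeros(())
    for y_val in (0, 1):
        mask_y = y == y_val
        p_y = mask_y.float().mean()
        if p_y < eps:
            continue
        p_hat_y = probs[mask_y].mean().clamp(eps, 1 - eps)   # P(Yhat=1 | y)
        for a_val in (0, 1):
            mask_ya = mask_y & (a == a_val)
            if not bool(mask_ya.any()):
                continue
            p_a_y = mask_ya.float().mean() / p_y             # P(a | y)
            p_hat_ya = probs[mask_ya].mean().clamp(eps, 1 - eps)
            # Bernoulli KL( P(Yhat | a, y) || P(Yhat | y) ).
            kl = (p_hat_ya * torch.log(p_hat_ya / p_hat_y)
                  + (1 - p_hat_ya) * torch.log((1 - p_hat_ya) / (1 - p_hat_y)))
            penalty = penalty + p_y * p_a_y * kl
    return penalty

if __name__ == "__main__":
    torch.manual_seed(0)
    probs = torch.rand(256)            # stand-in for model outputs
    a = torch.randint(0, 2, (256,))
    y = torch.randint(0, 2, (256,))
    task_loss = torch.nn.functional.binary_cross_entropy(probs, y.float())
    lambda_cmi = 1.0                   # hypothetical trade-off weight
    loss = task_loss + lambda_cmi * cmi_penalty(probs, a, y)
    print(float(loss))
```

Because the penalty is built from differentiable batch averages of the soft predictions, it can be minimized jointly with the task loss by any gradient-based optimizer, matching the plug-and-play role the abstract describes.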
Towards a Unified Information-Theoretic Framework for Generalization
In this work, we investigate the expressiveness of the conditional mutual information (CMI) framework of Steinke and Zakynthinou (2020) and the prospect of using it to provide a unified framework for proving generalization bounds in the realizable setting. We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm that outputs hypotheses from a class of bounded VC dimension. We then explore two directions of strengthening this bound: (i) Can the CMI framework express optimal bounds for VC classes?
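For reference, one standard form of the Steinke–Zakynthinou CMI quantity and its generalization bound is sketched below (notation ours; the loss is assumed bounded in [0, 1]):

```latex
% Supersample \tilde{Z} \in \mathcal{Z}^{n \times 2} drawn i.i.d. from the data
% distribution; selector U \sim \mathrm{Unif}(\{0,1\}^n) picks one entry per
% row to form the training set \tilde{Z}_U; the algorithm outputs
% W = A(\tilde{Z}_U).
\mathrm{CMI}_{\mathcal{D}}(A) = I\bigl(W;\, U \mid \tilde{Z}\bigr),
\qquad
\bigl|\mathbb{E}[\operatorname{gen}(W)]\bigr|
  \le \sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(A)}{n}}.
```

Since CMI is always at most H(U) = n log 2, and for classes of VC dimension d admits bounds of roughly O(d log n), this yields the non-trivial but log-factor-suboptimal rates the abstract refers to.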
Tighter CMI-Based Generalization Bounds via Stochastic Projection and Quantization
Milad Sefidgaran, Kimia Nadjahi, Abdellatif Zaidi
In this paper, we leverage stochastic projection and lossy compression to establish new conditional mutual information (CMI) bounds on the generalization error of statistical learning algorithms. We show that these bounds are generally tighter than existing ones. In particular, we prove that for certain problem instances for which existing MI and CMI bounds were recently shown by Attias et al. [2024] and Livni [2023] to become vacuous or to fail to describe the right generalization behavior, our bounds yield suitable generalization guarantees of order $\mathcal{O}(1/\sqrt{n})$, where $n$ is the size of the training dataset. Furthermore, we use our bounds to investigate the problem of data "memorization" raised in those works, which asserts that there are learning problem instances for which, for any learning algorithm with good prediction performance, there exist distributions under which the algorithm must "memorize" a large fraction of the training dataset. We show that for every learning algorithm, there exists an auxiliary algorithm that does not memorize and yields comparable generalization error for any data distribution. In part, this shows that memorization is not necessary for good generalization.
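The abstract does not reproduce the theorems; as a rough illustration of the shape of a projection-plus-quantization argument (our notation, a schematic rather than the paper's statements), one first passes to a compressed hypothesis and pays a distortion term:

```latex
% Schematic only: \hat{W} = Q(PW) is a lossy compression of W via a random
% projection P and a quantizer Q; \Delta bounds the change in loss induced
% by replacing W with \hat{W}.
\bigl|\mathbb{E}[\operatorname{gen}(W)]\bigr|
  \le \bigl|\mathbb{E}[\operatorname{gen}(\hat{W})]\bigr| + \Delta
  \le \sqrt{\frac{2\, I\bigl(\hat{W};\, U \mid \tilde{Z}\bigr)}{n}} + \Delta.
```

Because the compressed hypothesis takes only finitely many values, its CMI can stay bounded even when the CMI of the original hypothesis is vacuous, which is one way guarantees of order $\mathcal{O}(1/\sqrt{n})$ can survive in the problem instances discussed above.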