AITopics | Statistical Learning

Statistical Learning

Appendix Conditional Independence Dependence in 10H and

Neural Information Processing Systems, Apr-25-2026, 03:09:30 GMT

We investigate the degree to which our conditional independence assumption is satisfied empirically in the datasets used in the paper. Specifically, of interest is the assumption of conditional independence of m(x) and h(x), given y. Assessing conditional independence is not straightforward given that m(x) is a K-dimensional real-valued vector while h(x) and y each take one of K categorical values, with K = 10 for CIFAR-10H and K = 16 for ImageNet-16H. While there exist statistical tests for assessing conditional independence for categorical random variables, with real-valued variables the situation is less straightforward and there are multiple options, such as different non-parametric tests involving different tradeoffs [Runge, 2018; Marx and Vreeken, 2019; Mukherjee et al., 2020; Berrett et al., 2020]. Given these issues, we investigate the degree of conditional dependence using two relatively simple approaches. The first approach looks at the conditional mutual information (CMI) between the predicted label from the model and the predicted label from the human, conditioned on the true label.
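As a rough illustration of the first approach, the sketch below computes a plug-in (empirical frequency) estimate of the CMI between two categorical predictions given the true label. The function name, argument names, and the choice of a plug-in estimator are assumptions made for illustration; the abstract does not say which estimator the authors use.

```python
import numpy as np

def conditional_mutual_information(m_labels, h_labels, y_labels, num_classes):
    """Plug-in estimate of I(M; H | Y) in nats from integer label arrays.

    m_labels: predicted label from the model (argmax of m(x)), values in [0, K)
    h_labels: predicted label from the human, values in [0, K)
    y_labels: true label, values in [0, K)
    """
    m = np.asarray(m_labels, dtype=int)
    h = np.asarray(h_labels, dtype=int)
    y = np.asarray(y_labels, dtype=int)

    # Joint counts indexed as [y, m, h].
    counts = np.zeros((num_classes, num_classes, num_classes))
    np.add.at(counts, (y, m, h), 1.0)
    p_ymh = counts / counts.sum()

    # Marginals, kept broadcastable against the joint table.
    p_y = p_ymh.sum(axis=(1, 2), keepdims=True)
    p_ym = p_ymh.sum(axis=2, keepdims=True)
    p_yh = p_ymh.sum(axis=1, keepdims=True)

    # Sum only over cells with positive probability (0 * log 0 = 0).
    mask = p_ymh > 0
    ratio = (p_ymh * p_y)[mask] / (p_ym * p_yh)[mask]
    return float(np.sum(p_ymh[mask] * np.log(ratio)))

# Toy check: with y fixed, m and h identical coin flips share ln(2) nats.
cmi_dependent = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0], 2)
# With y fixed, m and h are independent here, so the estimate is 0.
cmi_independent = conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0], 2)
```

A value near zero is consistent with conditional independence of the two predicted labels. Plug-in estimates are biased upward on finite samples, so small positive values should be read with care.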

artificial intelligence, imagenet-16h, machine learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

234b941e88b755b7a72a1c1dd5022f30-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 03:09:27 GMT

artificial intelligence, machine learning, prediction, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

A Approximate Target Maximum Welfare Minimum Relative Entropy Equilibria. We use a Minimum Relative Entropy (RME) (also known as minimum KL divergence) ...

Neural Information Processing Systems, Apr-25-2026, 02:58:14 GMT

This objective is similar to Maximum Entropy Correlated Equilibrium (MECE) [48], and the proofs here are similar to the framework set out there. A drawback of MECE is that it is not easy to determine the minimum p permissible. If we choose p that does not permit a valid solution, then the parameters will diverge. We can circumvent this problem by optimizing the distance to a target p̂. And µ is for balancing the linear objective.

artificial intelligence, machine learning, payoff, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

24f420aa4c99642dbb9aae18b166bbbc-Paper-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 02:58:12 GMT

artificial intelligence, machine learning, payoff, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report (0.47)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Mixture weights optimisation for Alpha-Divergence Variational Inference

Neural Information Processing Systems, Apr-25-2026, 02:57:37 GMT

This paper focuses on α-divergence minimisation methods for Variational Inference. We consider the case where the posterior density is approximated by a mixture model and we investigate algorithms optimising the mixture weights of this mixture model by α-divergence minimisation, without any information on the underlying distribution of its mixture components parameters. The Power Descent, defined for all α ≠ 1, is one such algorithm and we establish in our work the full proof of its convergence towards the optimal mixture weights when α < 1. Since the α-divergence recovers the widely-used exclusive Kullback-Leibler when α → 1, we then extend the Power Descent to the case α = 1 and show that we obtain an Entropic Mirror Descent. This leads us to investigate the link between Power Descent and Entropic Mirror Descent: first-order approximations allow us to introduce the Rényi Descent, a novel algorithm for which we prove an O(1/N) convergence rate. Lastly, we compare numerically the behavior of the unbiased Power Descent and of the biased Rényi Descent and we discuss the potential advantages of one algorithm over the other.
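To make the idea of an Entropic Mirror Descent over mixture weights concrete, here is a minimal sketch on a discrete toy problem. It is a generic exponentiated-gradient update, not the paper's Power Descent or Rényi Descent. The setup (fixed discrete components `Q`, a known target `p`, exact gradients instead of Monte Carlo estimates, and the step size `eta`) is assumed purely for illustration.

```python
import numpy as np

def entropic_mirror_descent(Q, p, lam0, eta=1.0, n_iter=200):
    """Minimise KL(mixture || p) over mixture weights on the simplex.

    Q:    (J, M) array, row j is a fixed component distribution over M points
    p:    (M,) target distribution with full support
    lam0: (J,) strictly positive initial weights summing to one
    """
    lam = np.asarray(lam0, dtype=float)
    for _ in range(n_iter):
        mix = lam @ Q                            # current mixture, shape (M,)
        grad = Q @ (np.log(mix / p) + 1.0)       # gradient of the KL w.r.t. lam
        lam = lam * np.exp(-eta * grad)          # multiplicative (mirror) step
        lam /= lam.sum()                         # renormalise onto the simplex
    return lam

# Toy problem: the target is an equal-weight mixture of the two components.
Q = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.2, 0.7]])
p = 0.5 * Q[0] + 0.5 * Q[1]
weights = entropic_mirror_descent(Q, p, lam0=[0.9, 0.1])
```

Because the update is multiplicative and followed by normalisation, the weights stay positive and sum to one without an explicit projection step.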

artificial intelligence, descent, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.68)
Europe (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Auditing Fairness by Betting

Neural Information Processing Systems, Apr-25-2026, 02:57:26 GMT

We provide practical, efficient, and nonparametric methods for auditing the fairness of deployed classification and regression models. Whereas previous work relies on a fixed sample size, our methods are sequential and allow for the continuous monitoring of incoming data, making them highly amenable to tracking the fairness of real-world systems. We also allow the data to be collected by a probabilistic policy as opposed to sampled uniformly from the population. This enables auditing to be conducted on data gathered for another purpose. Moreover, this policy may change over time and different policies may be used on different subpopulations. Finally, our methods can handle distribution shift resulting from either changes to the model or changes in the underlying population. Our approach is based on recent progress in anytime-valid inference and game-theoretic statistics, the "testing by betting" framework in particular. These connections ensure that our methods are interpretable, fast, and easy to implement. We demonstrate the efficacy of our approach on three benchmark fairness datasets.
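The "testing by betting" idea can be sketched as a wealth process: start with one unit of wealth, bet a fraction of it on each incoming observation, and reject the null once wealth reaches 1/α. The code below is a simplified sketch with a fixed bet size, not the authors' procedure. The function name, the payoff definition (for example, a scaled difference in model outputs between two groups), and the constant betting fraction are all assumptions.

```python
def betting_test(payoffs, bet=0.5, alpha=0.05):
    """Sequential test of H0: E[payoff_t | past] <= 0 for payoffs in [-1, 1].

    Under H0 the wealth process is a nonnegative supermartingale, so by
    Ville's inequality it reaches 1/alpha with probability at most alpha.
    Returns (rejection_time, wealth); rejection_time is None if H0 is never rejected.
    """
    if not 0.0 < bet <= 0.5:
        raise ValueError("bet must lie in (0, 0.5] so that wealth stays positive")
    wealth = 1.0
    for t, g in enumerate(payoffs, start=1):
        if not -1.0 <= g <= 1.0:
            raise ValueError("payoffs must lie in [-1, 1]")
        wealth *= 1.0 + bet * g          # gain when g > 0, lose when g < 0
        if wealth >= 1.0 / alpha:
            return t, wealth             # enough evidence against H0
    return None, wealth

# Hypothetical stream in which one group is consistently favoured.
stop_time, final_wealth = betting_test([0.5] * 30)
# Hypothetical stream with no disparity at all: wealth never grows.
no_reject_time, flat_wealth = betting_test([0.0] * 30)
```

Since the guarantee holds at every time step simultaneously, the stream can be monitored continuously and stopped as soon as the threshold is crossed. Practical methods typically adapt the bet over time rather than fixing it.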

artificial intelligence, machine learning, sequential test, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Industry:

Health & Medicine (1.00)
Government (1.00)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

231141b34c82aa95e48810a9d1b33a79-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 02:57:16 GMT

artificial intelligence, compressor, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

interpretation of regularization

Neural Information Processing Systems, Apr-25-2026, 02:44:15 GMT

Blue arrows indicate node feature vectors h_v of the latent space, and the orange area/point indicates the possible range of the graph feature vector h_G obtained by applying READOUT to h_v. We elaborate on our motivation behind the orthogonal regularization (15) proposed in Section 4.2.3. The main motivation behind orthogonal regularization lies in understanding, through (8) and (12), that the node features H become a full-rank matrix with a good condition number. Figure 5 visually demonstrates the geometric effect of attention-based READOUT and orthogonal regularization with two example node features h_1 and h_2. Only one graph feature vector h_G is possible from the combination of two node features with conventional READOUT, while vectors within the range of the orange rhombus can represent the whole graph feature with attention-based READOUT. With orthogonal regularization, the area of the range that the graph feature vector h_G can represent becomes even larger, with a lower possibility of a null subspace within H. Accordingly, the subspace that H can span can be rich enough.
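The two ingredients described above can be sketched in a few lines. Both functions are assumptions for illustration: the softmax-weighted readout is one common form of attention-based READOUT, and the Gram-matrix penalty is one common form of orthogonal regularization. The excerpt does not reproduce the paper's equations (8), (12), or (15), so these may differ from the authors' definitions.

```python
import numpy as np

def attention_readout(H, scores):
    """Graph feature as a softmax-weighted combination of node features.

    H:      (N, D) array, one node feature vector per row
    scores: (N,) unnormalised attention scores (hypothetical input)
    """
    scores = np.asarray(scores, dtype=float)
    w = np.exp(scores - scores.max())    # numerically stable softmax
    w /= w.sum()
    return w @ np.asarray(H, dtype=float)

def orthogonal_penalty(H):
    """Squared Frobenius distance between the Gram matrix H H^T and the identity."""
    H = np.asarray(H, dtype=float)
    gram = H @ H.T
    return float(np.sum((gram - np.eye(H.shape[0])) ** 2))

H_orthogonal = np.eye(2)                          # two orthonormal node features
H_collapsed = np.array([[1.0, 0.0], [1.0, 0.0]])  # two identical node features

penalty_orthogonal = orthogonal_penalty(H_orthogonal)
penalty_collapsed = orthogonal_penalty(H_collapsed)
uniform_readout = attention_readout(H_orthogonal, [0.0, 0.0])
```

Varying `scores` moves the readout across combinations of the node features, whereas a fixed READOUT yields a single vector. The penalty is zero when the node features are orthonormal and grows as they collapse onto each other, which corresponds to a poorly conditioned H.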

artificial intelligence, machine learning, regularization, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis

Neural Information Processing Systems, Apr-25-2026, 02:43:02 GMT

We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses. In regression settings, the existing MCL variants focus on merging the hypotheses, thereby eventually sacrificing the diversity of the predictions. In contrast, our method relies on a novel learned scoring scheme underpinned by a mathematical framework based on Voronoi tessellations of the output space, from which we can derive a probabilistic interpretation. After empirically validating rMCL with experiments on synthetic data, we further assess its merits on the sound source localization task, demonstrating its practical usefulness and the relevance of its interpretation.
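For readers unfamiliar with the Winner-Takes-All loss, the sketch below shows its basic form for one input: each target is matched to its closest hypothesis, and only that "winning" hypothesis is penalised. It covers plain WTA with a squared-error distance only. The learned scoring scheme that defines rMCL is not shown, and the function and variable names are hypothetical.

```python
import numpy as np

def wta_loss(hypotheses, targets):
    """Winner-Takes-All loss for one input with several hypotheses and targets.

    hypotheses: (K, D) array of K predicted outputs
    targets:    (T, D) array of T sampled targets
    Returns (loss, winners), where winners[t] is the index of the hypothesis
    closest to target t.
    """
    hyp = np.asarray(hypotheses, dtype=float)
    tgt = np.asarray(targets, dtype=float)
    # Pairwise squared distances, shape (T, K).
    sq_dist = np.sum((tgt[:, None, :] - hyp[None, :, :]) ** 2, axis=-1)
    winners = np.argmin(sq_dist, axis=1)
    loss = float(np.mean(sq_dist[np.arange(len(tgt)), winners]))
    return loss, winners

loss, winners = wta_loss(
    hypotheses=[[0.0, 0.0], [1.0, 1.0]],
    targets=[[0.0, 0.1], [1.0, 1.0]],
)
```

Assigning each target to its nearest hypothesis partitions the output space into cells around the hypotheses, which is the Voronoi tessellation view mentioned in the abstract.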

artificial intelligence, hypothesis, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Industry: Education (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

APPENDIX A Overview of group representations

Neural Information Processing Systems, Apr-25-2026, 02:42:24 GMT

In this section we briefly introduce the representation theory of the three groups we used in this work.

Planar rotations group SO(2). The standard representation of $r_\theta \in SO(2)$ is as a $2 \times 2$ rotation matrix

$$\rho(r_\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

The complex irreducible representations are often used and correspond to the circular harmonics.

Planar rotations and reflections group O(2). The standard representation of O(2) is as a $2 \times 2$ orthogonal matrix

$$\rho(r_\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad \text{and} \quad \rho(r_\theta f) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Apart from the trivial representation $\rho_{0,0}(h) = 1 \;\; \forall h \in O(2)$ and the sign-flip representation $\rho_{1,0}(r_\theta) = 1$ and $\rho_{1,0}(f) = -1$, all other irreps are 2-dimensional.

These representations are isomorphic to the Wigner D matrices. In particular, $\rho_0$ is the trivial representation and $\rho_1$ is isomorphic to the standard representation of SO(3) as $3 \times 3$ rotation matrices. An element $g = (m, r) \in O(3)$ is a pair of a mirroring $m \in \{e, m_z\}$ and a rotation $r \in SO(3)$. In general, if $G$ is a group, we denote with $\hat{G}$ the set of its irreducible representations.

Recall the generative process for cryo-EM images in (12), where each image $o_i$ is generated through $g_i^{-1}$ with $g_i \in SO(3)$. Let $R_z = SO(2) < SO(3)$ be the subgroup of SO(3) containing rotations around the Z axis and $H = O(2) < SO(3)$ the subgroup containing also the rotation $r_y$ by $\pi$ around the Y axis.
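The defining properties of these standard representations can be checked numerically. The sketch below builds the 2 × 2 rotation matrix for SO(2) and the reflection matrix diag(1, −1) for O(2), then verifies that rotations compose by adding angles and that conjugating a rotation by the reflection inverts it. The names `rho`, `F`, and the sample angles are chosen here for illustration and are not taken from the paper.

```python
import numpy as np

def rho(theta):
    """Standard representation of a planar rotation by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

# Standard representation of the reflection f in O(2).
F = np.array([[1.0, 0.0],
              [0.0, -1.0]])

a, b = 0.7, -1.9  # arbitrary test angles

# Homomorphism property: composing rotations adds their angles.
composes = np.allclose(rho(a) @ rho(b), rho(a + b))

# Conjugation by the reflection inverts a rotation: f r f = r^{-1}.
conjugation_inverts = np.allclose(F @ rho(a) @ F, rho(-a))

# Rotations have determinant +1, rotation-reflections have determinant -1.
det_rotation = np.linalg.det(rho(a))
det_reflection = np.linalg.det(rho(a) @ F)

# Every element of the representation is an orthogonal matrix.
is_orthogonal = np.allclose((rho(a) @ F).T @ (rho(a) @ F), np.eye(2))
```

The determinant of each standard-representation matrix is +1 for a pure rotation and −1 for an element containing the reflection, so it separates the two components of O(2).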

artificial intelligence, machine learning, representation, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)