AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

18210aa6209b9adfc97b8c17c3741d95-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 09:02:22 GMT

artificial intelligence, complexity, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Adversarial Robustness with Semi-Infinite Constrained Learning

Neural Information Processing SystemsApr-25-2026, 09:02:11 GMT

Despite strong performance in numerous applications, the fragility of deep learning to input perturbations has raised serious questions about its use in safety-critical domains. While adversarial training can mitigate this issue in practice, state-ofthe-art methods are increasingly application-dependent, heuristic in nature, and suffer from fundamental trade-offs between nominal performance and robustness. Moreover, the problem of finding worst-case perturbations is non-convex and underparameterized, both of which engender a non-favorable optimization landscape. Thus, there is a gap between the theory and practice of adversarial training, particularly with respect to when and why adversarial training works. In this paper, we take a constrained learning approach to address these questions and to provide a theoretical foundation for robust learning. In particular, we leverage semi-infinite optimization and non-convex duality theory to show that adversarial training is equivalent to a statistical problem over perturbation distributions, which we characterize completely. Notably, we show that a myriad of previous robust training techniques can be recovered for particular, sub-optimal choices of these distributions. Using these insights, we then propose a hybrid Langevin Monte Carlo approach of which several common algorithms (e.g., PGD) are special cases. Finally, we show that our approach can mitigate the trade-off between nominal and robust performance, yielding state-of-the-art results on MNIST and CIFAR-10.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.68)
Health & Medicine (0.68)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

ATheory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning (Supplementary File)

Neural Information Processing SystemsApr-25-2026, 09:01:34 GMT

This supplementary document contains more additional experimental details and the technical proofs of convergence results of the NeurIPS'21 submission entitled "ATheory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning". It is structured as follows. In Appendix A, we provides more experimental details, including training algorithm, network architecture, optimizer details, loss construction and training cost of SANE. Appendix B presents the proof and details of the main results, namely, Theorem 1, in Section 2, which analyzes the generalization performance of MoCo. Next, Appendix C introduces the proof roadmap and details of the main results, i.e.

artificial intelligence, augmentation, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

310ce61c90f3a46e340ee8257bc70e93-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 09:01:31 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

310b60949d2b6096903d7e8a539b20f5-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 09:01:14 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.67)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

310b60949d2b6096903d7e8a539b20f5-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 09:01:10 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.68)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Appendix

Neural Information Processing SystemsApr-25-2026, 08:49:26 GMT

We first introduce some handy concepts and results to make the proof succinct, meanwhile providing more information for understanding our model and theory. We begin with some extended discussions on CSG. Note that a reparameterization unnecessarily has its output dimensions in S, i.e. The condition that p(y|s) = p0(y|ΦS(s,v)) for any v V does not indicate that ΦS(s,v) is constant of v, since p0(y|s0) may ignore the change of s0 = ΦS(s,v) from the change of v. The following lemma shows the meaning of a reparameterization: it allows a CSG to vary while inducing the same distribution on the observed data variables (x,y) (i.e., holding the same effect on describing data). We can now define and verify an equivalent relation on CSGs so that the resulting equivalent class contains CSGs that induce the same (x,y) data distribution and hold the same semantic information in their svariables. We say two CSGs pand p0 are semantic-equivalent, if there exists a homeomorphism11 Φ on S V, such that (i) is semantic-preserving: its output dimensions in S is constant of v, ΦS(s,v) = ΦS(s) for any v V, and (ii) it acts as a reparameterization from p to p0: Φ#[ps,v] = p0s,v, p(x|s,v) = p0(x|Φ(s,v)) and p(y|s) = p0(y|ΦS(s)). A.1 below shows that the defined binary relation is indeed an equivalence relation in common cases. As a reparameterization, Φ allows the two models to have different latent-variable parameterizations while inducing the same distribution on the observed data variables (x,y) (Lemma 9). This definition of semantic-equivalence can be rephrased as the existence of a semantic-preserving reparameterization. With proper model assumptions, we can show that any reparameterization between two CSGs is semantic-preserving, so that semantic-preserving CSGs cannot be converted to each other by a reparameterization that mixes swith v. Lemma 11. For two CSGs pand p0, if p0(y|s) has a statistics M0(s) that is an injective function of s, then any reparameterization Φ from pto p0, if exists, has its ΦS constant of v. Proof. Then the condition that p(y|s) = p0(y|ΦS(s,v)) for any v V indicates that M(s) = M0(ΦS(s,v)). If there exist s S and v(1) 6= v(2) V such that ΦS(s,v(1)) 6= ΦS(s,v(2)), then M0(ΦS(s,v(1))) 6= M0(ΦS(s,v(2))) 11A transformation is a homeomorphism if it is a continuous bijection with continuous inverse. This violates M(s) = M0(ΦS(s,v)) which requires both M0(ΦS(s,v(1))) and M0(ΦS(s,v(2))) to be equal to M(s). We then introduce two mathematical facts. Let z be a random variable on a Euclidean space RdZ with density function pz(z), and let Φ be a homeomorphism on RdZ whose inverse Φ 1 is differentiable.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Technology: