AITopics | sup 2

Appendix Table of Contents

Neural Information Processing Systems | Apr-29-2026, 20:36:19 GMT

Incorporating causality into reinforcement learning methods increases the interpretability of artificial intelligence, which helps humans understand the underlying mechanism of algorithms and check the source of failures. However, the learned causal transition model may contain human-readable private information about the environment, which could raise privacy issues. To mitigate this potential negative societal impact, the causal transition model needs to be encrypted and only accessible to algorithms and trustworthy users.

In this section, besides the most related formulation, robust RL introduced in Sec. 3.3, we also introduce some other related RL problem formulations partially shown in Figure 3. Then, we limit our discussion to mainly two lines of work that are related to ours: (1) promoting robustness in RL; (2) concerning the spurious correlation issues in RL.

B.1 Related RL formulations

Robustness to noisy state: POMDPs and SA-MDPs.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Collection (0.40)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

a17251f8d595179eef5e466b1f5f7a85-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 06:02:37 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

a17251f8d595179eef5e466b1f5f7a85-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 06:02:33 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

c1285fcadc52c0d3dc8813fc2c2e2b2a-Paper.pdf

Neural Information Processing Systems | Feb-13-2026, 23:01:48 GMT

Con set is a (scaled) $\ell_p$ ball, $1 \le p < 2$, then, considering (2), the best method p d/logd times lar we provide ne . Formally, if 2 then s 2 { 1}d implies (sj j)j d 2 .

artificial intelligence, regretn, sup 2, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence (0.48)

3d7d9461075eb7c37fbbfcad1d7042c1-Supplemental.pdf

Neural Information Processing Systems | Feb-8-2026, 07:46:10 GMT

confidence region, equation, estimator, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Coneheads: Hierarchy Aware Attention

Neural Information Processing Systems | Oct-9-2025, 03:14:18 GMT

These networks rely heavily on the dot product attention operator, which computes the similarity between two points by taking their inner product. However, the inner product does not explicitly model the complex structural properties of real world datasets, such as hierarchies between data points.
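As a rough illustration of the operator described above, the sketch below computes attention weights from inner products between a query and a set of keys. It is a minimal stdlib-only example and not the paper's code: the function names, the list-based representation, and the 1/sqrt(d) scaling (the common Transformer convention) are assumptions made here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dot_product_attention(queries, keys, values):
    """Scaled dot-product attention on plain Python lists (illustrative).

    Similarity between a query and a key is their inner product,
    scaled by 1/sqrt(d) (an assumed, conventional choice).
    Returns (outputs, weights), one row per query.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is the weight-averaged combination of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# A query aligned with the first key attends more to the first value.
outs, wts = dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
)
```

The score depends only on the angle and norms of the two vectors, so it carries no built-in notion of one point being an ancestor or descendant of another. That is the limitation the snippet points to.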

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Coneheads: Hierarchy Aware Attention

Neural Information Processing Systems | Oct-9-2025, 03:14:15 GMT

These networks rely heavily on the dot product attention operator, which computes the similarity between two points by taking their inner product. However, the inner product does not explicitly model the complex structural properties of real world datasets, such as hierarchies between data points.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Appendix

Neural Information Processing Systems | Aug-22-2025, 01:31:26 GMT

The appendix is organized as follows. In Appendix B we bound the Hessian of the network and introduce some technical lemmas. In Appendix D we put the aforementioned results together to prove Theorem 3.5. In Appendix E we explain the merit of Assumption 3.6. In Appendix F we describe the details of our experiments with a link to the relevant code. Our approach to generalization will be based on metric entropy (see, e.g., Wainwright, 2019). We recall some basic definitions.
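For orientation, the standard textbook definitions of covering number and metric entropy are sketched below. The snippet does not show the paper's own statement, so the symbols $(T, \rho)$, $\epsilon$, and $N(\epsilon; T, \rho)$ are assumed notation and may differ from the appendix.

```latex
% Standard definitions; notation assumed, not taken from the paper.
Let $(T, \rho)$ be a metric space and $\epsilon > 0$.
A set $\{\theta^1, \dots, \theta^n\} \subseteq T$ is an $\epsilon$-cover of $T$ if
\[
  \forall\, \theta \in T \quad \exists\, i \in \{1, \dots, n\} :
  \quad \rho(\theta, \theta^i) \le \epsilon .
\]
The covering number $N(\epsilon; T, \rho)$ is the cardinality of the smallest
$\epsilon$-cover, and the metric entropy of $T$ is
\[
  \log N(\epsilon; T, \rho).
\]
```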

artificial intelligence, machine learning, probability, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning

Oh, Jeongheon, Lee, Kibok

arXiv.org Machine Learning | Jun-16-2024

Supervised contrastive representation learning has been shown to be effective in various transfer learning scenarios. However, while asymmetric non-contrastive learning (ANCL) often outperforms its contrastive learning counterpart in self-supervised representation learning, the extension of ANCL to supervised scenarios is less explored. To bridge the gap, we study ANCL for supervised representation learning, coined SupSiam and SupBYOL, leveraging labels in ANCL to achieve better representations. The proposed supervised ANCL framework improves representation learning while avoiding collapse. Our analysis reveals that providing supervision to ANCL reduces intra-class variance, and the contribution of supervision should be adjusted to achieve the best performance. Experiments demonstrate the superiority of supervised ANCL across various datasets and tasks. The code is available at: https://github.com/JH-Oh-23/Sup-ANCL.

dataset, learning, representation, (15 more...)

arXiv.org Machine Learning

2406.10815

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Loss Gradient Gaussian Width based Generalization and Optimization Guarantees

Banerjee, Arindam, Li, Qiaobo, Zhou, Yingxue

arXiv.org Artificial Intelligence | Jun-11-2024

Generalization and optimization guarantees on the population loss in machine learning often rely on uniform convergence based analysis, typically based on the Rademacher complexity of the predictors. The rich representation power of modern models has led to concerns about this approach. In this paper, we present generalization and optimization guarantees in terms of the complexity of the gradients, as measured by the Loss Gradient Gaussian Width (LGGW). First, we introduce generalization guarantees directly in terms of the LGGW under a flexible gradient domination condition, which we demonstrate to hold empirically for deep models. Second, we show that sample reuse in finite sum (stochastic) optimization does not make the empirical gradient deviate from the population gradient as long as the LGGW is small. Third, focusing on deep networks, we present results showing how to bound their LGGW under mild assumptions. In particular, we show that their LGGW can be bounded (a) by the $L_2$-norm of the loss Hessian eigenvalues, which has been empirically shown to be $\tilde{O}(1)$ for commonly used deep models; and (b) in terms of the Gaussian width of the featurizer, i.e., the output of the last-but-one layer. To our knowledge, our generalization and optimization guarantees in terms of LGGW are the first results of their kind, avoid the pitfalls of predictor Rademacher complexity based analysis, and hold considerable promise towards quantitatively tight bounds for deep models.
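To make the notion of Gaussian width concrete, the sketch below estimates $w(A) = \mathbb{E}_g \sup_{a \in A} \langle a, g \rangle$ with $g \sim \mathcal{N}(0, I)$ for a finite set $A$ by Monte Carlo sampling. This is a generic illustration and not the paper's LGGW procedure: the function name, the sample count, and the use of a finite set of vectors in place of a set of loss gradients are assumptions made here.

```python
import math
import random


def gaussian_width(vectors, num_samples=20000, seed=0):
    """Monte Carlo estimate of the Gaussian width of a finite set.

    vectors: list of equal-length lists of floats (the set A).
    Estimates E_g[ max_{a in A} <a, g> ] with g drawn from N(0, I).
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    total = 0.0
    for _ in range(num_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Supremum over the finite set is a plain maximum.
        total += max(
            sum(a_i * g_i for a_i, g_i in zip(a, g)) for a in vectors
        )
    return total / num_samples


# For A = {+1, -1} in one dimension the width is E|g| = sqrt(2/pi).
width_pm1 = gaussian_width([[1.0], [-1.0]])
# A single point has width E<a, g> = 0.
width_single = gaussian_width([[1.0, 2.0]])
```

The two example sets have known closed-form widths, which gives a quick sanity check on the estimator before applying it to anything larger.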

arxiv, gradient, probability, (14 more...)

arXiv.org Artificial Intelligence

2406.07712

Country: