AITopics

2605.08069

Genre: Research Report > Experimental Study (0.94)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

arXiv.org Machine Learning, May-5-2026

Stable Blanket with Hidden Variables and Cycles

Xiang, Hanqing

Stabilized regression aims to identify a set of predictors whose conditional relationship with a response variable remains invariant across different environments. Existing graphical characterizations of the stable blanket are mainly developed for structural causal models (SCMs) without hidden variables or causal cycles. However, latent variables and feedback relationships naturally arise in many applications, and they can change both the Markov blanket and the set of predictors that remain stable under interventions. This paper studies stable blankets in graphical causal models with hidden variables, causal cycles, and both features simultaneously. For models with hidden variables, we use acyclic directed mixed graphs (ADMGs) and $m$-separation to characterize the Markov blanket and to construct intervention-stable predictor sets. We introduce the notion of an intervened sub-district and use it to describe how interventions may affect districts connected to the response. For models with cycles, we work with directed graphs (DGs) and directed mixed graphs (DMGs) together with $σ$-separation, treating strongly connected components (SCCs) as the basic graphical units. We then combine these ideas to analyze models with both hidden variables and cycles. The main results give graphical characterizations of Markov blankets, stable frontiers, and stable blankets in these generalized settings. In particular, we identify conditions under which the response is conditionally independent of intervention variables given a suitable predictor set, and we describe when such sets are minimal or unique. These results extend the graphical interpretation of stabilized regression beyond acyclic fully observed models.
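To make the SCC terminology concrete, the following is a minimal Python sketch that groups the nodes of a small directed graph into strongly connected components using Kosaraju's algorithm. The graph, the node names, and the `strongly_connected_components` helper are illustrative assumptions; this is not the paper's construction of stable blankets, only an example of the graphical unit the abstract refers to.

```python
from collections import defaultdict

def strongly_connected_components(edges, nodes):
    """Return the SCCs (as frozensets) of a directed graph via Kosaraju's algorithm."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish time.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        comp, todo = {start}, [start]
        while todo:
            node = todo.pop()
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    comp.add(prev)
                    todo.append(prev)
        components.append(frozenset(comp))
    return components

# Hypothetical graph: X1 -> Y, a feedback loop Y <-> X2, and X2 -> X3.
nodes = ["X1", "Y", "X2", "X3"]
edges = [("X1", "Y"), ("Y", "X2"), ("X2", "Y"), ("X2", "X3")]
sccs = strongly_connected_components(edges, nodes)
```

In this toy graph the feedback loop places `Y` and `X2` in the same component, while `X1` and `X3` each form a singleton component.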

artificial intelligence, machine learning, scc

2605.01856

Country: Europe > Sweden (0.40)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.86)

Agazzi, Andrea; Bruno, Giuseppe; García, Eloy Mosig; Saviozzi, Samuele; Romito, Marco

Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models

arXiv.org Machine Learning, Apr-30-2026

The transformer architecture [52], which underlies present-day Large Language Models, has been one of the main drivers of recent advances in machine learning and artificial intelligence. At each layer, the hidden state of the network is updated by sequentially applying two distinct operations: attention modules [3], which capture long-range interactions in the input sequence, and classical multilayer perceptrons (MLPs), acting separately on each element of that sequence. Despite their empirical success, the mechanisms governing information propagation through depth, and the way attention and MLP blocks jointly shape internal representations, remain only partially understood from a theoretical viewpoint. Recent progress has come from viewing transformers in suitable scaling limits as deterministic mean-field interacting particle systems modeling the evolution of $N$ tokens through the layers of the neural network architecture (the so-called residual stream dynamics); see, among others, [46, 26, 27, 45]. In these descriptions, depth plays the role of a continuous time variable, and, in the large-context regime ($N \to \infty$), the evolution of token representations is encoded by a PDE for their empirical distribution. This viewpoint is closely connected to the literature on scaling laws, where the effect of various scaling exponents controlling the relative size of the network's hyperparameters (e.g., depth, width, context length) on the effective dynamics of the model
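The layer update described above (attention across tokens, then an MLP applied to each token separately) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: single-head attention, no layer normalisation, random weights, and a residual step size of `1/depth` chosen so that depth behaves like a discretised time variable. The function name `layer_update`, the sizes, and the scaling are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_update(x, Wq, Wk, Wv, W1, W2, step):
    """One residual-stream update: attention across tokens, then a per-token MLP."""
    d = x.shape[1]
    scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)   # (N, N) token-token interactions
    x = x + step * softmax(scores) @ (x @ Wv)     # attention mixes information across tokens
    x = x + step * np.maximum(x @ W1, 0.0) @ W2   # ReLU MLP acts on each token (row) separately
    return x

rng = np.random.default_rng(0)
N, d, depth = 8, 4, 16                            # hypothetical sizes
x = rng.normal(size=(N, d))                       # N token representations of width d
for _ in range(depth):
    # Fresh random weights per layer (an assumption made for illustration only).
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    W1 = rng.normal(size=(d, 4 * d)) / np.sqrt(d)
    W2 = rng.normal(size=(4 * d, d)) / np.sqrt(4 * d)
    x = layer_update(x, Wq, Wk, Wv, W1, W2, step=1.0 / depth)
```

Because attention treats tokens symmetrically and the MLP is applied row by row, permuting the input tokens permutes the output in the same way, which is what makes an interacting-particle description natural.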

lemma 2, machine learning, natural language

2604.26898

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Neural Information Processing Systems, Apr-25-2026, 03:29:33 GMT

Supplementary Material MixACM: Mixup-Based Robustness Transfer via Distillation of Activated Channel Maps

Specifically, robustness with only the ACM loss is 48.38%; adding soft labels improves it to 49.53%, adding mixup improves it to 52.29%, and adding both components raises the final robustness to 56.65%. Also, note that soft labels alone are not enough to transfer robustness in this case, as shown by the KDOnly column. This is in line with the observations of Goldblum et al. [4].

A.4.2 Role of Intermediate Features

To understand the role of low-, mid-, and high-level features, we performed experiments on CIFAR-10 by progressively changing the blocks used for distillation. For this ablation study, we kept all the standard settings reported in Section A.1. Our correspondence of blocks and features is as follows: block 2: low-level features; block 3: mid-level features; block 4: high-level features. Please note that block 1 corresponds to the output of the first layer only; therefore, we do not call it low-level features.
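For readers unfamiliar with the mixup component in this ablation, the following is a minimal NumPy sketch of standard mixup, which blends pairs of inputs and their labels with a Beta-distributed coefficient. The helper name `mixup_batch`, the value of `alpha`, and the batch shapes are assumptions for illustration; the excerpt does not specify how MixACM combines mixup with the ACM loss, so this shows only the generic augmentation.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha, rng):
    """Blend each example with a randomly chosen partner from the same batch."""
    lam = rng.beta(alpha, alpha)           # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))         # partner index for every example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

# Hypothetical batch: 4 images of shape 3x8x8 with 10 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 8, 8))
y = np.eye(10)[[0, 3, 3, 7]]               # one-hot labels
x_mix, y_mix, lam = mixup_batch(x, y, alpha=1.0, rng=rng)
```

Each row of `y_mix` remains a valid probability vector, so the mixed labels can be used directly with a cross-entropy loss.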

artificial intelligence, machine learning, robustness

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Information Processing Systems, Apr-24-2026, 10:09:31 GMT

Optimistic Rates for Multi-Task Representation Learning

We study the problem of transfer learning via Multi-Task Representation Learning (MTRL), wherein multiple source tasks are used to learn a good common representation, and a predictor is trained on top of it for the target task. Under standard regularity assumptions on the loss function and task diversity, we provide new statistical rates on the excess risk of the target task, which demonstrate the benefit of representation learning. Importantly, our rates are optimistic, i.e., they interpolate between the standard $O(m^{-1/2})$ rate and the fast $O(m^{-1})$ rate, depending on the difficulty of the learning task, where $m$ is the number of samples for the target task. Besides the main result, we make several new contributions, including giving optimistic rates for excess risk of source tasks (Multi-Task Learning (MTL)), a local Rademacher complexity theorem for MTRL and MTL, as well as a chain rule for local Rademacher complexity for composite predictor classes.
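As a schematic illustration of what "optimistic" means here, bounds of this kind typically take the following generic shape. The symbols $L^\star$ (the best achievable risk) and $C$ (a complexity term) are assumptions introduced for this sketch; this is not the paper's theorem, which also involves the source tasks and the representation class.

```latex
\[
  \text{excess risk}(m) \;\lesssim\; \sqrt{\frac{L^\star \, C}{m}} \;+\; \frac{C}{m}
\]
% If L^\star = 0 (an easy, realizable task), only the second term remains: O(m^{-1}).
% If L^\star is bounded away from zero, the first term dominates:          O(m^{-1/2}).
```

The bound therefore moves continuously between the two regimes as the task becomes easier, which is the interpolation the abstract describes.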

artificial intelligence, complexity, machine learning

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Slavutsky, Yuli; Salazar, Sebastian; Blei, David M.

Neural Generalized Mixed-Effects Models

arXiv.org Machine Learning, Apr-14-2026

Generalized linear mixed-effects models (GLMMs) are widely used to analyze grouped and hierarchical data. In a GLMM, each response is assumed to follow an exponential-family distribution where the natural parameter is given by a linear function of observed covariates and a latent group-specific random effect. Since exact marginalization over the random effects is typically intractable, model parameters are estimated by maximizing an approximate marginal likelihood. In this paper, we replace the linear function with neural networks. The result is a more flexible model, the neural generalized mixed-effects model (NGMM), which captures complex relationships between covariates and responses. To fit NGMM to data, we introduce an efficient optimization procedure that maximizes the approximate marginal likelihood and is differentiable with respect to network parameters. We show that the approximation error of our objective decays at a Gaussian-tail rate in a user-chosen parameter. On synthetic data, NGMM improves over GLMMs when covariate-response relationships are nonlinear, and on real-world datasets it outperforms prior methods. Finally, we analyze a large dataset of student proficiency to demonstrate how NGMM can be extended to more complex latent-variable models.
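To illustrate what an approximate marginal likelihood looks like in the simplest case, the following sketch integrates out a Gaussian random intercept in a logistic model with Gauss-Hermite quadrature. The abstract does not say which approximation the paper uses, so the quadrature rule, the helper name `group_marginal_loglik`, and the `n_nodes` parameter are assumptions chosen because they are a common textbook approach. In a GLMM, `eta_fixed` would be a linear function of the covariates; replacing it with the output of a neural network would correspond to the general idea described above.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def group_marginal_loglik(y, eta_fixed, sigma, n_nodes=20):
    """Approximate log p(y) for one group of a random-intercept logistic model.

    y         : (n,) binary responses for the group
    eta_fixed : (n,) covariate part of the natural parameter
    sigma     : standard deviation of the random intercept b ~ N(0, sigma^2)
    """
    # Gauss-Hermite rule: integral of exp(-t^2) g(t) dt ~= sum_k w_k g(t_k)
    t, w = hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * t                 # change of variables b = sqrt(2) * sigma * t
    eta = eta_fixed[:, None] + b[None, :]        # (n, n_nodes) natural parameters
    # Bernoulli log-likelihood with logit link: y * eta - log(1 + exp(eta))
    loglik = (y[:, None] * eta - np.logaddexp(0.0, eta)).sum(axis=0)
    # E_b[exp(loglik(b))] ~= (1 / sqrt(pi)) * sum_k w_k exp(loglik(b_k))
    return np.log(np.sum(w * np.exp(loglik)) / np.sqrt(np.pi))

# Hypothetical group with three binary responses.
y = np.array([1.0, 0.0, 1.0])
eta_fixed = np.array([0.3, -0.2, 1.0])
value = group_marginal_loglik(y, eta_fixed, sigma=1.5)
```

Summing this quantity over groups gives an objective that is differentiable in whatever parameters produce `eta_fixed`, which is why such approximations pair naturally with gradient-based training.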

artificial intelligence, machine learning, mixed-effect model

2604.10976

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Tennessee (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems, Feb-19-2026, 06:21:54 GMT

A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection

Our contributions are: (1) We develop a generalized framework to optimize non-differentiable ranking-based functions by extending the error-driven optimization of AP Loss. (2) We prove that

artificial intelligence, detection, machine learning