AITopics

2605.30059

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Jacobs, Tijn, van Wieringen, Wessel N., van der Pas, Stéphanie L.

Horseshoe Forests for High-Dimensional Causal Survival Analysis

arXiv.org Machine Learning, May-8-2026

We develop a Bayesian tree ensemble model to estimate heterogeneous treatment effects in censored survival data with high-dimensional covariates. Instead of imposing sparsity through the tree structure, we place a horseshoe prior directly on the step heights to achieve adaptive global-local shrinkage. This strategy allows flexible regularisation and reduces noise. We develop a reversible jump Gibbs sampler to accommodate the non-conjugate horseshoe prior within the tree ensemble framework. We show through extensive simulations that the method accurately estimates treatment effects in high-dimensional covariate spaces, at various sparsity levels, and under non-linear treatment effect functions. We further illustrate the practical utility of the proposed approach with a re-analysis of pancreatic ductal adenocarcinoma (PDAC) survival data from The Cancer Genome Atlas.
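
The global-local shrinkage mentioned in this abstract can be illustrated with a minimal sketch that only draws from a horseshoe prior; it is not the authors' reversible jump Gibbs sampler and involves no trees or survival data. The function names, the choice tau = 1, and the shrinkage weight kappa = 1 / (1 + tau^2 * lambda^2) (which assumes a normal-means setting with unit noise variance) are illustrative assumptions.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) by folding an inverse-CDF Cauchy draw."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))


def horseshoe_draws(n, tau=1.0, seed=0):
    """Return n pairs (beta, kappa) drawn from a horseshoe prior.

    beta  ~ Normal(0, (tau * lam)^2) with local scale lam ~ half-Cauchy(0, 1)
    kappa = 1 / (1 + (tau * lam)^2) is the shrinkage weight under the
            assumed unit-noise normal-means model (1 = fully shrunk to zero).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        lam = half_cauchy(rng)                  # local scale, one per coefficient
        beta = rng.gauss(0.0, tau * lam)        # coefficient given its scales
        kappa = 1.0 / (1.0 + (tau * lam) ** 2)  # implied shrinkage weight
        draws.append((beta, kappa))
    return draws


def extreme_fraction(draws, eps=0.1):
    """Fraction of shrinkage weights within eps of 0 (kept) or 1 (shrunk)."""
    hits = sum(1 for _, kappa in draws if kappa < eps or kappa > 1.0 - eps)
    return hits / len(draws)


if __name__ == "__main__":
    sample = horseshoe_draws(20000)
    print("share of near-0 or near-1 shrinkage weights:", extreme_fraction(sample))
```

With tau = 1 the shrinkage weight follows a Beta(1/2, 1/2) distribution, so a large share of the mass sits near 0 and near 1: coefficients tend to be either left almost untouched or shrunk almost to zero, which is the adaptive behaviour the abstract refers to.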

artificial intelligence, bayesian inference, machine learning, (19 more...)

2507.22004

Country:

Europe (0.46)
North America > United States (0.45)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.54)
Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Neural Information Processing Systems, Mar-17-2026, 19:55:55 GMT

Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders

Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in language models' (LMs) activations, by finding sparse, linear reconstructions of those activations. We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. In SAEs, the L1 penalty used to encourage sparsity introduces many undesirable biases, such as shrinkage -- systematic underestimation of feature activations. The key insight of Gated SAEs is to separate the functionality of (a) determining which directions to use and (b) estimating the magnitudes of those directions: this enables us to apply the L1 penalty only to the former, limiting the scope of undesirable side effects. Through training SAEs on LMs of up to 7B parameters we find that, in typical hyper-parameter ranges, Gated SAEs solve shrinkage, are similarly interpretable, and require half as many firing features to achieve comparable reconstruction fidelity.
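
The shrinkage effect described in this abstract can be seen in a one-dimensional toy problem. The sketch below is not the Gated SAE architecture; it only contrasts an L1-penalised estimate of a single non-negative activation with an estimate that uses the penalty to decide whether the feature fires and then reports the magnitude unpenalised. Both function names and the threshold rule are assumptions made for illustration.

```python
def l1_estimate(x, lam):
    """Minimiser of 0.5 * (a - x)**2 + lam * a over a >= 0 (soft threshold).

    Whenever the feature is active, the estimate is x - lam, so the
    magnitude is systematically underestimated by lam (shrinkage).
    """
    return max(x - lam, 0.0)


def gated_estimate(x, lam):
    """Toy 'gate then magnitude' estimate.

    The penalty level lam is used only to decide whether the feature is on;
    once it is on, the magnitude is reported without any penalty.
    """
    is_active = x > lam
    return x if is_active else 0.0


if __name__ == "__main__":
    lam = 1.0
    for x in (0.5, 3.0):
        print(x, l1_estimate(x, lam), gated_estimate(x, lam))
```

Both estimators agree on which inputs are inactive, but only the L1 estimate lowers the magnitude of active ones, which is the bias the abstract attributes to applying the penalty to magnitudes.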

artificial intelligence, machine learning, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Neural Information Processing Systems, Mar-17-2026, 12:28:58 GMT

Generalized Linear Model Regression under Distance-to-set Penalties

Estimation in generalized linear models (GLM) is complicated by the presence of constraints. One can handle constraints by maximizing a penalized log-likelihood. Penalties such as the lasso are effective in high dimensions but often lead to severe shrinkage. This paper explores instead penalizing the squared distance to constraint sets. Distance penalties are more flexible than algebraic and regularization penalties, and avoid the drawback of shrinkage. To optimize distance penalized objectives, we make use of the majorization-minimization principle. Resulting algorithms constructed within this framework are amenable to acceleration and come with global convergence guarantees. Applications to shape constraints, sparse regression, and rank-restricted matrix regression on synthetic and real data showcase the strong empirical performance of distance penalization, even under non-convex constraints.
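
A squared distance-to-set penalty optimised by majorization-minimization can be sketched on a deliberately simple case. The assumptions here are not from the abstract: a Gaussian model with identity design (loss 0.5 * ||y - beta||^2), a sparsity constraint set with at most k non-zero entries, a fixed penalty weight rho, and hypothetical function names. Each step majorizes the squared distance by the squared distance to the projection of the current iterate, which gives a closed-form update.

```python
def project_sparse(beta, k):
    """Project onto {b : at most k non-zeros} by keeping the k largest |b_j|."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def objective(beta, y, k, rho):
    """0.5 * ||y - beta||^2 + (rho / 2) * dist(beta, sparse set)^2."""
    proj = project_sparse(beta, k)
    loss = 0.5 * sum((yj - bj) ** 2 for yj, bj in zip(y, beta))
    dist_sq = sum((bj - pj) ** 2 for bj, pj in zip(beta, proj))
    return loss + 0.5 * rho * dist_sq


def mm_distance_penalty(y, k, rho=10.0, iters=200):
    """MM iterations for the distance-penalised objective above.

    The surrogate 0.5 * ||y - b||^2 + (rho / 2) * ||b - P(beta_t)||^2 is
    minimised exactly by b = (y + rho * P(beta_t)) / (1 + rho).
    """
    beta = list(y)
    for _ in range(iters):
        proj = project_sparse(beta, k)
        beta = [(yj + rho * pj) / (1.0 + rho) for yj, pj in zip(y, proj)]
    return beta


if __name__ == "__main__":
    y = [3.0, -0.2, 0.1, -2.5]
    print(mm_distance_penalty(y, k=2))
```

In this toy case the two largest entries are returned unchanged while the small ones are pulled towards zero by a factor 1 / (1 + rho), in contrast to a lasso penalty, which would also shrink the large entries.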

artificial intelligence, machine learning, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Arnstad, August, Rønneberg, Leiv, Storvik, Geir

Dirichlet Scale Mixture Priors for Bayesian Neural Networks

arXiv.org Machine Learning, Feb-24-2026

Neural networks are the cornerstone of modern machine learning, yet they can be difficult to interpret, give overconfident predictions, and are vulnerable to adversarial attacks. Bayesian neural networks (BNNs) alleviate some of these limitations but have problems of their own. The key step of specifying prior distributions in BNNs is no trivial task, yet is often skipped out of convenience. In this work, we propose a new class of prior distributions for BNNs, the Dirichlet scale mixture (DSM) prior, that addresses current limitations in Bayesian neural networks through structured, sparsity-inducing shrinkage. Theoretically, we derive general dependence structures and shrinkage results for DSM priors and show how they manifest under the geometry induced by neural networks. In experiments on simulated and real-world data, we find that DSM priors encourage sparse networks through implicit feature selection, show robustness under adversarial attacks, and deliver competitive predictive performance with substantially fewer effective parameters. In particular, their advantages appear most pronounced in correlated, moderately small data regimes, and they are more amenable to weight pruning. Moreover, by adopting heavy-tailed shrinkage mechanisms, our approach aligns with recent findings that such priors can mitigate the cold posterior effect, offering a principled alternative to the commonly used Gaussian priors.

artificial intelligence, machine learning, neural network, (19 more...)

2602.19859

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Norway > Eastern Norway > Oslo (0.04)

Genre: Research Report (1.00)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Neural Information Processing Systems, Feb-9-2026, 12:17:50 GMT

66738d21d3cddb8717ca52deff5a5546-Supplemental-Conference.pdf

pascal voc 2012, input image, localization map, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

Manolakis, Efstratios, Bongiorno, Christian, Mantegna, Rosario Nunzio

Physics-Informed Singular-Value Learning for Cross-Covariances Forecasting in Financial Markets

arXiv.org Machine Learning, Jan-22-2026

A new wave of work on covariance cleaning and nonlinear shrinkage has delivered asymptotically optimal analytical solutions for large covariance matrices. The same framework has been generalized to empirical cross-covariance matrices, whose singular value decomposition identifies canonical comovement modes between two asset sets, with singular values quantifying the strength of each mode and providing natural targets for shrinkage. Existing analytical cross-covariance cleaners are derived under strong stationarity and large-sample assumptions, and they typically rely on mesoscopic regularity conditions such as bounded spectra; macroscopic common modes (e.g., a global market factor) violate these conditions. When applied to real equity returns, where dependence structures drift over time and global modes are prominent, we find that these theoretically optimal formulas do not translate into robust out-of-sample performance. We address this gap by designing a random-matrix-inspired neural architecture that operates in the empirical singular-vector basis and learns a nonlinear mapping from empirical singular values to their corresponding cleaned values. By construction, the network can recover the analytical solution as a special case, yet it remains flexible enough to adapt to non-stationary dynamics and mode-driven distortions. Trained on a long history of equity returns, the proposed method achieves a more favorable bias-variance trade-off than purely analytical cleaners and delivers systematically lower out-of-sample cross-covariance prediction errors. Our results demonstrate that combining random-matrix theory with machine learning makes asymptotic theories practically effective in realistic time-varying markets.

artificial intelligence, machine learning, singular value, (17 more...)

2601.07687

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (0.86)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Polson, Nick, Sokolov, Vadim

Horseshoe Mixtures-of-Experts (HS-MoE)

arXiv.org Machine Learning, Jan-15-2026

Horseshoe mixtures-of-experts (HS-MoE) models provide a Bayesian framework for sparse expert selection in mixture-of-experts architectures. We combine the horseshoe prior's adaptive global-local shrinkage with input-dependent gating, yielding data-adaptive sparsity in expert usage. Our primary methodological contribution is a particle learning algorithm for sequential inference, in which the filter is propagated forward in time while tracking only sufficient statistics. We also discuss how HS-MoE relates to modern mixture-of-experts layers in large language models, which are deployed under extreme sparsity constraints (e.g., activating a small number of experts per token out of a large pool).

machine learning, natural language, particle, (16 more...)

2601.09043

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Neural Information Processing Systems, Dec-23-2025, 20:07:06 GMT

Modular Meta-Learning with Shrinkage

Many real-world problems, including multi-speaker text-to-speech synthesis, can greatly benefit from the ability to meta-learn large models with only a few task-specific components. Updating only these task-specific modules then allows the model to be adapted to low-data tasks for as many steps as necessary without risking overfitting. Unfortunately, existing meta-learning methods either do not scale to long adaptation or else rely on handcrafted task-specific architectures. Here, we propose a meta-learning approach that obviates the need for this often sub-optimal hand-selection. In particular, we develop general techniques based on Bayesian shrinkage to automatically discover and learn both task-specific and general reusable modules. Empirically, we demonstrate that our method discovers a small set of meaningful task-specific modules and outperforms existing meta-learning approaches in domains like few-shot text-to-speech that have little task data and long adaptation horizons. We also show that existing meta-learning methods including MAML, iMAML, and Reptile emerge as special cases of our method.

modular meta-learning, name change, shrinkage, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.84)

Neural Information Processing Systems, Nov-21-2025, 14:12:21 GMT

Generalized Linear Model Regression under Distance-to-set Penalties

constraint, generalized linear model regression, name change, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)