AITopics

2604.03015

Country:

Africa > Rwanda > Kigali > Kigali (0.04)
North America > United States > Utah (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Boileau, Philippe, Hejazi, Nima S., Malenica, Ivana, Gilbert, Peter B., Dudoit, Sandrine, van der Laan, Mark J.

Identifying and Estimating Causal Direct Effects Under Unmeasured Confounding

arXiv.org Machine Learning, Apr-3-2026

Causal mediation analysis provides techniques for defining and estimating effects that may be endowed with mechanistic interpretations. With many scientific investigations seeking to address mechanistic questions, causal direct and indirect effects have garnered much attention. The natural direct and indirect effects, the most widely used among such causal mediation estimands, are limited in their practical utility due to stringent identification requirements. Accordingly, considerable effort has been invested in developing alternative direct and indirect effect decompositions with relaxed identification requirements. Such efforts often yield effect definitions with nuanced and challenging interpretations. By contrast, relatively limited attention has been paid to relaxing the identification assumptions of the natural direct and indirect effects. Motivated by a secondary aim of a recent non-randomized vaccine prospective cohort study (NCT05168813), we present a set of relaxed conditions under which the natural direct effect is identifiable in spite of unobserved baseline confounding of the exposure-mediator pathway; we use this result to investigate the effect mediated by putative immune correlates of protection. Relaxing the commonly used but restrictive cross-world counterfactual independence assumption, we discuss strategies for evaluating the natural direct effect in non-randomized settings that arise in the analysis of vaccine studies. We revisit prior studies of semi-parametric efficiency theory to demonstrate the construction of flexible, multiply robust estimators of the natural direct effect and discuss efficient estimation strategies that do not place restrictive modeling assumptions on nuisance functions.
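For reference, the natural direct effect named in the abstract is usually written in counterfactual notation as below. The notation is an assumption, since the abstract fixes none: $A$ is a binary exposure, $M$ the mediator, $Y$ the outcome, $W$ the measured baseline covariates, and $M(a)$, $Y(a, m)$ the corresponding counterfactuals. The second display is the classical mediation formula for a discrete mediator; it holds under the standard identification assumptions (including cross-world independence), not under the relaxed conditions the paper proposes.

$$
\psi_{\mathrm{NDE}} = \mathbb{E}\left[ Y(1, M(0)) - Y(0, M(0)) \right]
$$

$$
\psi_{\mathrm{NDE}} = \mathbb{E}_{W}\left[ \sum_{m} \left\{ \mathbb{E}[Y \mid A = 1, M = m, W] - \mathbb{E}[Y \mid A = 0, M = m, W] \right\} P(M = m \mid A = 0, W) \right]
$$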

artificial intelligence, estimator, machine learning, (18 more...)

2604.01501

Country:

Europe > Austria > Vienna (0.14)
North America > Greenland (0.05)
Africa > Southern Africa (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Vaccines (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

arXiv.org Machine Learning, Apr-3-2026

Test-Time Scaling Makes Overtraining Compute-Optimal

Roberts, Nicholas, Cho, Sungjun, Gao, Zhiqi, Huang, Tzu-Heng, Wu, Albert, Orlanski, Gabriel, Trost, Avi, Buchanan, Kelly, Albarghouthi, Aws, Sala, Frederic

Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust over distinct modeling approaches: measuring joint scaling effect on the task loss and modeling impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well outside the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.
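The trade-off described above can be made concrete with a small sketch. It shows the widely used unbiased pass@$k$ estimator, the closed form for independent samples, and a rough end-to-end compute count. The function names, the 6-FLOPs-per-parameter-per-training-token and 2-FLOPs-per-parameter-per-generated-token rules of thumb, and the `num_queries` and `tokens_per_sample` parameters are assumptions for illustration; the paper's own cost model and fitted laws may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k_iid(p: float, k: int) -> float:
    """pass@k when each sample succeeds independently with probability p."""
    return 1.0 - (1.0 - p) ** k


def end_to_end_flops(params: float, train_tokens: float, k: int,
                     tokens_per_sample: float, num_queries: float) -> float:
    """Rough training plus inference compute (assumed rules of thumb)."""
    train = 6.0 * params * train_tokens
    inference = 2.0 * params * tokens_per_sample * k * num_queries
    return train + inference
```

Under this accounting, inference cost scales with both model size and $k$, so at a fixed total budget a smaller model trained on more tokens leaves more room for repeated sampling.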

large language model, machine learning, underreview, (17 more...)

2604.01411

Country:

Asia > Middle East > Jordan (0.04)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)

Chung, Neo Christopher, Laletin, Maxim

Regularizing Attention Scores with Bootstrapping

arXiv.org Machine Learning, Apr-3-2026

Vision transformers (ViT) rely on an attention mechanism to weigh input features, and therefore attention scores have naturally been considered as explanations for their decision-making process. However, attention scores are almost always non-zero, resulting in noisy and diffused attention maps and limiting interpretability. Can we quantify uncertainty measures of attention scores and obtain regularized attention scores? To this end, we consider attention scores of ViT in a statistical framework where independent noise would lead to insignificant yet non-zero scores. Leveraging statistical learning techniques, we introduce bootstrapping for attention scores, which generates a baseline distribution of attention scores by resampling input features. Such a bootstrap distribution is then used to estimate significances and posterior probabilities of attention scores. In natural and medical images, the proposed Attention Regularization approach demonstrates a straightforward removal of spurious attention arising from noise, drastically improving shrinkage and sparsity. Quantitative evaluations are conducted using both simulation and real-world datasets. Our study highlights bootstrapping as a practical regularization tool when using attention scores as explanations for ViT. Code available: https://github.com/ncchung/AttentionRegularization
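The general idea of comparing attention scores against a resampled baseline can be sketched on a toy example. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the single-query softmax attention, the pooling-based resampling scheme, the empirical p-value, and the `alpha` threshold.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attention_scores(query, patches):
    """Scaled dot-product attention of one query over a list of patches."""
    d = len(query)
    logits = [sum(q * v for q, v in zip(query, p)) / math.sqrt(d)
              for p in patches]
    return softmax(logits)


def bootstrap_null_scores(query, patches, n_boot, rng):
    """Resample pooled feature values with replacement to build
    structure-free patches, and collect the attention they receive."""
    pool = [v for p in patches for v in p]
    d = len(query)
    null = []
    for _ in range(n_boot):
        fake = [[rng.choice(pool) for _ in range(d)] for _ in patches]
        null.extend(attention_scores(query, fake))
    return null


def regularize(scores, null, alpha=0.05):
    """Zero out scores that are not significant against the null sample."""
    out = []
    for s in scores:
        exceed = sum(1 for z in null if z >= s)
        p_value = (1 + exceed) / (1 + len(null))
        out.append(s if p_value <= alpha else 0.0)
    return out


rng = random.Random(0)
query = [1.0, 1.0, 1.0, 1.0]
# One patch aligned with the query, fifteen patches of small noise.
patches = [[3.0] * 4] + [[rng.gauss(0.0, 0.1) for _ in range(4)]
                         for _ in range(15)]
scores = attention_scores(query, patches)
null = bootstrap_null_scores(query, patches, n_boot=200, rng=rng)
sparse = regularize(scores, null)
```

In this toy setup the raw scores are all non-zero, while the regularized vector keeps only the aligned patch.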

artificial intelligence, deep learning, machine learning, (16 more...)

2604.01339

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Africa > Middle East > Morocco > Tanger-Tetouan-Al Hoceima Region > Tangier (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.48)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Daily Mail - Science & tech, Apr-2-2026, 04:01:46 GMT

Caveman casino! Humans began gambling 12,000 YEARS ago, scientists say - as they discover ancient dice in the western Great Plains

Humans began gambling 12,000 years ago, experts say, after discovering dice that date back to the last Ice Age.
A team from Colorado State University have unearthed the earliest evidence of two-sided dice crafted from small pieces of bone. They were originally found at an archaeological site on the western Great Plains of America, predating the current oldest known dice by more than 6,000 years. The discovery indicates that gambling and games of chance have been a persistent feature of North American culture since the end of the last Ice Age, experts say. 'Historians have traditionally treated dice and probability as Old World innovations,' researcher Robert Madden said.

artificial intelligence, devil wear prada 2, social media, (16 more...)

Daily Mail - Science & tech

Country:

Asia > Middle East > Iran (0.24)
North America > United States > Colorado (0.24)
North America > Canada > Alberta (0.14)
(20 more...)

Genre: Personal (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (0.88)

Cortinovis, Stefano, Aitchison, Laurence, Eleftheriadis, Stefanos, van der Wilk, Mark

Inverse-Free Sparse Variational Gaussian Processes

Gaussian processes (GPs) offer appealing properties but are costly to train at scale. Sparse variational GP (SVGP) approximations reduce cost yet still rely on Cholesky decompositions of kernel matrices, ill-suited to low-precision, massively parallel hardware. While one can construct valid variational bounds that rely only on matrix multiplications (matmuls) via an auxiliary matrix parameter, optimising them with off-the-shelf first-order methods is challenging. We make the inverse-free approach practical by proposing a better-conditioned bound and deriving a matmul-only natural-gradient update for the auxiliary parameter, markedly improving stability and convergence. We further provide simple heuristics, such as step-size schedules and stopping criteria, that make the overall optimisation routine fit seamlessly into existing workflows. Across regression and classification benchmarks, we demonstrate that our method 1) serves as a drop-in replacement in SVGP-based models (e.g., deep GPs), 2) recovers similar performance to traditional methods, and 3) can be faster than baselines when well tuned.

artificial intelligence, machine learning, r-svgp, (17 more...)

2604.00697

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Guilmeau, Thomas, Hendrikx, Hadrien, Forbes, Florence

Convergence of projected stochastic natural gradient variational inference for various step size and sample or batch size schedules

Stochastic natural gradient variational inference (NGVI) is a popular and efficient algorithm for Bayesian inference. Despite empirical success, the convergence of this method is still not fully understood. In this work, we define and study a projected stochastic NGVI when variational distributions form an exponential family. Stochasticity arises when gradients are either intractable expectations or large sums. We prove new non-asymptotic convergence results for combinations of constant or decreasing step sizes and constant or increasing sample/batch sizes. When all hyperparameters are fixed, NGVI is shown to converge geometrically to a neighborhood of the optimum, while we establish convergence to the optimum with rates of the form $\mathcal{O}\left(\frac{1}{T^{\rho}}\right)$, possibly with $\rho \geq 1$, for all other combinations of step size and sample/batch size schedules. These rates apply when the target posterior distribution is close in some sense to the considered exponential family. Our theoretical results extend existing NGVI and stochastic optimization results and provide more flexibility to adjust, in a principled way, step sizes and sample/batch sizes in order to meet speed, resources, or accuracy constraints.
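A minimal sketch of a stochastic natural-gradient step, assuming a conjugate toy model that is not taken from the paper: observations $y_i \sim \mathcal{N}(\theta, \sigma^2)$, prior $\theta \sim \mathcal{N}(0, \tau^2)$, and a Gaussian variational family with natural parameters $(m/s^2, -1/(2s^2))$. In this conjugate case the natural-gradient update is a convex combination of the current natural parameters and a minibatch estimate of the posterior's natural parameters. The `project` step here simply keeps the second coordinate negative; the projection analysed in the paper may be defined differently.

```python
import random


def project(lam, eps=1e-8):
    """Keep the second natural parameter negative so q is a valid Gaussian."""
    return (lam[0], min(lam[1], -eps))


def ngvi(ys, sigma2, tau2, steps, batch_size, step_size, rng):
    """Projected stochastic NGVI for the conjugate Gaussian-mean model."""
    n = len(ys)
    prior = (0.0, -0.5 / tau2)
    lam = prior  # initialise q at the prior
    for t in range(steps):
        batch = rng.sample(ys, batch_size)
        scale = n / batch_size  # rescale the minibatch sum to the full data
        target = (prior[0] + scale * sum(batch) / sigma2,
                  prior[1] - scale * batch_size * 0.5 / sigma2)
        rho = step_size(t)
        lam = project(((1.0 - rho) * lam[0] + rho * target[0],
                       (1.0 - rho) * lam[1] + rho * target[1]))
    return lam


def to_mean_var(lam):
    """Convert natural parameters back to (mean, variance)."""
    var = -0.5 / lam[1]
    return lam[0] * var, var
```

With the full batch and a step size of 1, a single step lands exactly on the conjugate posterior. With small batches, a decreasing schedule such as `lambda t: 1.0 / (t + 1)` averages out the minibatch noise, whereas a fixed step size leaves the iterates fluctuating around the optimum.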

artificial intelligence, intdoma, machine learning, (16 more...)

2604.00683

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Middle East > Morocco > Tanger-Tetouan-Al Hoceima Region > Tangier (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Clivio, Oscar, D'Amour, Alexander, Franks, Alexander, Bruns-Smith, David, Holmes, Chris, Feller, Avi

Deconfounding Scores and Representation Learning for Causal Effect Estimation with Weak Overlap

Overlap, also known as positivity, is a key condition for causal treatment effect estimation. Many popular estimators suffer from high variance and become brittle when features differ strongly across treatment groups. This is especially challenging in high dimensions: the curse of dimensionality can make overlap implausible. To address this, we propose a class of feature representations called deconfounding scores, which preserve both identification and the target of estimation; the classical propensity and prognostic scores are two special cases. We characterize the problem of finding a representation with better overlap as minimizing an overlap divergence under a deconfounding score constraint. We then derive closed-form expressions for a class of deconfounding scores under a broad family of generalized linear models with Gaussian features and show that prognostic scores are overlap-optimal within this class. We conduct extensive experiments to assess this behavior empirically.
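For reference, the overlap condition and the two classical scores mentioned above are commonly written as follows. The notation is an assumption, since the abstract fixes none: $T$ is a binary treatment, $X$ the features, $Y(0)$ the control potential outcome, $e$ the propensity score, and $\Psi$ a prognostic score.

$$
e(x) = P(T = 1 \mid X = x), \qquad 0 < e(x) < 1 \ \text{ for all } x \text{ in the support of } X
$$

$$
Y(0) \perp X \mid \Psi(X)
$$

Weak overlap corresponds to $e(x)$ approaching 0 or 1 on part of the feature space, which inflates inverse-propensity weights such as $1 / e(x)$.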

artificial intelligence, machine learning, representation, (11 more...)

2604.00811

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > North Carolina (0.04)
(5 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Bernal, Marcel Tomàs, Mallinar, Neil Rohit, Belkin, Mikhail

Breaking Data Symmetry is Needed For Generalization in Feature Learning Kernels

Grokking occurs when a model reaches high training accuracy long before it generalizes to unseen test points. This phenomenon was initially observed on a class of algebraic problems, such as learning modular arithmetic (Power et al., 2022). We study grokking on algebraic tasks in a class of feature learning kernels via the Recursive Feature Machine (RFM) algorithm (Radhakrishnan et al., 2024), which iteratively updates feature matrices through the Average Gradient Outer Product (AGOP) of an estimator in order to learn task-relevant features. Our main experimental finding is that generalization occurs only when a certain symmetry in the training set is broken. Furthermore, we empirically show that RFM generalizes by recovering the underlying invariance group action inherent in the data. We find that the learned feature matrices encode specific elements of the invariance group, explaining the dependence of generalization on symmetry.
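The AGOP step named above can be sketched in isolation. The example computes the average of $\nabla f(x)\,\nabla f(x)^{\top}$ over a set of points for a hand-written gradient. The predictor $f(x) = (w \cdot x)^2$ and the helper names are assumptions for illustration; the full RFM loop, which alternates between fitting a kernel predictor and recomputing this matrix, is not shown.

```python
def agop(grad_fn, xs):
    """Average gradient outer product of a predictor over the points xs."""
    d = len(xs[0])
    m = [[0.0] * d for _ in range(d)]
    for x in xs:
        g = grad_fn(x)
        for i in range(d):
            for j in range(d):
                m[i][j] += g[i] * g[j] / len(xs)
    return m


def make_grad(w):
    """Gradient of the toy predictor f(x) = (w . x)^2, i.e. 2 (w . x) w."""
    def grad(x):
        dot = sum(wi * xi for wi, xi in zip(w, x))
        return [2.0 * dot * wi for wi in w]
    return grad


# The predictor depends on x only through its first coordinate,
# so the AGOP puts all of its mass on that coordinate.
points = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0]]
feature_matrix = agop(make_grad([1.0, 0.0, 0.0]), points)
```

The resulting matrix highlights the directions along which the predictor varies, which is why it can serve as a learned feature matrix.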

artificial intelligence, machine learning, reflection, (17 more...)

2604.00316

Country:

North America > United States (0.28)
Africa > Middle East > Morocco > Tanger-Tetouan-Al Hoceima Region > Tangier (0.04)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Machine Learning, Apr-1-2026

mlr3mbo: Bayesian Optimization in R

Becker, Marc, Schneider, Lennart, Binder, Martin, Kotthoff, Lars, Bischl, Bernd

We present mlr3mbo, a comprehensive and modular toolbox for Bayesian optimization in R. mlr3mbo supports single- and multi-objective optimization, multi-point proposals, batch and asynchronous parallelization, input and output transformations, and robust error handling. While it can be used for many standard Bayesian optimization variants in applied settings, researchers can also construct custom BO algorithms from its flexible building blocks. In addition to an introduction to the software, its design principles, and its building blocks, the paper presents two extensive empirical evaluations of the software on the surrogate-based benchmark suite YAHPO Gym. To identify robust default configurations for both numeric and mixed-hierarchical optimization regimes, and to gain further insights into the respective impacts of individual settings, we run a coordinate descent search over the mlr3mbo configuration space and analyze its results. Furthermore, we demonstrate that mlr3mbo achieves state-of-the-art performance by benchmarking it against a wide range of optimizers, including HEBO, SMAC3, Ax, and Optuna.

artificial intelligence, data mining, machine learning, (20 more...)

2603.2973

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Wyoming (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)