AITopics

2603.27189

Country: Asia > China > Henan Province > Zhengzhou (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Tinati, Mohammad, Tu, Stephen

On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry

arXiv.org Machine Learning, Mar-31-2026

Self-supervised pre-training, where large corpora of unlabeled data are used to learn representations for downstream fine-tuning, has become a cornerstone of modern machine learning. While a growing body of theoretical work has begun to analyze this paradigm, existing bounds leave open the question of how sharp the current rates are, and whether they accurately capture the complex interaction between pre-training and fine-tuning. In this paper, we address this gap by developing an asymptotic theory of pre-training via two-stage M-estimation. A key challenge is that the pre-training estimator is often identifiable only up to a group symmetry, a feature common in representation learning that requires careful treatment. We address this issue using tools from Riemannian geometry to study the intrinsic parameters of the pre-training representation, which we link with the downstream predictor through a notion of orbit-invariance, precisely characterizing the limiting distribution of the downstream test risk. We apply our main result to several case studies, including spectral pre-training, factor models, and Gaussian mixture models, and obtain substantial improvements in problem-specific factors over prior art when applicable.

artificial intelligence, machine learning, pre, (18 more...)

2603.27631

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Pryce, Matthew, Diaz-Ordaz, Karla, Keogh, Ruth H., Vansteelandt, Stijn

Targeted learning of heterogeneous treatment effect curves for right censored or left truncated time-to-event data

In recent years, there has been growing interest in causal machine learning estimators for quantifying subject-specific effects of a binary treatment on time-to-event outcomes. Estimation approaches have been proposed which attenuate the inherent regularisation bias in machine learning predictions, with each of these estimators addressing measured confounding, right censoring, and in some cases, left truncation. However, the existing approaches are found to exhibit suboptimal finite-sample performance, with none of the existing estimators fully leveraging the temporal structure of the data, yielding non-smooth treatment effects over time. We address these limitations by introducing surv-iTMLE, a targeted learning procedure for estimating the difference in the conditional survival probabilities under two treatments. Unlike existing estimators, surv-iTMLE accommodates both left truncation and right censoring while enforcing smoothness and boundedness of the estimated treatment effect curve over time. Through extensive simulation studies under both right censoring and left truncation scenarios, we demonstrate that surv-iTMLE outperforms existing methods in terms of bias and smoothness of time-varying effect estimates in finite samples. We then illustrate surv-iTMLE's practical utility by exploring heterogeneity in the effects of immunotherapy on survival among non-small cell lung cancer (NSCLC) patients, revealing clinically meaningful temporal patterns that existing estimators may obscure.

artificial intelligence, machine learning, (18 more...)

2603.26502

Country: Europe > Belgium > Flanders > East Flanders > Ghent (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Izbicki, Rafael, Rodrigues, Pedro L. C.

Benchmarking Tabular Foundation Models for Conditional Density Estimation in Regression

Conditional density estimation (CDE) - recovering the full conditional distribution of a response given tabular covariates - is essential in settings with heteroscedasticity, multimodality, or asymmetric uncertainty. Recent tabular foundation models, such as TabPFN and TabICL, naturally produce predictive distributions, but their effectiveness as general-purpose CDE methods has not been systematically evaluated, unlike their performance for point prediction, which is well studied. We benchmark three tabular foundation model variants against a diverse set of parametric, tree-based, and neural CDE baselines on 39 real-world datasets, across training sizes from 50 to 20,000, using six metrics covering density accuracy, calibration, and computation time. Across all sample sizes, foundation models achieve the best CDE loss, log-likelihood, and CRPS on the large majority of datasets tested. Calibration is competitive at small sample sizes but, for some metrics and datasets, lags behind task-specific neural baselines at larger sample sizes, suggesting that post-hoc recalibration may be a valuable complement. In a photometric redshift case study using SDSS DR18, TabPFN exposed to 50,000 training galaxies outperforms all baselines trained on the full 500,000-galaxy dataset. Taken together, these results establish tabular foundation models as strong off-the-shelf conditional density estimators.

artificial intelligence, machine learning, real data, (17 more...)

2603.26611

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
South America > Brazil (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Alkhatib, Yahya, Tay, Wee Peng

Contrastive Conformal Sets

Contrastive learning produces coherent semantic feature embeddings by encouraging positive samples to cluster closely while separating negative samples. However, existing contrastive learning methods lack principled guarantees on coverage within the semantic feature space. We extend conformal prediction to this setting by introducing minimum-volume covering sets equipped with learnable generalized multi-norm constraints. We propose a method that constructs conformal sets guaranteeing user-specified coverage of positive samples while maximizing negative sample exclusion. We establish theoretically that volume minimization serves as a proxy for negative exclusion, enabling our approach to operate effectively even when negative pairs are unavailable. The positive inclusion guarantee inherits the distribution-free coverage property of conformal prediction, while negative exclusion is maximized through learned set geometry optimized on a held-out training split. Experiments on simulated and real-world image datasets demonstrate improved inclusion-exclusion trade-offs compared to standard distance-based conformal baselines.
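As a rough illustration of the distribution-free coverage property mentioned in this abstract, the following minimal sketch implements standard split conformal calibration with a plain distance score, i.e. the kind of distance-based baseline the abstract compares against, not the authors' learned multi-norm sets. The function names, the Euclidean score, and the toy calibration distances are all assumptions made for the example.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def conformal_radius(cal_scores: Sequence[float], alpha: float) -> float:
    """Split conformal threshold for miscoverage level alpha.

    Returns the k-th smallest calibration score, with
    k = ceil((n + 1) * (1 - alpha)). If k exceeds n, the calibration
    set is too small for this alpha and the radius is infinite.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def in_conformal_set(anchor: Sequence[float],
                     candidate: Sequence[float],
                     radius: float) -> bool:
    """A candidate is in the set if it lies within the calibrated radius."""
    return euclidean(anchor, candidate) <= radius


# Hypothetical anchor-to-positive distances from a held-out calibration split.
cal_scores = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.7]
radius = conformal_radius(cal_scores, alpha=0.25)

anchor = [0.0, 0.0]
near_positive = [0.3, 0.4]   # distance 0.5
far_negative = [3.0, 4.0]    # distance 5.0
covers_positive = in_conformal_set(anchor, near_positive, radius)
covers_negative = in_conformal_set(anchor, far_negative, radius)
```

Under exchangeability of calibration and test pairs, a ball of this radius contains a new positive with probability at least 1 - alpha. The ball shape is fixed here, which is the limitation that learned set geometry is meant to address.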

artificial intelligence, generalized, machine learning, (18 more...)

2603.26261

Country:

Asia > Singapore (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Hirari, Mehdi, Centofanti, Fabio, Hubert, Mia, Van Aelst, Stefan

Robust Tensor-on-Tensor Regression

Tensor-on-tensor (TOT) regression is an important tool for the analysis of tensor data, aiming to predict a set of response tensors from a corresponding set of predictor tensors. However, standard TOT regression is sensitive to outliers, which may be present in both the response and the predictor. It can be affected by casewise outliers, which are observations that deviate from the bulk of the data, as well as by cellwise outliers, which are individual anomalous cells within the tensors. The latter are particularly common due to the typically large number of cells in tensor data. This paper introduces a novel robust TOT regression method, named ROTOT, that can handle both types of outliers simultaneously, and can cope with missing values as well. This method uses a single loss function to reduce the influence of both casewise and cellwise outliers in the response. The outliers in the predictor are handled using a robust Multilinear Principal Component Analysis method. Graphical diagnostic tools are also proposed to identify the different types of outliers detected. The performance of ROTOT is evaluated through extensive simulations and further illustrated using the Labeled Faces in the Wild dataset, where ROTOT is applied to predict facial attributes.

artificial intelligence, data mining, machine learning, (20 more...)

2603.25911

Country:

Europe > Belgium (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Modeling & Simulation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Koptev, Pavel, Krainov, Nikita, Malkov, Konstantin, Tropsha, Alexander

KANEL: Kolmogorov-Arnold Network Ensemble Learning Enables Early Hit Enrichment in High-Throughput Virtual Screening

Machine learning models of chemical bioactivity are increasingly used for prioritizing a small number of compounds in virtual screening libraries for experimental follow-up. In these applications, assessing model accuracy by early hit enrichment, such as Positive Predictive Value (PPV) calculated for the top N hits (PPV@N), is more appropriate and actionable than traditional global metrics such as AUC. We present KANEL, an ensemble workflow that combines interpretable Kolmogorov-Arnold Networks (KANs) with XGBoost, random forest, and multilayer perceptron models trained on complementary molecular representations (LillyMol descriptors, RDKit-derived descriptors, and Morgan fingerprints). Across five public PubChem BioAssay datasets (AIDs 485314, 485341, 504466, 624202, and 651820), Optuna-optimized weighted ensembles consistently outperformed the best single model in PPV@128 by 0.06-0.12
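For readers unfamiliar with the metric, PPV@N can be sketched as the fraction of true actives among the N highest-scoring compounds. The snippet below is a minimal illustration under that standard definition; the function name `ppv_at_n` and the toy scores and labels are hypothetical and not taken from the paper.

```python
from typing import Sequence


def ppv_at_n(scores: Sequence[float], labels: Sequence[int], n: int) -> float:
    """Fraction of true actives (label 1) among the n highest-scoring items."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < n <= len(scores):
        raise ValueError("n must be between 1 and the number of items")
    # Rank compounds by predicted score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = ranked[:n]
    return sum(label for _, label in top_n) / n


# Toy screening library: model scores and assay outcomes (1 = active).
scores = [0.91, 0.15, 0.78, 0.66, 0.30, 0.85]
labels = [1, 0, 0, 1, 0, 1]

ppv_top3 = ppv_at_n(scores, labels, 3)  # top 3 scores: 0.91, 0.85, 0.78
ppv_all = ppv_at_n(scores, labels, 6)   # equals the overall hit rate
```

Unlike AUC, the value depends only on the top of the ranking, which matches the setting where only a small batch of compounds is sent for experimental follow-up.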

artificial intelligence, machine learning, single model, (18 more...)

2603.25755

Genre: Research Report (0.84)

Industry: Health & Medicine (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Taieb, Ron, Greenberg, Yoel, Sober, Barak

Probabilistic Multilabel Graphical Modelling of Motif Transformations in Symbolic Music

Motifs often recur in musical works in altered forms, preserving aspects of their identity while undergoing local variation. This paper investigates how such motivic transformations occur within their musical context in symbolic music. To support this analysis, we develop a probabilistic framework for modeling motivic transformations and apply it to Beethoven's piano sonatas by integrating multiple datasets that provide melodic, rhythmic, harmonic, and motivic information within a unified analytical representation. Motif transformations are represented as multilabel variables by comparing each motif instance to a designated reference occurrence within its local context, ensuring consistent labeling across transformation families. We introduce a multilabel Conditional Random Field to model how motif-level musical features influence the occurrence of transformations and how different transformation families tend to co-occur. Our goal is to provide an interpretable, distributional analysis of motivic transformation patterns, enabling the study of their structural relationships and stylistic variation. By linking computational modeling with music-theoretical interpretation, the proposed framework supports quantitative investigation of musical structure and complexity in symbolic corpora and may facilitate the analysis of broader compositional patterns and writing practices.

artificial intelligence, machine learning, transformation, (17 more...)

2603.26478

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Shi, Haoming, Chi, Eric C., Luo, Hengrui

Asymptotic Optimism for Tensor Regression Models with Applications to Neural Network Compression

We study rank selection for low-rank tensor regression under a random-covariate design. Under a Gaussian random-design model and some mild conditions, we derive population expressions for the expected training-testing discrepancy (optimism) for both CP and Tucker decompositions. We further demonstrate that the optimism is minimized at the true tensor rank for both CP and Tucker regression. This yields a prediction-oriented rank-selection rule that aligns with cross-validation and extends naturally to tensor-model averaging. We also discuss conditions under which under- or over-ranked models may appear preferable, thereby clarifying the scope of the method. Finally, we showcase its practical utility on a real-world image regression task and extend its application to tensor-based compression of neural networks, highlighting its potential for model selection in deep learning.

artificial intelligence, machine learning, optr, (20 more...)

2603.26048

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Kim, Juno, Nichani, Eshaan, Wu, Denny, Bietti, Alberto, Lee, Jason D.

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

Spectral optimizers such as Muon have recently shown strong empirical performance in large-scale language model training, but the source and extent of their advantage remain poorly understood. We study this question through the linear associative memory problem, a tractable model for factual recall in transformer-based models. In particular, we go beyond orthogonal embeddings and consider Gaussian inputs and outputs, which allows the number of stored associations to greatly exceed the embedding dimension. Our main result sharply characterizes the recovery rates of one step of Muon and SGD on the logistic regression loss under a power law frequency distribution. We show that the storage capacity of Muon significantly exceeds that of SGD, and moreover Muon saturates at a larger critical batch size. We further analyze the multi-step dynamics under a thresholded gradient approximation and show that Muon achieves a substantially faster initial recovery rate than SGD, while both methods eventually converge to the information-theoretic limit at comparable speeds. Experiments on synthetic tasks validate the predicted scaling laws. Our analysis provides a quantitative understanding of the signal amplification of Muon and lays the groundwork for establishing scaling laws across more practical language modeling tasks and optimizers.

logd, machine learning, natural language, (21 more...)

2603.26554

Country:

Europe > France (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)