AITopics

2606.29658

Genre: Research Report (0.64)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Qiang, Yuan Christopher, Sigrist, Fabio

A Censored Transformed Model for Proportional Outcomes with Boundary Mass and an Application to Loss Given Default Modeling

arXiv.org Machine Learning, Jun-23-2026

We introduce the zero-one censored transformed normal (ZOC-TN) model for proportional responses with potential probability mass at the boundaries 0 and 1. The model combines a censored Gaussian variable with a two-parameter affine-logit transformation on the interior (0,1). We characterize the transformation parameters, establish large-sample properties, and relate the affine-logit specification to broader classes of interior distributions. Theoretical and experimental results demonstrate that the proposed model can capture a wider range of qualitative density shapes than several benchmark models while remaining parsimonious, computationally efficient, and numerically stable. Furthermore, the ZOC-TN model can be extended (i) to account for nonlinearities and interactions in a tree-boosting machine learning framework and (ii) to explicitly model residual spatio-temporal variability. We apply the ZOC-TN model to loss given default (LGD) modeling for a large dataset of U.S. residential mortgages and compare it to multiple benchmark models. We find that a tree-boosted ZOC-TN model with a spatio-temporal frailty Gaussian process delivers the strongest out-of-sample performance, indicating that mortgage losses are shaped by nonlinear covariate effects and by unaccounted-for space-time variation.
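A minimal sketch of how censoring a latent Gaussian produces probability mass at 0 and 1, with an affine-logit map applied on the interior. The abstract does not give the exact ZOC-TN parameterization, so the functional form used here (logit(y) = a + b * logit(z) for a latent z in (0,1)), the function name `sample_zoc_tn_like`, and every parameter value are assumptions made for illustration only.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_zoc_tn_like(mu, sigma, a, b, size, rng):
    """Hypothetical sampler: censor a latent Gaussian at 0 and 1,
    then apply an assumed two-parameter affine-logit map inside (0,1)."""
    z = rng.normal(mu, sigma, size=size)
    y = np.empty(size)
    lower = z <= 0.0          # censored to the boundary 0
    upper = z >= 1.0          # censored to the boundary 1
    inside = ~(lower | upper)
    y[lower] = 0.0
    y[upper] = 1.0
    zi = z[inside]
    logit = np.log(zi) - np.log1p(-zi)
    # Assumed interior transform: logit(y) = a + b * logit(z)
    y[inside] = 1.0 / (1.0 + np.exp(-(a + b * logit)))
    return y

rng = np.random.default_rng(0)
mu, sigma = 0.3, 0.5
y = sample_zoc_tn_like(mu, sigma, a=0.2, b=1.5, size=200_000, rng=rng)

# Boundary masses implied by censoring the latent Gaussian
p0_theory = normal_cdf((0.0 - mu) / sigma)
p1_theory = 1.0 - normal_cdf((1.0 - mu) / sigma)
p0_empirical = float(np.mean(y == 0.0))
p1_empirical = float(np.mean(y == 1.0))
```

Under this reading, the boundary masses depend only on the latent mean and scale, while `a` and `b` reshape the interior density.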

artificial intelligence, machine learning, zoc-tn model

2606.21515

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.48)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Real Estate (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing Systems, Jun-22-2026, 23:21:57 GMT

Regional Explanations: Bridging Local and Global Variable Importance

We analyze two widely used local attribution methods, Local Shapley Values and LIME, which aim to quantify the contribution of a feature value xi to a specific prediction f(x1,...,xp). Despite their widespread use, we identify fundamental limitations in their ability to reliably detect locally important features, even under ideal conditions with exact computations and independent features. We argue that a sound local attribution method should not assign importance to features that neither influence the model output (e.g., features with zero coefficients in a linear model) nor exhibit statistical dependence with functionality-relevant features. We demonstrate that both Local SV and LIME violate this fundamental principle. To address this, we propose R-LOCO (Regional Leave Out COvariates), which bridges the gap between local and global explanations and provides more accurate attributions.
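To make the attribution computation concrete, here is a small sketch of exact local Shapley values for two independent binary features. The toy model `f`, the feature distribution, and the evaluation point are assumptions chosen for illustration; this is not the paper's counterexample and it does not implement R-LOCO. It shows only that, at a point where changing `x1` alone leaves the prediction unchanged, the exact Shapley value of `x1` can still be nonzero.

```python
from itertools import product

def f(x1, x2):
    """Toy model (an assumption): x1 matters only when x2 > 0."""
    return x1 * (1.0 if x2 > 0 else 0.0)

VALUES = (-1.0, 1.0)  # each feature is uniform on {-1, +1}, independently

def value(subset, x):
    """v(S) = E[f(x_S, X_notS)] under independent uniform features."""
    total = 0.0
    for draw in product(VALUES, repeat=2):
        z = [x[i] if i in subset else draw[i] for i in range(2)]
        total += f(*z)
    return total / len(VALUES) ** 2

def shapley_two_features(x):
    """Exact Shapley values for a two-feature model."""
    v0 = value(set(), x)
    v1 = value({0}, x)
    v2 = value({1}, x)
    v12 = value({0, 1}, x)
    phi1 = 0.5 * ((v1 - v0) + (v12 - v2))
    phi2 = 0.5 * ((v2 - v0) + (v12 - v1))
    return phi1, phi2

# At x2 = -1 the prediction is 0 whatever x1 is
x = (1.0, -1.0)
phi1, phi2 = shapley_two_features(x)
```

Here `phi1` and `phi2` sum to `f(x) - E[f(X)]`, as the efficiency property requires, yet `phi1` is nonzero even though `x1` has no effect on the output at this point.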

artificial intelligence, data mining, machine learning

Country: North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Neural Information Processing Systems, Jun-22-2026, 20:45:56 GMT

Understanding and Enhancing Mask-Based Pretraining towards Universal Representations

Mask-based pretraining has become a cornerstone of modern large-scale models across language, vision, and recently biology. Despite its empirical success, its role and limits in learning data representations have been unclear. In this work, we show that the behavior of mask-based pretraining can be directly characterized by test risk in high-dimensional minimum-norm ("ridge-less") linear regression, without relying on further model specifications.
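For readers unfamiliar with the reference problem, the following sketch computes the minimum-norm ("ridge-less") least-squares solution in an overparameterized setting and estimates its test risk. It does not reproduce the paper's mapping from masking to regression; the dimensions, noise level, and the helper name `min_norm_fit` are assumptions for illustration.

```python
import numpy as np

def min_norm_fit(X, y):
    """Minimum-l2-norm least-squares solution via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
n, d = 50, 200                      # more features than samples
beta_true = rng.normal(size=d) / np.sqrt(d)

X_train = rng.normal(size=(n, d))
y_train = X_train @ beta_true + 0.1 * rng.normal(size=n)
beta_hat = min_norm_fit(X_train, y_train)

# With d > n and full row rank, the fit interpolates the training data
train_mse = float(np.mean((X_train @ beta_hat - y_train) ** 2))

# Monte Carlo estimate of the test risk on fresh data
X_test = rng.normal(size=(2000, d))
y_test = X_test @ beta_true + 0.1 * rng.normal(size=2000)
test_risk = float(np.mean((X_test @ beta_hat - y_test) ** 2))
```

The training error is numerically zero while the test risk is not, which is the quantity whose behavior the abstract refers to.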

large language model, machine learning, natural language

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Neural Information Processing Systems, Jun-22-2026, 16:53:23 GMT

c440061f56f03d6aaad6f71bebf7491e-Paper-Conference.pdf

Estimating associations between spatial covariates and responses -- rather than merely predicting responses -- is central to environmental science, epidemiology, and economics. For instance, public health officials might be interested in whether air pollution has a strictly positive association with a health outcome, and in the magnitude of any effect. Standard machine learning methods often provide accurate predictions but offer limited insight into covariate-response relationships. Moreover, we show that existing methods for constructing confidence (or credible) intervals for associations can fail to provide nominal coverage in the face of model misspecification and nonrandom locations -- even though both are essentially always present in spatial problems. We introduce a method that constructs valid frequentist confidence intervals for associations in spatial settings. Our method requires minimal assumptions beyond a form of spatial smoothness and a homoskedastic Gaussian error assumption. In particular, we do not require model correctness or covariate overlap between training and target locations. Our approach is the first to guarantee nominal coverage in this setting and outperforms existing techniques in both real and simulated experiments. Our confidence intervals are valid in finite samples when the noise variance of the Gaussian error is known, and we provide an asymptotically consistent estimation procedure for this noise variance when it is unknown.

artificial intelligence, machine learning, modeling & simulation

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Consumer Health (0.67)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Modeling & Simulation (0.93)

Neural Information Processing Systems, Jun-18-2026, 21:08:15 GMT

True Impact of Cascade Length in Contextual Cascading Bandits

We revisit the contextual cascading bandit, where a learning agent recommends an ordered list (cascade) of items, and a user scans the list sequentially, stopping at the first attractive item. Although cascading bandits underpin various applications including recommender systems and search engines, the role of the cascade length $K$ in shaping regret has remained unclear. Contrary to prior results that regret grows with $K$, we prove that regret actually decreases once $K$ is large enough. Leveraging this insight, we design a new upper-confidence-bound algorithm built on online mirror descent that attains the sharpest known regret upper bound, $\tilde{O}(\min\{K\bar{p}^{K-1}, 1\}\, d\sqrt{T})$, for contextual cascading bandits. To complement this new regret upper bound, we provide a nearly matching lower bound of $\Omega(\min\{K\underline{p}^{K-1}, 1\}\, d\sqrt{T})$, where $0 \le \underline{p} \le \bar{p} < 1$. Together, these results fully characterize how regret truly scales with $K$, thereby closing the theoretical gap for contextual cascading bandits. Finally, comprehensive experiments validate our theoretical results and show the effectiveness of our proposed method.
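The user model described above (scan the list in order, stop at the first attractive item) can be sketched as follows. This illustrates only the standard cascade click model, not the paper's algorithm or regret analysis; the function names and the attraction probabilities are assumptions.

```python
import numpy as np

def cascade_click_probability(attraction_probs):
    """Probability that at least one item in the list attracts the user,
    assuming independent attraction events: 1 - prod(1 - p_i)."""
    p = np.asarray(attraction_probs, dtype=float)
    return 1.0 - float(np.prod(1.0 - p))

def simulate_cascade(attraction_probs, rng):
    """Scan the list in order and return the position of the first
    attractive item, or None if the user reaches the end without a click."""
    for position, p in enumerate(attraction_probs):
        if rng.random() < p:
            return position
    return None

rng = np.random.default_rng(0)
probs = [0.5, 0.5, 0.2]             # hypothetical attraction probabilities
exact = cascade_click_probability(probs)

# Monte Carlo check of the closed form
clicks = [simulate_cascade(probs, rng) is not None for _ in range(20000)]
empirical = sum(clicks) / len(clicks)
```

Because the no-click probability is a product over the list, each added item shrinks it multiplicatively.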

data mining, machine learning, natural language

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing Systems, Jun-14-2026, 07:12:31 GMT

It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of Robinson (1988), we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of Mackey et al. (2018). Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These ACE procedures use structure-agnostic cumulant estimators to achieve r-th order insensitivity to nuisance errors whenever the (r+1)-st treatment cumulant is non-zero. We complement these core results with novel minimax guarantees for binary treatments in the partially linear model. Finally, using synthetic demand estimation experiments, we demonstrate the practical benefits of our higher-order robust estimators.
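As background for the DML estimator discussed above, here is a minimal sketch of cross-fitted residual-on-residual estimation in the partially linear model Y = theta * D + g(X) + eps, D = m(X) + eta. It is the textbook DML recipe, not the paper's ACE procedures; the linear nuisance fits, the simulated data, and the function names are assumptions for illustration.

```python
import numpy as np

def ols_predict(X_fit, y_fit, X_new):
    """Fit a linear model with intercept on (X_fit, y_fit), predict on X_new."""
    A_fit = np.column_stack([np.ones(len(X_fit)), X_fit])
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    coef, *_ = np.linalg.lstsq(A_fit, y_fit, rcond=None)
    return A_new @ coef

def dml_partially_linear(Y, D, X, n_folds=2, seed=0):
    """Cross-fitted DML: residualize Y and D on X out of fold,
    then regress the Y residuals on the D residuals."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    res_Y = np.empty(n)
    res_D = np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        res_Y[test] = Y[test] - ols_predict(X[train], Y[train], X[test])
        res_D[test] = D[test] - ols_predict(X[train], D[train], X[test])
    return float(res_D @ res_Y / (res_D @ res_D))

# Simulated data with Gaussian treatment noise (illustrative values)
rng = np.random.default_rng(1)
n, theta = 4000, 2.0
X = rng.normal(size=(n, 3))
D = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)
Y = theta * D + X @ np.array([0.7, 0.2, -1.0]) + rng.normal(size=n)

theta_hat = dml_partially_linear(Y, D, X)
```

In practice the two nuisance regressions would be arbitrary black-box learners; ordinary least squares is used here only to keep the sketch self-contained.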

artificial intelligence, machine learning, proceedings

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Boussena, Mohamed, Monville, Florence, Fieschi-Meric, Jacques, Vely, Frederic, Milpied, Pierre, Mazieres, Julien, Perol, Maurice, Vivier, Eric, Greillier, Laurent, Barlesi, Fabrice, Benzekry, Sebastien

Multimodality Stacking with Blockwise missing values and application to the PIONeeR biomarkers study for prediction of resistance to immunotherapy

arXiv.org Machine Learning, May-26-2026

Integrating multimodal datasets in clinical oncology is frequently hindered by high dimensionality and blockwise missingness, where entire data sources are unavailable for specific patient subsets. Standard survival models often struggle with these gaps, leading to biased results or patient exclusion. We introduce Multimodality Stacking with Blockwise missing values (MSB), a late-fusion framework for survival analysis that independently models modality-specific features before aggregating predictions via a cross-validated stacking meta-learner. MSB was validated on the PIONeeR study (n=443 patients, 378 biomarkers across eight heterogeneous sources) to predict progression-free survival in advanced non-small cell lung cancer patients receiving immunotherapy. MSB yielded higher predictive performance (C-index) than baseline algorithms. Improvements varied by baseline strength: linear models showed a 15.9% increase (p<0.001, Wilcoxon signed-rank test), random survival forests gained 5.4% (p=0.002), and gradient boosting methods improved by 2.1% (p=0.030). Beyond discrimination, MSB reduced the generalization gap (train-test difference under 5-fold cross-validation repeated 3 times: 0.055 vs 0.380 for linear models). Permutation importance analysis identified routine laboratory markers, clinical features, and PD-L1 expression as primary predictive drivers. Missing block indicators showed negligible importance, suggesting the model learned from biomarker values rather than data availability patterns. MSB provides a statistically validated framework for multimodal survival prediction with blockwise missingness. By enabling systematic biomarker evaluation without requiring complete data, MSB offers a practical tool for predictive modeling in biomedical research, pending external validation. Implementation is available at https://github.com/MohamedBoussena/MSB under Inria license.
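Since the results above are reported as C-index values, a short sketch of Harrell's concordance index for right-censored data may help. It is a generic illustration, not code from the MSB repository; the function name and the toy data are assumptions, and pairs with tied event times are simply skipped.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: among comparable pairs (the earlier time is an observed
    event), the fraction where the earlier subject has the higher risk score.
    Tied risk scores count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: the third subject is censored (event = 0)
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [4.0, 3.0, 2.0, 1.0])
reversed_order = concordance_index(times, events, [1.0, 2.0, 3.0, 4.0])
constant = concordance_index(times, events, [1.0, 1.0, 1.0, 1.0])
```

A value of 0.5 corresponds to an uninformative ranking, which is the baseline against which gains such as those quoted above are read.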

artificial intelligence, machine learning, train 0

2605.2505

Country: Europe > France (0.69)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

arXiv.org Machine Learning, May-19-2026

Multi-task Linear Regression without Eigenvalue Lower Bounds: Adaptivity, Robustness and Safety

Kim, Seok-Jin

We study the multi-task linear regression problem in the presence of contaminated tasks. We address the setting where the unknown parameters of a majority of tasks are close in the $\ell_2$-norm, while a fraction of tasks are arbitrary outliers. Existing theoretical frameworks for this problem rely heavily on the assumption that the empirical second moment of each task has a minimum eigenvalue bounded away from zero (order $\Omega(1)$). Crucially, this assumption fails in many high-dimensional scenarios, rendering prior guarantees vacuous. To overcome this limitation, we propose an estimator based on matrix-weighted norm regularization. We also introduce a relative balancedness condition, quantified by a balancedness constant, that compares each task's second moment with the average inlier geometry and relaxes the need for taskwise second-moment lower bounds. In favorable regimes with moderate balancedness, our prediction MSE bounds match the rate of Duan and Wang (2023) under substantially weaker spectral assumptions; the resulting task-overall MSE is minimax optimal up to logarithmic factors. Furthermore, we demonstrate that our estimator enjoys a safety guarantee: when the relevant balancedness constant is large or infinite, or when tasks are unrelated, the method performs no worse than independent task learning. Consequently, our methodology achieves simultaneous adaptivity to task similarity, robustness to outliers, and safety outside favorable transfer regimes.

artificial intelligence, machine learning, probability

2605.17126

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.84)

arXiv.org Machine Learning, Apr-28-2026

Nearly Optimal Subdata Selection

Yang, Min, Zheng, Wei, Stufken, John, Chang, Ming-Chung, Tian, Ting, Wang, Xueqin

When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for further consideration. A central question for selecting subdata of size $n$ from $N$ available data points is which $n$ points to select. While an answer to this question depends on the objective, one approach for a parametric model and a focus on parameter estimation is to select subdata that retains maximal information. Identifying such subdata is a classical NP-hard problem due to its inherent discreteness. Based on optimal approximate design theory, we develop a new methodology for information-based subdata selection, resulting in subdata that approaches the optimal solution. To achieve this, we develop a novel algorithm that applies to a general model, accommodates arbitrary choices of $N$ and $n$, and supports multiple optimality criteria, and we prove its convergence. Moreover, the new methodology facilitates an assessment of the efficiency of subdata selected by any method by obtaining tight lower and upper bounds for the efficiency. We show that the subdata obtained through the new methodology is highly efficient and outperforms all existing methods.
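To illustrate why exact information-based subdata selection is a discrete search, the sketch below enumerates every size-$n$ subset of a tiny dataset and keeps the one maximizing the D-criterion $\det(X_s^\top X_s)$ for a simple linear model with intercept. This brute force is not the paper's algorithm; the data, the choice of D-optimality, and the function names are assumptions for illustration.

```python
from itertools import combinations
import numpy as np

def d_criterion(X_sub):
    """Determinant of the information matrix X_s^T X_s for a linear model."""
    return float(np.linalg.det(X_sub.T @ X_sub))

def best_subdata_bruteforce(X, n):
    """Exhaustive search over all size-n subsets. The number of subsets is
    C(N, n), so this is feasible only for very small N."""
    best_idx, best_val = None, -np.inf
    for idx in combinations(range(len(X)), n):
        val = d_criterion(X[list(idx)])
        if val > best_val:
            best_idx, best_val = idx, val
    return best_idx, best_val

# Hypothetical one-covariate dataset with N = 7 points
x = np.array([-1.0, -0.6, -0.2, 0.1, 0.4, 0.7, 1.0])
X = np.column_stack([np.ones_like(x), x])   # columns: intercept, covariate

best_idx, best_val = best_subdata_bruteforce(X, n=3)
```

For this model the search picks both extreme covariate values plus the next most extreme point, matching the intuition that informative subdata spreads out in covariate space.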

artificial intelligence, machine learning, subdata

2604.2393

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)