AITopics

2605.06609

Genre:

Research Report > New Finding (0.64)
Research Report > Experimental Study (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

arXiv.org Machine LearningApr-16-2026

Multistage Conditional Compositional Optimization

Şen, Buse, Hu, Yifan, Kuhn, Daniel

We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, as well as in problems involving dynamic risk measures. The naïve nested sampling approach for MCCO suffers from the curse of dimensionality familiar from scenario tree-based multistage stochastic programming, that is, its scenario complexity grows exponentially with the number of nests. We develop new multilevel Monte Carlo techniques for MCCO whose scenario complexity grows only polynomially with the desired accuracy.

artificial intelligence, estimator, machine learning, (18 more...)

2604.14075

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Switzerland (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Le, Naht Sinh, Denis, Christophe, Hebiri, Mohamed

Demographic Parity Tails for Regression

arXiv.org Machine LearningApr-3-2026

Demographic parity (DP) is a widely studied fairness criterion in regression, enforcing independence between the predictions and sensitive attributes. However, constraining the entire distribution can degrade predictive accuracy and may be unnecessary for many applications, where fairness concerns are localized to specific regions of the distribution. To overcome this issue, we propose a new framework for regression under DP that focuses on the tails of target distribution across sensitive groups. Our methodology builds on optimal transport theory. By enforcing fairness constraints only over targeted regions of the distribution, our approach enables more nuanced and context-sensitive interventions. Leveraging recent advances, we develop an interpretable and flexible algorithm that leverages the geometric structure of optimal transport. We provide theoretical guarantees, including risk bounds and fairness properties, and validate the method through experiments in regression settings.

artificial intelligence, fairness, machine learning, (19 more...)

2604.02017

Country:

North America > United States > California (0.04)
Europe > France (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Schölpple, Max, Fanghui, Liu, Steinwart, Ingo

Self-Regularized Learning Methods

arXiv.org Machine LearningMar-19-2026

We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations that many algorithms, such as gradient-descent based learning, exhibit implicit regularization. In a nutshell, for a self-regularized algorithm the complexity of the predictor is inherently controlled by that of the simplest comparator achieving the same empirical risk. This framework is sufficiently rich to cover both classical regularized empirical risk minimization and gradient descent. Building on self-regularization, we provide a thorough statistical analysis of such algorithms including minmax-optimal rates, where it suffices to show that the algorithm is self-regularized -- all further requirements stem from the learning problem itself. Finally, we discuss the problem of data-dependent hyperparameter selection, providing a general result which yields minmax-optimal rates up to a double logarithmic factor and covers data-driven early stopping for RKHS-based gradient descent.

artificial intelligence, assumption, machine learning, (17 more...)

2603.1716

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)

Neural Information Processing SystemsFeb-19-2026, 19:32:37 GMT

Bayesian Joint Estimation of Multiple Graphical Models

Lingrui Gan, Xinming Yang, Naveen Narisetty, Feng Liang

In many applications, observations are naturally grouped into different classes.

artificial intelligence, machine learning, precision matrix, (16 more...)

Country:

North America > United States > Illinois (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Neural Information Processing SystemsFeb-19-2026, 12:00:15 GMT

3d779cae2d46cf6a8a99a35ba4167977-AuthorFeedback.pdf

Our approach is purely based on 2D convolutions. Nevertheless, it3 outperforms or performs comparably to many more costly 3D models. We thank the reviewers for pointing out some related (or missing) references. The12 Timeception layers involve group convolutions at different time scales while our TAM layers only use depthwise13 convolution. As a result, the Timeception has significantly more parameters than the TAM (10% vs. 0.1% of the14 totalmodelparameters).

architecture, artificial intelligence, convolution, (5 more...)

Technology: Information Technology > Artificial Intelligence (0.38)

Neural Information Processing SystemsFeb-19-2026, 09:45:27 GMT

6 SupplementaryMaterial

The original CLUTRR data generation framework made sure that each testproof is not in the training set in order to test whether a model is able to generalize to unseen proofs. Initial results on the original CLUTRR test sets resulted in strong model performance ( 99%) on levels seen during training (2, 4, 6) but no generalization at all ( 0%) to other levels. The models are given as input " [story] [query] " and asked to generate the proof and answer. Models are trained on levels2,4,6only. In our case, the entity names are important to evaluate systematic generalization.

artificial intelligence, lvl, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.74)

Neural Information Processing SystemsFeb-19-2026, 03:55:32 GMT

6f5216f8d89b086c18298e043bfe48ed-Paper.pdf

Withoutrequiring repeatable trials, itcanflexibly capture covariate-dependent jointSCDs, andprovide interpretable latent causes underlying the statistical dependencies between neurons.

artificial intelligence, machine learning, variability, (17 more...)