AITopics

2605.25616

Country: Asia > South Korea (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Law > Litigation (0.96)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningMay-8-2026

Covariate Balancing and Riesz Regression Should Be Guided by the Neyman Orthogonal Score in Debiased Machine Learning

Kato, Masahiro

This position paper argues that, in debiased machine learning, balancing functions should be derived from the Neyman orthogonal score, not chosen only as functions of covariates. Covariate balancing is effective when the regression error entering the score can be represented by functions of covariates alone, and it is the natural finite-dimensional approximation for targets such as ATT counterfactual means. For ATE estimation under treatment effect heterogeneity, however, the score error generally contains treatment-specific components because the outcome regression is a function of the full regressor $X=(D,Z)$. In that case, balancing common functions of $Z$ can leave the treatment-specific component unbalanced. We therefore advocate regressor balancing, implemented by Riesz regression with basis functions of $X$, as the general balancing principle for DML. The position is not that covariate balancing is invalid, but that covariate balancing should be understood as the special case that is appropriate when the score-relevant regression error is a function of covariates alone.

artificial intelligence, covariate, machine learning, (17 more...)

2605.06386

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine LearningMay-1-2026

Optimized Deferral for Imbalanced Settings

Cortes, Corinna, Mao, Anqi, Mohri, Mehryar, Zhong, Yutao

Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational cost. This approach, known as learning to defer, is essential in domains like natural language generation, medical diagnosis, and computer vision, where an effective deferral can reduce errors at low extra resource consumption. However, the two-stage learning to defer setting, which leverages existing predictors such as a collection of LLMs or other classifiers, often faces challenges due to an expert imbalance problem. This imbalance can lead to suboptimal performance, with deferral algorithms favoring the majority expert. We present a comprehensive study of two-stage learning to defer in expert imbalance settings. We cast the deferral loss optimization as a novel cost-sensitive learning problem over the input-expert domain. We derive new margin-based loss functions and guarantees tailored to this setting, and develop novel algorithms for cost-sensitive learning. Leveraging these results, we design principled deferral algorithms, MILD (Margin-based Imbalanced Learning to Defer), specifically suited for expert imbalance settings. Extensive experiments demonstrate the effectiveness of our approach, showing clear improvements over existing baselines on both image classification and real-world Large Language Model (LLM) routing tasks.

large language model, machine learning, natural language, (16 more...)

2604.27723

Country: North America (0.46)

Genre: Research Report (0.40)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsApr-26-2026, 13:17:00 GMT

ADerivation of D1 Denote the logit vector as x, we have pj = exj

Without zero-mean constraint, the training becomes unstable. Following the training setting of [23], the classifier network is trained with SGD with a weight decay 5e-4, an initial learning rate of 1e-1 and a mini-batch size of 100 for all methods. We use the cosine learning rate decay schedule [49] for a total of 80 epochs. We set the outer level learning ηω as 14 Figure 7: Training curve without zero-mean constraint on CIFAR10 under 40% uniform noise. The MLP weighting network is trained with Adam [51] with a fixed learning rate 1e-3 and a weight decay 1e-4.

artificial intelligence, experiment, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsApr-25-2026, 19:42:16 GMT

Realistic Evaluation of Transductive Few-Shot Learning - Supplementary Material

In the main tables of the paper, we did not include the performances of α-TIM in the standard balanced setting. Here, we emphasize that α-TIM is a generalization of TIM [1] as when α 1 (i.e., the α-entropies tend to the Shannon entropies), α-TIM tends to TIM. Therefore, in the standard setting, where optimal hyper-parameter αis obtained over validation tasks that are balanced (as in the standard validation tasks of the original TIM and the other existing methods), the performance of α-TIM is the same as TIM. When αis tuned on balanced validation tasks, we obtain an optimal value of αvery close to 1, and our α-mutual information approaches the standard mutual information. When the validation tasks are uniformly random, as in our new setting and in the validation plots we provided in the main figure, one can see that the performance of α-TIM remains competitive when we tend to balanced testing tasks (i.e., when a is increasing), but is significantly better than TIM when we tend to uniformly-random testing tasks (a = 1).

artificial intelligence, imbalance, machine learning, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine LearningMar-26-2026

The Mass Agreement Score: A Point-centric Measure of Cluster Size Consistency

Wiredu-Aidoo, Randolph

In clustering, strong dominance in the size of a particular cluster is often undesirable, motivating a measure of cluster size uniformity that can be used to filter such partitions. A basic requirement of such a measure is stability: partitions that differ only slightly in their point assignments should receive similar uniformity scores. A difficulty arises because cluster labels are not fixed objects; algorithms may produce different numbers of labels even when the underlying point distribution changes very little. Measures defined directly over labels can therefore become unstable under label-count perturbations. I introduce the Mass Agreement Score (MAS), a point-centric metric bounded in [0, 1] that evaluates the consistency of expected cluster size as measured from the perspective of points in each cluster. Its construction yields fragment robustness by design, assigning similar scores to partitions with similar bulk structure while remaining sensitive to genuine redistribution of cluster mass.

artificial intelligence, machine learning, partition, (17 more...)

2603.23581

Country:

North America > United States > New York (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Neural Information Processing SystemsMar-22-2026, 08:26:09 GMT

Visual Data Diagnosis and Debiasing with Concept Graphs

The widespread success of deep learning models today is owed to the curation of extensive datasets significant in size and complexity. However, such models frequently pick up inherent biases in the data during the training process, leading to unreliable predictions. Diagnosing and debiasing datasets is thus a necessity to ensure reliable model performance.

artificial intelligence, machine learning, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Neural Information Processing SystemsMar-18-2026, 01:12:04 GMT

Rethinking Imbalance in Image Super-Resolution for Efficient Inference

Existing super-resolution (SR) methods optimize all model weights equally using $\mathcal{L}_1$ or $\mathcal{L}_2$ losses by uniformly sampling image patches without considering dataset imbalances or parameter redundancy, which limits their performance. To address this, we formulate the image SR task as an imbalanced distribution transfer learning problem from a statistical probability perspective, proposing a plug-and-play Weight-Balancing framework (WBSR) to achieve balanced model learning without changing the original model structure and training data. Specifically, we develop a Hierarchical Equalization Sampling (HES) strategy to address data distribution imbalances, enabling better feature representation from texture-rich samples. To tackle model optimization imbalances, we propose a Balanced Diversity Loss (BDLoss) function, focusing on learning texture regions while disregarding redundant computations in smooth regions. After joint training of HES and BDLoss to rectify these imbalances, we present a gradient projection dynamic inference strategy to facilitate accurate and efficient inference. Extensive experiments across various models, datasets, and scale factors demonstrate that our method achieves comparable or superior performance to existing approaches with about 34\% reduction in computational cost.

artificial intelligence, machine learning, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsMar-16-2026, 16:56:57 GMT

6f1346bac8b02f76a631400e2799b24b-Paper-Conference.pdf

computer vision, proceedings, segmentation, (13 more...)

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
Asia > China > Guangxi Province > Nanning (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Neural Information Processing SystemsFeb-18-2026, 18:02:08 GMT

SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification

Our findings show interesting trends, particularly pertaining to recent methods for data curation such as synthetic data generation and lookup based on CLIP embeddings. We show that although these strategies are highly competitive for certain tasks, the curation strategy used to assemble the original ImageNet-1K dataset remains the gold standard. We anticipate that our benchmark can illuminate the path for new methods to further reduce the gap.

data quality, machine learning, natural language, (21 more...)

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)