AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Time-o1: Time-Series Forecasting Needs Transformed Label Alignment

Neural Information Processing SystemsJun-14-2026, 16:11:50 GMT

Training time-series forecasting models poses unique challenges in loss function design. Most existing approaches adopt temporal mean squared error, but this study reveals two critical limitations: it ignores the presence of label autocorrelation, which biases it from the true label sequence likelihood; it involves excessive number of tasks, which complicates optimization, especially for long-term forecasting. To address these issues, we introduce Time-o1, a transform-enhanced loss function for time-series forecasting. The central idea is to transform the label sequence into decorrelated components with discriminated significance. Models are then trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing task amount. Experiments demonstrate that Time-o1 achieves state-of-the-art performance and is compatible with various forecast models.

data mining, forecasting, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling

Neural Information Processing SystemsJun-14-2026, 16:09:25 GMT

Current state-of-the-art generative models map noise to data distributions by matching flows or scores. A key limitation of these models is their inability to readily integrate available partial observations and additional priors. In contrast, energybased models (EBMs) address this by incorporating corresponding scalar energy terms. Here, we propose Energy Matching, a framework that endows flow-based approaches with the flexibility of EBMs. Far from the data manifold, samples move from noise to data along irrotational, optimal transport paths.

data manifold, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > Canada > Ontario (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Flow based approach for Dynamic Temporal Causal models with non-Gaussian or Heteroscedastic Noises

Neural Information Processing SystemsJun-14-2026, 15:40:28 GMT

Understanding causal relationships in multivariate time series is crucial in many scenarios, such as those dealing with financial or neurological data.

machine learning, natural language, regime, (18 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Banking & Finance (0.92)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
(4 more...)

Add feedback

Fast Rate Bounds for Multi-Task and Meta-Learning with Different Sample Sizes

Neural Information Processing SystemsJun-14-2026, 15:23:44 GMT

We present new fast-rate PAC-Bayesian generalization bounds for multi-task and meta-learning in the unbalanced setting, i.e. when the tasks have training sets of different sizes, as is typically the case in real-world scenarios. Previously, only standard-rate bounds were known for this situation, while fast-rate bounds were limited to the setting where all training sets are of equal size. Our new bounds are numerically computable as well as interpretable, and we demonstrate their flexibility in handling a number of cases where they give stronger guarantees than previous bounds. Besides the bounds themselves, we also make conceptual contributions: we demonstrate that the unbalanced multi-task setting has different statistical properties than the balanced situation, specifically that proofs from the balanced situation do not carry over to the unbalanced setting. Additionally, we shed light on the fact that the unbalanced situation allows two meaningful definitions of multi-task risk, depending on whether all tasks should be considered equally important or if sample-rich tasks should receive more weight than samplepoor ones.

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Incentivizing Time-Aware Fairness in Data Sharing

Neural Information Processing SystemsJun-14-2026, 15:06:20 GMT

In collaborative data sharing and machine learning, multiple parties aggregate their data resources to train a machine learning model with better model performance. However, as the parties incur data collection costs, they are only willing to do so when guaranteed incentives, such as fairness and individual rationality. Existing frameworks assume that all parties join the collaboration simultaneously, which does not hold in many real-world scenarios. Due to the long processing time for data cleaning, difficulty in overcoming legal barriers, or unawareness, the parties may join the collaboration at different times. In this work, we propose the following perspective: As a party who joins earlier incurs higher risk and encourages the contribution from other wait-and-see parties, that party should receive a reward of higher value for sharing data earlier. To this end, we propose a fair and time-aware data sharing framework, including novel time-aware incentives. We develop new methods for deciding reward values to satisfy these incentives. We further illustrate how to generate model rewards that realize the reward values and empirically demonstrate the properties of our methods on synthetic and real-world datasets.

artificial intelligence, data quality, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America (0.46)
Asia > Singapore (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

0b77d3a82b59e9d9899370b378087faf-Paper-Conference.pdf

Neural Information Processing SystemsJun-14-2026, 14:46:36 GMT

Curriculum learning has emerged as an effective strategy to enhance the training efficiency and generalization of machine learning models. However, its theoretical underpinnings remain relatively underexplored. In this work, we develop a theoretical framework for curriculum learning based on biased regularized empirical risk minimization (RERM), identifying conditions under which curriculum learning provably improves generalization. We introduce a sufficient condition that characterizes a "good" curriculum and analyze a multi-task curriculum framework, where solving a sequence of convex tasks can facilitate better generalization. We also demonstrate how these theoretical insights translate to practical benefits when using stochastic gradient descent (SGD) as an optimization method. Beyond convex settings, we explore the utility of curriculum learning for non-convex tasks. Empirical evaluations on synthetic datasets and MNIST validate our theoretical findings and highlight the practical efficacy of curriculum-based training.

artificial intelligence, bwt, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Split Conformal Classification with Unsupervised Calibration

Neural Information Processing SystemsJun-14-2026, 14:38:11 GMT

Methods for split conformal prediction leverage calibration samples to transform any prediction rule into a set-prediction rule that complies with a target coverage probability. Existing methods provide remarkably strong performance guarantees with minimal computational costs. However, they require the use calibration samples composed by labeled examples different to those used for training. This requirement can be highly inconvenient, as it prevents the use of all labeled examples for training and may require acquiring additional labels solely for calibration. This paper presents an effective methodology for split conformal prediction with unsupervised calibration for classification tasks.

artificial intelligence, calibration sample, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation

Neural Information Processing SystemsJun-14-2026, 14:13:19 GMT

Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or compounding errors from learned dynamics models. To address these challenges, we propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE in high-dimensional state and action spaces. Starting with a diffusion model pre-trained on the behavior data, STITCHOPE generates synthetic trajectories from the target policy by guiding the denoising process using the score function of the target policy. STITCH-OPE proposes two technical innovations that make it advantageous for OPE: (1) prevents overregularization by subtracting the score of the behavior policy during guidance, and (2) generates long-horizon trajectories by stitching partial trajectories together end-to-end. We provide a theoretical guarantee that under mild assumptions, these modifications result in an exponential reduction in variance versus long-horizon trajectory diffusion.

artificial intelligence, machine learning, target policy, (16 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
(2 more...)

Add feedback

Tighter CMI-Based Generalization Bounds via Stochastic Projection and Quantization

Neural Information Processing SystemsJun-14-2026, 14:11:00 GMT

In this paper, we leverage stochastic projection and lossy compression to establish new conditional mutual information (CMI) bounds on the generalization error of statistical learning algorithms. It is shown that these bounds are generally tighter than the existing ones. In particular, we prove that for certain problem instances for which existing MI and CMI bounds were recently shown in Attias et al. [2024] and Livni [2023] to become vacuous or fail to describe the right generalization behavior, our bounds yield suitable generalization guarantees of the order of O(1/ n), where nis the size of the training dataset. Furthermore, we use our bounds to investigate the problem of data "memorization" raised in those works, and which asserts that there are learning problem instances for which any learning algorithm that has good prediction there exist distributions under which the algorithm must "memorize" a big fraction of the training dataset. We show that for every learning algorithm, there exists an auxiliary algorithm that does not memorize and which yields comparable generalization error for any data distribution. In part, this shows that memorization is not necessary for good generalization.

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.46)
Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Doubly Robust Alignment for Large Language Models

Neural Information Processing SystemsJun-14-2026, 14:01:27 GMT

While RLHF has demonstrated promising results, many algorithms are highly sensitive to misspecifications in the underlying preference model (e.g., the Bradley-Terry model), the reference policy, or the reward function, resulting in undesirable fine-tuning. To address model misspecification, we propose a doubly robust preference optimization algorithm that remains consistent when either the preference model or the reference policy is correctly specified (without requiring both). Our proposal demonstrates superior and more robust performance than state-of-the-art algorithms, both in theory and in practice.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: