AITopics | torch

In modern parametric model training, full-batch gradient descent (and its variants) suffers from a progressively stronger bias towards the exact realization of the training data; this drives the systematic "generalization gap", where the train error becomes an unreliable proxy for the test error. Existing approaches either argue through complex analysis that this gap is benign or sacrifice data to a validation set. In contrast, we introduce decoupled descent (DD), a novel theory-based training algorithm that satisfies a train-test identity, enforcing the train error to asymptotically track the test error for stylized Gaussian mixture models. Within this specific regime, leveraging approximate message passing theory, DD iteratively cancels the biases due to data reuse, rigorously demonstrating the feasibility of zero-cost validation and $100\%$ data utilization. Moreover, DD is governed by a low-dimensional state evolution recursion, rendering the dynamics of the algorithm transparent and tractable. We validate DD on XOR classification, where it yields superior performance compared to GD; additionally, we run experiments on noisy MNIST and non-linear probing of CIFAR-10, demonstrating that even when our stylized assumptions are relaxed, DD narrows the generalization gap compared to GD.

artificial intelligence, assumption, machine learning, (18 more...)

arXiv.org Machine Learning

2604.27883

Country: North America > Canada (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Appendix for "Episodic Multi-Task Learning with Heterogeneous Neural Processes"

Neural Information Processing Systems | Apr-30-2026, 05:37:19 GMT

In this section, we list frequently asked questions from researchers who helped proofread this manuscript. These questions might also be relevant for others and help in better understanding the paper, so we include more detailed discussions here. This work considers the multi-input multi-output setting of multi-task learning under the episodic training mechanism. As shown in Table 1, we use "Heterogeneous tasks" to distinguish the different branches of multi-task learning: (1) single-input multi-output (SIMO) considers different tasks that have the same input and different supervision information. All tasks are related since they share the target space. This setting encourages deep models to deal with the insufficient data of each task by aggregating the training data from related tasks in the spirit of data augmentation. Meanwhile, "Episodic training" is used to describe the data-feeding strategy. Multi-task meta-learning also benefits from episodic training, but it follows the SIMO setting in every single episode and cannot sufficiently handle heterogeneous tasks.

artificial intelligence, learning, machine learning, (13 more...)

Neural Information Processing Systems

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

dbc8ce0fdfcd55172d73fb05dbae07fc-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-29-2026, 23:57:35 GMT

distillation, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space: Supplementary Material

Neural Information Processing Systems | Apr-25-2026, 23:05:59 GMT

Organization: This supplementary material is presented in a format parallel to the main paper. The section numbers and titles are consistent with the main paper, but we add one new section, Section 10, where we describe the societal impacts and possible negative impacts of the paper. Similarly, the theorem numbers are consistent with the main paper, but we also include several additional theorems and lemmas that were not in the main paper. GAN-type Objective for KL Estimation: Let $f$ be a discriminator, $f: X \to \mathbb{R}$. Let $p(x)$ and $q(x)$ be two probability density functions defined over the space $X$.

artificial intelligence, dim, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

2376f25ef1725a9e3516ee3c86a59f46-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 22:00:45 GMT

artificial intelligence, machine learning, subspace, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

25b040c97a75021e57100648a20b1e10-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 03:28:45 GMT

agent, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Impact

Neural Information Processing Systems | Apr-25-2026, 02:07:21 GMT

More precisely, we use batches of size 2. Each batch contains one patch with the foreground oversampled. Furthermore, we split each silo's data into training and validation sets with an 80%/20% split. All of this pre-processing and patching is done using the nnU-Net library [IJK+21]. Loss function: We use the same loss function as proposed by nnU-Net [IJK+21] for the KiTS19 dataset, which is based on the Dice coefficient [Dic45] and the cross-entropy loss.
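To make the idea of a loss "based on Dice and cross entropy" concrete, the following is a minimal NumPy sketch that sums a soft Dice loss and a cross-entropy loss over flattened voxels. It is an illustration under stated assumptions, not the nnU-Net implementation: the function names, the equal weighting of the two terms, the per-class averaging of the Dice score, and the smoothing constants are all hypothetical choices made for this example.

```python
import numpy as np

def soft_dice_loss(probs, target_onehot, eps=1e-6):
    """Soft Dice loss; inputs have shape (num_classes, n_voxels)."""
    intersection = (probs * target_onehot).sum(axis=1)
    denom = probs.sum(axis=1) + target_onehot.sum(axis=1)
    dice_per_class = (2.0 * intersection + eps) / (denom + eps)
    # Assumption: average the Dice score over classes, then turn it into a loss.
    return float(1.0 - dice_per_class.mean())

def cross_entropy_loss(probs, target_onehot, eps=1e-12):
    """Mean per-voxel cross entropy; inputs have shape (num_classes, n_voxels)."""
    per_voxel = -(target_onehot * np.log(probs + eps)).sum(axis=0)
    return float(per_voxel.mean())

def dice_ce_loss(logits, labels, num_classes):
    """Sum of soft Dice and cross entropy (equal weights are an assumption).

    logits: float array of shape (num_classes, n_voxels)
    labels: integer array of shape (n_voxels,)
    """
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=0, keepdims=True)
    # One-hot encode the labels to shape (num_classes, n_voxels).
    onehot = np.eye(num_classes)[labels].T
    return soft_dice_loss(probs, onehot) + cross_entropy_loss(probs, onehot)

if __name__ == "__main__":
    labels = np.array([0, 1, 1, 0])
    uninformative = np.zeros((2, 4))            # uniform class probabilities
    confident = 50.0 * np.eye(2)[labels].T      # near-perfect prediction
    print(dice_ce_loss(uninformative, labels, num_classes=2))
    print(dice_ce_loss(confident, labels, num_classes=2))
```

Combining the two terms is a common design choice in segmentation: the cross-entropy term gives per-voxel gradients, while the Dice term is less sensitive to the foreground/background imbalance that motivates oversampling the foreground.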

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe > Austria (0.28)
Europe > Netherlands (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
(3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
(2 more...)

008bd5ad93b754d500338c253d9c1770-Supplemental.pdf

Neural Information Processing Systems | Apr-24-2026, 09:34:12 GMT

artificial intelligence, discriminative filter, machine learning, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

mlr3torch: A Deep Learning Framework in R based on mlr3 and torch

Fischer, Sebastian, Burk, Lukas, Zhang, Carson, Bischl, Bernd, Binder, Martin

arXiv.org Machine Learning | Apr-21-2026

Deep learning (DL) has become a cornerstone of modern machine learning (ML) praxis. We introduce the R package mlr3torch, which is an extensible DL framework for the mlr3 ecosystem. It is built upon the torch package, and simplifies the definition, training, and evaluation of neural networks for both tabular data and generic tensors (e.g., images) for classification and regression. The package implements predefined architectures, and torch models can easily be converted to mlr3 learners. It also allows users to define neural networks as graphs. This representation is based on the graph language defined in mlr3pipelines and allows users to define the entire modeling workflow, including preprocessing, data augmentation, and network architecture, in a single graph. Through its integration into the mlr3 ecosystem, the package allows for convenient resampling, benchmarking, preprocessing, and more. We explain the package's design and features and show how to customize and extend it to new problems. Furthermore, we demonstrate the package's capabilities using three use cases, namely hyperparameter tuning, fine-tuning, and defining architectures for multimodal data. Finally, we present some runtime benchmarks.

architecture, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2604.18152

Country: