Goto

Collaborating Authors

 torch


Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing

arXiv.org Machine Learning

In modern parametric model training, full-batch gradient descent (and its variants) suffers due to progressively stronger biasing towards the exact realization of training data; this drives the systematic ``generalization gap'', where the train error becomes an unreliable proxy for test error. Existing approaches either argue this gap is benign through complex analysis or sacrifice data to a validation set. In contrast, we introduce decoupled descent (DD), a novel theory-based training algorithm that satisfies a train-test identity -- enforcing the train error to asymptotically track the test error for stylized Gaussian mixture models. Within this specific regime, leveraging approximate message passing theory, DD iteratively cancels the biases due to data reuse, rigorously demonstrating the feasibility of zero-cost validation and $100\%$ data utilization. Moreover, DD is governed by a low-dimensional state evolution recursion, rendering the dynamics of the algorithm transparent and tractable. We validate DD on XOR classification, yielding superior performance compared to GD; additionally, we implement noisy MNIST and non-linear probing of CIFAR-10, demonstrating that even when our stylized assumptions are relaxed, DD narrows the generalization gap compared to GD.


Appendix for "Episodic Multi-Task Learning with Heterogeneous Neural Processes "

Neural Information Processing Systems

In this section, we list frequently asked questions from researchers who help proofread this manuscript. These raised questions might also be relevant for others and help in better understanding the paper, so we include more detailed discussions here. This work considers the multi-input multi-output setting of multi-task learning under the episodic training mechanism. As shown in Table 1, we use "Heterogeneous tasks" to distinguish the different branches of multi-task learning: (1) single-input multi-output (SIMO) considers different tasks which have the same input and different supervision information. All tasks are related since they share the target space. This setting encourages deep models to deal with the insufficient data of each task by aggregating the training data from related tasks in the spirit of data augmentation. Meanwhile, "Episodic training" is used to describe the data-feeding strategy. Multi-task meta-learning also benefits from episodic training, but it follows the SIMO setting in every single episode and cannot sufficiently handle heterogeneous tasks.



Reliable Estimation of KLDivergence using a Discriminator in Reproducing Kernel Hilbert Space Supplementary Material

Neural Information Processing Systems

Organization: This supplementary material is presented in a format parallel to the main paper. The section numbers and titles are consistent with the main paper. But, here we also add one new section: Section 10 where we describe the societal impacts and possible negative impacts of the paper. Similarly, the Theorem numbers are consistent with the main paper, but we also have several additional theorems and lemmas which were not included in the main paper. GAN-type Objective for KLEstimation Let f be a discriminator, f: X IR. Let p(x) and q(x) be two probability density functions defined over the space X.




Impact

Neural Information Processing Systems

More precisely, we use batches of size 2. Each batch contains one patch with the foreground oversampled. Furthermore, we split each silo's data into training and validation data with 80% and 20% split, respectively. All this pre-processing and patching is done using the nnU-Net library [IJK+21]. Loss function We use the same loss function as proposed by nnU-Net [IJK+21] for the KiTS19 dataset which is based on DICE [Dic45] and on the Cross Entropy loss.



mlr3torch: A Deep Learning Framework in R based on mlr3 and torch

arXiv.org Machine Learning

Deep learning (DL) has become a cornerstone of modern machine learning (ML) praxis. We introduce the R package mlr3torch, which is an extensible DL framework for the mlr3 ecosystem. It is built upon the torch package, and simplifies the definition, training, and evaluation of neural networks for both tabular data and generic tensors (e.g., images) for classification and regression. The package implements predefined architectures, and torch models can easily be converted to mlr3 learners. It also allows users to define neural networks as graphs. This representation is based on the graph language defined in mlr3pipelines and allows users to define the entire modeling workflow, including preprocessing, data augmentation, and network architecture, in a single graph. Through its integration into the mlr3 ecosystem, the package allows for convenient resampling, benchmarking, preprocessing, and more. We explain the package's design and features and show how to customize and extend it to new problems. Furthermore, we demonstrate the package's capabilities using three use cases, namely hyperparameter tuning, fine-tuning, and defining architectures for multimodal data. Finally, we present some runtime benchmarks.