Fast yet Safe: Early-Exiting with Risk Control
Alexander Timans, Tin Hadži Veljković

Neural Information Processing Systems

Scaling machine learning models significantly improves their performance. However, such gains come at the cost of inference being slow and resource-intensive. Early-exit neural networks (EENNs) offer a promising solution: they accelerate inference by allowing intermediate layers to 'exit' and produce a prediction early. Yet a fundamental issue with EENNs is how to determine when to exit without severely degrading performance. In other words, when is it 'safe' for an EENN to go 'fast'? To address this issue, we investigate how to adapt frameworks of risk control to EENNs. Risk control offers a distribution-free, post-hoc solution that tunes the EENN's exiting mechanism so that exits only occur when the output is of sufficient quality. We empirically validate our insights on a range of vision and language tasks, demonstrating that risk control can produce substantial computational savings, all the while preserving user-specified performance goals.
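As a concrete illustration of the calibration idea described above, the following minimal Python/NumPy sketch tunes a single exit-confidence threshold on a held-out calibration set so that the empirical risk, here taken to be disagreement between the early-exit and full-model predictions, stays below a user-specified level. The function and variable names, the Hoeffding-style finite-sample correction, and the synthetic data are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def calibrate_exit_threshold(exit_confs, exit_preds, final_preds, epsilon, delta=0.1):
    # exit_confs:  (n, L) per-sample, per-exit confidence scores
    # exit_preds:  (n, L) per-sample, per-exit predicted labels
    # final_preds: (n,)   full-model (last-exit) predictions
    n, L = exit_confs.shape
    # Hoeffding-style finite-sample correction (one simple, common choice).
    margin = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    best = 1.0  # fall back to "never exit early" if nothing safer is found
    # Search thresholds from strict (exit late) to permissive (exit early).
    for lam in np.linspace(1.0, 0.0, 101):
        exits = exit_confs >= lam                                   # which exits would fire
        first_exit = np.where(exits.any(axis=1), exits.argmax(axis=1), L - 1)
        early_pred = exit_preds[np.arange(n), first_exit]
        risk = np.mean(early_pred != final_preds)                   # consistency risk
        if risk + margin <= epsilon:
            best = lam                                              # keep relaxing the threshold
        else:
            break
    return best

# Toy usage with synthetic calibration data (earlier exits err more often).
rng = np.random.default_rng(0)
n, L, C = 2000, 4, 10
confs = np.sort(rng.uniform(size=(n, L)), axis=1)                   # confidence grows with depth
final = rng.integers(0, C, size=n)                                  # full-model predictions
preds = np.tile(final[:, None], (1, L))
flip = rng.uniform(size=(n, L)) < np.array([0.3, 0.2, 0.1, 0.0])
preds[flip] = rng.integers(0, C, size=flip.sum())
lam = calibrate_exit_threshold(confs, preds, final, epsilon=0.05)
print(f"calibrated exit threshold: {lam:.2f}")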







Geometry-aware training of factorized layers in tensor Tucker format

Neural Information Processing Systems

Reducing parameter redundancies in neural network architectures is crucial for achieving feasible computational and memory requirements during training and inference. Given its easy implementation and flexibility, one promising approach is layer factorization, which reshapes weight tensors into matrix format and parameterizes them as the product of two small, low-rank matrices. However, this approach typically requires an initial full-model warm-up phase, prior knowledge of a feasible rank, and is sensitive to parameter initialization. In this work, we introduce a novel approach to train the factors of a Tucker decomposition of the weight tensors. Our training proposal is provably optimal in locally approximating the original, unfactorized dynamics, independently of the initialization. Furthermore, the rank of each mode is dynamically updated during training. We provide a theoretical analysis of the algorithm, showing convergence, approximation, and local descent guarantees. The method's performance is further illustrated through a variety of experiments, showing remarkable training compression rates and comparable or even better performance than the full baseline and alternative layer factorization strategies.
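To make the layer-factorization setting concrete, the PyTorch sketch below parameterizes a linear layer's weight as the product of two small low-rank factors under a fixed rank budget. This is only the generic factorized-layer baseline the abstract refers to; the Tucker-format parameterization, geometry-aware updates, and dynamic rank adaptation that constitute the paper's contribution are not reproduced here, and the LowRankLinear class and its initialization scales are illustrative choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankLinear(nn.Module):
    # Linear layer with weight W approximated as U @ V, where U is (out, r) and V is (r, in).
    # Parameter count drops from out*in to r*(out+in), so a small rank r compresses the layer.
    # The rank is kept fixed here; rank-adaptive / Tucker variants update it during training.
    def __init__(self, in_features, out_features, rank, bias=True):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank**0.5)
        self.V = nn.Parameter(torch.randn(rank, in_features) / in_features**0.5)
        if bias:
            self.bias = nn.Parameter(torch.zeros(out_features))
        else:
            self.register_parameter("bias", None)

    def forward(self, x):
        # (x @ V^T) @ U^T avoids ever materializing the full out-by-in weight matrix.
        return F.linear(x @ self.V.t(), self.U, self.bias)

# Swap a dense layer for a factorized one and train as usual.
dense = nn.Linear(512, 256)
factored = LowRankLinear(512, 256, rank=32)
x = torch.randn(8, 512)
print(dense(x).shape, factored(x).shape)   # both: torch.Size([8, 256])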


TransBoost: Improving the Best ImageNet Performance using Deep Transduction
Supplementary Material

Neural Information Processing Systems

Department of Computer Science, Technion - Israel Institute of Technology
omer.be@cs.technion.ac.il, guy.b@cs.technion.ac.il

In general, TransBoost is particularly useful when we are able to accumulate a test set of instances and then finetune a specialized model to predict their labels. This setting has numerous use cases in various application fields, including:

Medicine
Medical diagnosis is one possible meaningful use case. Here, medical records can be gathered on a daily or weekly basis, and TransBoost can then be used to finetune transductive models on top of existing inductive models in order to provide more reliable results for these specific records.
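The workflow described above, accumulating a batch of test instances and then fine-tuning a specialized model for them, can be sketched generically as follows. The loss used on the unlabeled test batch is plain entropy minimization, a placeholder only: TransBoost defines its own transductive objective, which is not shown here, and all names and hyperparameters below are illustrative.

import torch
import torch.nn.functional as F

def transductive_finetune(model, labeled_loader, test_inputs, epochs=3, lr=1e-4):
    # Adapt a pretrained inductive model to a specific, accumulated batch of
    # unlabeled test instances, while keeping a supervised loss on labeled data.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x_lab, y_lab in labeled_loader:
            sup_loss = F.cross_entropy(model(x_lab), y_lab)

            # Placeholder unsupervised term: entropy minimization on the test batch.
            probs = F.softmax(model(test_inputs), dim=-1)
            unsup_loss = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()

            loss = sup_loss + 0.1 * unsup_loss   # weighting is an arbitrary choice here
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Toy usage with a small classifier and synthetic data.
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 5))
labeled = [(torch.randn(32, 20), torch.randint(0, 5, (32,))) for _ in range(10)]
test_batch = torch.randn(128, 20)
transductive_finetune(model, labeled, test_batch)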



A Appendix

Neural Information Processing Systems

A.1 Algorithms

In this section, we provide the pseudocode of the potential-dependent dropping scheme (Algorithm 1) and the overall training procedure (Algorithm 2) of SNNs with our proposed methods.

A.2 Details of Datasets and Training Settings

A.2.1 MNIST

The MNIST dataset contains 60000 images for training and 10000 for testing. Each sample in MNIST is a gray-scale handwritten digit of size 28 × 28 pixels.

A.2.2 CIFAR10

The CIFAR10 dataset is a collection of 60000 color images, divided into 50000 images for training and 10000 for testing. The images are evenly distributed across 10 labelled classes.
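For reference, the following torchvision snippet loads the two datasets with exactly the splits described above; the transform, batch size, and data directory are illustrative choices rather than the paper's training settings.

import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST: 60000 grayscale 28x28 training digits, 10000 test digits.
mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=to_tensor)

# CIFAR10: 50000 color 32x32 training images, 10000 test images, 10 classes.
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)

train_loader = torch.utils.data.DataLoader(cifar_train, batch_size=128, shuffle=True)
print(len(mnist_train), len(mnist_test), len(cifar_train), len(cifar_test))
# 60000 10000 50000 10000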