
Collaborating Authors

 Kiefer, Nicholas


Model Fusion via Neuron Transplantation

arXiv.org Artificial Intelligence

Ensemble learning is a widespread technique to improve the prediction performance of neural networks. However, it comes at the price of increased memory and inference time. In this work we propose a novel model fusion technique called Neuron Transplantation (NT), in which we fuse an ensemble of models by transplanting important neurons from all ensemble members into the vacant space obtained by pruning insignificant neurons. An initial loss in performance after transplantation can be quickly recovered via fine-tuning, and the fused model consistently outperforms individual ensemble members of the same model capacity and architecture. Furthermore, NT enables all the ensemble members to be jointly pruned and jointly trained in a combined model. Compared to alignment-based averaging (such as Optimal-Transport fusion), NT requires less fine-tuning than the corresponding OT-fused model, the fusion itself is faster and requires less memory, and the resulting model performance is comparable or better. The code is available at https://github.com/masterbaer/neuron-transplantation.
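The fusion idea in this abstract can be sketched on a toy one-hidden-layer MLP. This is a minimal stdlib sketch under stated assumptions, not the authors' implementation (the linked repository holds that): neuron importance is assumed to be the L2 norm of a neuron's incoming weights, all members are assumed to share one architecture, and `transplant`, `forward`, and the dict layout are hypothetical names. Outgoing weights are scaled by 1/K so that keeping every pooled neuron reproduces the ensemble average exactly; pruning back to the original width then keeps only the most important neurons from all members. The fine-tuning step that recovers accuracy afterwards is not shown.

```python
import math


def l2_norm(row):
    """Euclidean norm of a weight vector."""
    return math.sqrt(sum(w * w for w in row))


def forward(model, x):
    """One-hidden-layer MLP with ReLU: y = W2 @ relu(W1 @ x + b1) + b2."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(model["W1"], model["b1"])]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(model["W2"], model["b2"])]


def transplant(members, width):
    """Fuse same-architecture members into one model with `width` hidden neurons.

    Hypothetical sketch: hidden neurons of all K members are pooled, their
    outgoing weights are scaled by 1/K (so the unpruned pool equals the
    ensemble average), and only the `width` neurons with the largest
    incoming-weight L2 norm are kept; all others are pruned.
    """
    k = len(members)
    n_out = len(members[0]["W2"])
    pool = []  # (importance score, incoming weights, bias, outgoing weights)
    for m in members:
        for j, row in enumerate(m["W1"]):
            out_col = [m["W2"][i][j] / k for i in range(n_out)]
            pool.append((l2_norm(row), row, m["b1"][j], out_col))
    pool.sort(key=lambda entry: entry[0], reverse=True)
    kept = pool[:width]
    return {
        "W1": [list(e[1]) for e in kept],
        "b1": [e[2] for e in kept],
        "W2": [[e[3][i] for e in kept] for i in range(n_out)],
        "b2": [sum(m["b2"][i] for m in members) / k for i in range(n_out)],
    }


# Two toy members: 2 inputs, 2 hidden neurons, 1 output each.
a = {"W1": [[1.0, 0.0], [0.1, 0.1]], "b1": [0.0, 0.0], "W2": [[1.0, 1.0]], "b2": [0.0]}
b = {"W1": [[0.0, 2.0], [0.2, 0.0]], "b1": [0.0, 0.0], "W2": [[1.0, -1.0]], "b2": [0.5]}

fused = transplant([a, b], width=2)  # same hidden width as a single member
print(fused["W1"])                   # [[0.0, 2.0], [1.0, 0.0]]
```

Scaling the outgoing weights before pruning is a design choice of this sketch: it makes the wide, unpruned pool an exact functional copy of the averaged ensemble, so any accuracy drop is attributable to the pruning step alone.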


A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting

arXiv.org Artificial Intelligence

The current landscape in time series forecasting is dominated by Transformer-based models. Their high parameter count and corresponding demand for computational resources pose a challenge to real-world deployment, especially for commercial and scientific applications with low-power embedded devices. Pruning is an established approach to reduce neural network parameter count and save compute. However, the implications and benefits of pruning Transformer-based models for time series forecasting are largely unknown. To close this gap, we provide a comparative benchmark study by evaluating unstructured and structured pruning on various state-of-the-art multivariate time series models. We study the effects of these pruning strategies on model predictive performance and on computational aspects such as model size, operations, and inference time. Our results show that certain models can be pruned even to high sparsity levels while outperforming their dense counterparts. However, fine-tuning the pruned models is necessary. Furthermore, we demonstrate that even with corresponding hardware and software support, structured pruning is unable to provide significant time savings.
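The difference between the two pruning families benchmarked above can be seen on a single weight matrix. The sketch below assumes simple magnitude criteria (smallest absolute weights for unstructured pruning, smallest row L2 norms for structured pruning); these are common illustrative choices, not necessarily the exact criteria used in the study, and the function names are hypothetical.

```python
import math


def unstructured_prune(W, sparsity):
    """Zero the `sparsity` fraction of individual weights with smallest magnitude.

    The matrix keeps its shape, so any speedup needs sparse-aware kernels.
    """
    entries = sorted((abs(w), i, j) for i, row in enumerate(W) for j, w in enumerate(row))
    n_zero = int(sparsity * len(entries))
    pruned = [list(row) for row in W]
    for _, i, j in entries[:n_zero]:
        pruned[i][j] = 0.0
    return pruned


def structured_prune(W, sparsity):
    """Remove the `sparsity` fraction of rows (output neurons) with smallest L2 norm.

    The matrix physically shrinks; the next layer's matching input columns
    would also have to be removed (not shown).
    """
    n_keep = len(W) - int(sparsity * len(W))
    by_norm = sorted(range(len(W)),
                     key=lambda i: math.sqrt(sum(w * w for w in W[i])),
                     reverse=True)
    kept = sorted(by_norm[:n_keep])  # surviving rows stay in original order
    return [list(W[i]) for i in kept]


W = [[0.9, -0.1], [0.05, 0.02], [-0.4, 0.3], [0.01, -0.03]]
print(unstructured_prune(W, 0.5))  # [[0.9, -0.1], [0.0, 0.0], [-0.4, 0.3], [0.0, 0.0]]
print(structured_prune(W, 0.5))    # [[0.9, -0.1], [-0.4, 0.3]]
```

Both calls remove half of the weights, but only the structured variant produces a smaller dense matrix; the unstructured result still has four rows and relies on sparse execution support to save time.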


AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning

arXiv.org Artificial Intelligence

Communication bottlenecks severely hinder the scalability of distributed neural network training, particularly in high-performance computing (HPC) environments. We introduce AB-training, a novel data-parallel method that leverages low-rank representations and independent training groups to significantly reduce communication overhead. Our experiments demonstrate an average reduction in network traffic of approximately 70.31% across various scaling scenarios, increasing the training potential of communication-constrained systems and accelerating convergence at scale. AB-training also exhibits a pronounced regularization effect at smaller scales, leading to improved generalization while maintaining or even reducing training time. We achieve a remarkable 44.14:1 compression ratio on VGG16 trained on CIFAR-10 with minimal accuracy loss, and outperform traditional data-parallel training by 1.55% on ResNet-50 trained on ImageNet-2012. While AB-training is promising, our findings also reveal that large batch effects persist even in low-rank regimes, underscoring the need for further research into optimized update mechanisms for massively distributed training.
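The reason low-rank representations cut communication is simple arithmetic: replacing a dense m x n weight W by factors A (m x r) and B (r x n) shrinks the number of values per layer that must be synchronized. The sketch below only illustrates that per-layer arithmetic for a hypothetical layer shape; it does not reproduce the 44.14:1 figure, which depends on the VGG16 layer shapes and ranks used in the study, and the function names are hypothetical.

```python
def dense_params(m, n):
    """Values in a dense m x n weight matrix."""
    return m * n


def low_rank_params(m, n, r):
    """Values when W (m x n) is replaced by factors A (m x r) and B (r x n)."""
    return r * (m + n)


def compression_ratio(m, n, r):
    """How many times fewer values must be stored or communicated per sync."""
    return dense_params(m, n) / low_rank_params(m, n, r)


def break_even_rank(m, n):
    """Largest rank r for which r * (m + n) < m * n, i.e. factoring still saves space."""
    return (m * n - 1) // (m + n)


print(compression_ratio(512, 512, 16))  # 16.0
print(break_even_rank(512, 512))        # 255
```

The break-even rank shows why the savings depend on keeping the rank small relative to the layer size: for a square 512 x 512 layer, any rank of 256 or more stores at least as many values as the dense matrix.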


Harnessing Orthogonality to Train Low-Rank Neural Networks

arXiv.org Artificial Intelligence

This study explores the learning dynamics of neural networks by analyzing the singular value decomposition (SVD) of their weights throughout training. Our investigation reveals that an orthogonal basis within each multidimensional weight's SVD representation stabilizes during training. Building upon this, we introduce Orthogonality-Informed Adaptive Low-Rank (OIALR) training, a novel training method exploiting the intrinsic orthogonality of neural networks. OIALR seamlessly integrates into existing training workflows with minimal accuracy loss, as demonstrated by benchmarking on various datasets and well-established network architectures. With appropriate hyperparameter tuning, OIALR can surpass conventional training setups, including those of state-of-the-art models.
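As a generic illustration of the objects this abstract refers to (not OIALR's exact update rule, which the abstract does not specify), the orthogonal bases come from the singular value decomposition of a weight matrix W ∈ ℝ^{m×n}, and a low-rank model keeps only the leading r singular triplets. The threshold β used below to choose r is a hypothetical rule for illustration.

```latex
\begin{aligned}
W &= U \Sigma V^{\top}, \quad U \in \mathbb{R}^{m \times k},\; V \in \mathbb{R}^{n \times k},\; U^{\top}U = V^{\top}V = I_k,\; k = \min(m, n),\\
\Sigma &= \operatorname{diag}(\sigma_1, \dots, \sigma_k), \quad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k \ge 0,\\
W_r &= U_{:,1:r}\, \Sigma_{1:r,1:r}\, V_{:,1:r}^{\top}, \qquad \lVert W - W_r \rVert_2 = \sigma_{r+1} \quad (r < k),\\
r &= \bigl|\{\, i : \sigma_i \ge \beta\, \sigma_1 \,\}\bigr|, \qquad \#\text{params}(W_r) = r(m + n) + r \;\text{ vs. }\; mn.
\end{aligned}
```

The third line is the Eckart–Young result: truncating to rank r costs exactly the first discarded singular value in spectral norm, so a basis whose leading directions stop changing during training is a natural candidate for freezing or truncating.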


A dynamic risk score for early prediction of cardiogenic shock using machine learning

arXiv.org Artificial Intelligence

Myocardial infarction and heart failure are major cardiovascular diseases that affect millions of people in the US. Morbidity and mortality are highest among patients who develop cardiogenic shock. Early recognition of cardiogenic shock is critical: prompt implementation of treatment measures can prevent the deleterious spiral of ischemia, low blood pressure, and reduced cardiac output caused by cardiogenic shock. However, early identification of cardiogenic shock has been challenging due to human providers' inability to process the enormous amount of data in the cardiac intensive care unit (ICU) and the lack of an effective risk stratification tool. We developed a deep learning-based risk stratification tool, called CShock, for patients admitted to the cardiac ICU with acute decompensated heart failure and/or myocardial infarction, to predict the onset of cardiogenic shock. To develop and validate CShock, we annotated cardiac ICU datasets with physician-adjudicated outcomes. CShock achieved an area under the receiver operating characteristic curve (AUROC) of 0.820, substantially outperforming CardShock (AUROC 0.519), a well-established risk score for cardiogenic shock prognosis. CShock was externally validated in an independent patient cohort and achieved an AUROC of 0.800, demonstrating its generalizability to other cardiac ICUs.
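The AUROC values reported in this abstract can be read as the probability that a randomly chosen patient who develops the outcome receives a higher risk score than a randomly chosen patient who does not; an AUROC of 0.5 corresponds to chance-level ranking. A minimal stdlib sketch of that pairwise definition follows; the labels and scores are made-up toy values, not data from the study, and `auroc` is a hypothetical helper name.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney form).

    Over all (positive, negative) pairs, counts how often the positive case
    gets the higher score; ties count as half. O(P * N), fine for small data.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]              # 1 = developed the outcome (toy data)
scores = [0.9, 0.7, 0.3, 0.8, 0.6, 0.1]  # model risk scores (toy data)
print(round(auroc(labels, scores), 3))   # 0.778
```

The quadratic pairwise loop is chosen for readability; a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently on real cohorts.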