Goto

Collaborating Authors

 dun





Adversarial generalization of unfolding (model-based) networks

Kouni, Vicky

arXiv.org Artificial Intelligence

Unfolding networks are interpretable networks emerging from iterative algorithms, incorporate prior knowledge of data structure, and are designed to solve inverse problems like compressed sensing, which deals with recovering data from noisy, missing observations. Compressed sensing finds applications in critical domains, from medical imaging to cryptography, where adversarial robustness is crucial to prevent catastrophic failures. However, a solid theoretical understanding of the performance of unfolding networks in the presence of adversarial attacks is still in its infancy. In this paper, we study the adversarial generalization of unfolding networks when perturbed with $l_2$-norm constrained attacks, generated by the fast gradient sign method. Particularly, we choose a family of state-of-the-art overaparameterized unfolding networks and deploy a new framework to estimate their adversarial Rademacher complexity. Given this estimate, we provide adversarial generalization error bounds for the networks under study, which are tight with respect to the attack level. To our knowledge, this is the first theoretical analysis on the adversarial generalization of unfolding networks. We further present a series of experiments on real-world data, with results corroborating our derived theory, consistently for all data. Finally, we observe that the family's overparameterization can be exploited to promote adversarial robustness, shedding light on how to efficiently robustify neural networks.





How to warm-start your unfolding network

Kouni, Vicky

arXiv.org Artificial Intelligence

We present a new ensemble framework for boosting the performance of overparameterized unfolding networks solving the compressed sensing problem. We combine a state-of-the-art overparameterized unfolding network with a continuation technique, to warm-start a crucial quantity of the said network's architecture; we coin the resulting continued network C-DEC. Moreover, for training and evaluating C-DEC, we incorporate the log-cosh loss function, which enjoys both linear and quadratic behavior. Finally, we numerically assess C-DEC's performance on real-world images. Results showcase that the combination of continuation with the overparameterized unfolded architecture, trained and evaluated with the chosen loss function, yields smoother loss landscapes and improved reconstruction and generalization performance of C-DEC, consistently for all datasets.


Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales

Eamaz, Arian, Yeganegi, Farhang, Soltanalian, Mojtaba

arXiv.org Artificial Intelligence

Recent advancements in Large Language Model (LLM) compression, such as BitNet and BitNet b1.58, have marked significant strides in reducing the computational demands of LLMs through innovative one-bit quantization techniques. We extend this frontier by looking at Large Inference Models (LIMs) that have become indispensable across various applications. However, their scale and complexity often come at a significant computational cost. We introduce a novel approach that leverages one-bit algorithm unrolling, effectively integrating information from the physical world in the model architecture. Our method achieves a bit-per-link rate significantly lower than the 1.58 bits reported in prior work, thanks to the natural sparsity that emerges in our network architectures. We numerically demonstrate that the proposed one-bit algorithm unrolling scheme can improve both training and test outcomes by effortlessly increasing the number of layers while substantially compressing the network. Additionally, we provide theoretical results on the generalization gap, convergence rate, stability, and sensitivity of our proposed one-bit algorithm unrolling.


Review for NeurIPS paper: Depth Uncertainty in Neural Networks

Neural Information Processing Systems

Weaknesses: My main concern is mainly about the experiments section. It is great to see the demonstration of the method on ResNet-50. But I couldn't find the comparison of test accuracy through the paper. There is only log-likelihood included in the appendix. I would expect a comparison on test accuracy among DUN, MC-dropout, deep ensembles, deep ensembles with different depths.