
Collaborating Authors

 Soltanalian, Mojtaba


Predicting Through Generation: Why Generation Is Better for Prediction

arXiv.org Artificial Intelligence

This paper argues that generating output tokens is more effective than using pooled representations for prediction tasks because token-level generation retains more mutual information. Since LLMs are trained on massive text corpora using next-token prediction, generation aligns naturally with their learned behavior. Using the Data Processing Inequality (DPI), we provide both theoretical and empirical evidence supporting this claim. However, autoregressive models face two key challenges when used for prediction: (1) exposure bias, where the model sees ground-truth tokens during training but relies on its own predictions during inference, leading to compounding errors, and (2) format mismatch, where discrete tokens do not always align with the task's required output structure. To address these challenges, we introduce PredGen (Predicting Through Generation), an end-to-end framework that (i) uses scheduled sampling to reduce exposure bias, and (ii) introduces a task adapter to convert the generated tokens into structured outputs. Additionally, we introduce Writer-Director Alignment Loss (WDAL), which ensures consistency between token generation and final task predictions, improving both text coherence and numerical accuracy. We evaluate PredGen on multiple classification and regression benchmarks. Our results show that PredGen consistently outperforms standard baselines, demonstrating its effectiveness in structured prediction tasks.
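
As a rough illustration of the scheduled-sampling component, the sketch below mixes ground-truth tokens with the model's own predictions during training. The `decoder_step` and `embed` callables, the tensor shapes, and the mixing probability `epsilon` are hypothetical placeholders, not PredGen's actual implementation.

```python
import torch

def scheduled_sampling_decode(decoder_step, embed, targets, epsilon):
    """One training pass with scheduled sampling (Bengio et al., 2015).

    With probability epsilon the model's own previous prediction is fed
    at each step instead of the ground-truth token, shrinking the
    train/inference mismatch (exposure bias). decoder_step and embed are
    assumed callables; the real PredGen architecture may differ.
    """
    batch, seq_len = targets.shape
    hidden = None
    inputs = targets[:, 0]                # start from the first gold token
    logits_all = []
    for t in range(1, seq_len):
        logits, hidden = decoder_step(embed(inputs), hidden)
        logits_all.append(logits)
        use_model = torch.rand(batch, device=targets.device) < epsilon
        predicted = logits.argmax(dim=-1)
        inputs = torch.where(use_model, predicted, targets[:, t])
    # (batch, seq_len - 1, vocab); train with cross-entropy vs targets[:, 1:]
    return torch.stack(logits_all, dim=1)
```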


Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales

arXiv.org Artificial Intelligence

Recent advancements in Large Language Model (LLM) compression, such as BitNet and BitNet b1.58, have marked significant strides in reducing the computational demands of LLMs through innovative one-bit quantization techniques. We extend this frontier by looking at Large Inference Models (LIMs), which have become indispensable across various applications but whose scale and complexity often come at a significant computational cost. We introduce a novel approach that leverages one-bit algorithm unrolling, effectively integrating information from the physical world into the model architecture. Our method achieves a bit-per-link rate significantly lower than the 1.58 bits reported in prior work, thanks to the natural sparsity that emerges in our network architectures. We numerically demonstrate that the proposed one-bit algorithm unrolling scheme can improve both training and test outcomes by effortlessly increasing the number of layers while substantially compressing the network. Additionally, we provide theoretical results on the generalization gap, convergence rate, stability, and sensitivity of our proposed one-bit algorithm unrolling.
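
For context, the sketch below shows the absmean ternary quantizer popularized by BitNet b1.58, whose zero entries hint at why sparsity can push the effective bits per link below 1.58. This is background only; the unrolling-specific architecture of this paper is not reproduced here.

```python
import numpy as np

def absmean_ternary(W, eps=1e-8):
    """BitNet b1.58-style quantizer: scale by the mean magnitude, then
    round each weight to {-1, 0, +1}. Zero entries need no storage per
    link, which is one route to sub-1.58-bit rates on sparse weights."""
    gamma = np.mean(np.abs(W)) + eps      # per-matrix scale (absmean)
    Wq = np.clip(np.round(W / gamma), -1, 1)
    return gamma, Wq                      # reconstruct with gamma * Wq
```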


RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates

arXiv.org Artificial Intelligence

We propose RoCoFT, a parameter-efficient fine-tuning method for large-scale language models (LMs) based on updating only a few rows and columns of the weight matrices in transformers. Through extensive experiments with medium-size LMs like BERT and RoBERTa, and larger LMs like Bloom-7B, Llama2-7B, and Llama2-13B, we show that our method gives comparable or better accuracies than state-of-the-art PEFT methods while also being more memory- and computation-efficient. We also study the reason behind the effectiveness of our method with tools from neural tangent kernel theory. We empirically demonstrate that our kernel, constructed using a restricted set of row and column parameters, is numerically close to the full-parameter kernel and gives comparable classification performance. Ablation studies are conducted to investigate the impact of different algorithmic choices, including the selection strategy for rows and columns as well as the optimal rank for effective implementation of our method.
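
A minimal way to realize row-column updates in PyTorch is to mask gradients so that only the chosen rows and columns of each weight matrix are ever modified by the optimizer. The helper below, including its name and the hard-coded selection, is a hypothetical stand-in for the selection strategies the paper actually studies.

```python
import torch

def restrict_to_rows_cols(weight, rows, cols):
    """Zero out gradients everywhere except the selected rows/columns,
    so a standard optimizer only updates those entries. `weight` must be
    a leaf parameter with requires_grad=True."""
    mask = torch.zeros_like(weight)
    mask[rows, :] = 1.0
    mask[:, cols] = 1.0
    weight.register_hook(lambda grad: grad * mask)

# Example: for each transformer weight matrix W,
# restrict_to_rows_cols(W, rows=[0, 1, 2], cols=[0, 1, 2])
```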


Data-Aware Training Quality Monitoring and Certification for Reliable Deep Learning

arXiv.org Artificial Intelligence

Deep learning models have become crucial for tackling complex computational problems, owing to the rich representations they develop through their multi-layered structures and non-linear transformations [1, 2]. Despite their remarkable effectiveness, these models are often perceived as black boxes, raising concerns about their robustness, reliability, and safety. As neural networks become increasingly integral to critical applications, ensuring that they are properly trained and perform as intended is paramount. To evaluate training performance and a network's ability to memorize the training data (i.e., achieve zero loss), one approach is to statistically analyze neural networks under certain assumptions. This has been done for networks with thresholding activation functions like ReLU, where researchers have determined the number of parameters needed to achieve full memory capacity [3]. It is well known that once a ReLU-based neural network (NN) has a sufficient number of weights, it can achieve full memory capacity, or even zero loss in some cases. In [4], the authors theoretically demonstrate that in the over-parameterized regime, the stochastic gradient descent (SGD) algorithm can converge to the global minimum. However, these methods are statistical in nature and rely on specific assumptions about the input data and the model, which may limit their applicability.


Deep Learning Meets Adaptive Filtering: A Stein's Unbiased Risk Estimator Approach

arXiv.org Artificial Intelligence

This paper revisits two prominent adaptive filtering algorithms, namely recursive least squares (RLS) and equivariant adaptive source separation (EASI), through the lens of algorithm unrolling. Building upon the unrolling methodology, we introduce novel task-based deep learning frameworks, denoted Deep RLS and Deep EASI. These architectures transform the iterations of the original algorithms into the layers of a deep neural network, enabling efficient source-signal estimation through a training process. To further enhance performance, we propose training these deep unrolled networks with a surrogate loss function grounded in Stein's unbiased risk estimator (SURE). Our empirical evaluations demonstrate that the Deep RLS and Deep EASI networks outperform their underlying algorithms. Moreover, numerical experiments highlight the efficacy of SURE-based training in comparison to the conventional mean squared error loss. The potential of SURE-based training demonstrated here sets a benchmark for its future use, either for training or as an evaluation metric for the generalization performance of neural networks.
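
To make the SURE surrogate concrete: for an estimator f applied to y = x + N(0, sigma^2 I), Stein's identity yields an unbiased estimate of the risk that requires no ground truth, with the divergence term approximated by a Monte-Carlo probe (Ramani et al., 2008). The sketch below is the generic estimator under those assumptions; the exact surrogate used for Deep RLS/Deep EASI may differ.

```python
import numpy as np

def sure_loss(f, y, sigma, eps=1e-4, rng=np.random.default_rng(0)):
    """Monte-Carlo SURE for an estimator f, treated as a black box.
    Assumes y = x + N(0, sigma^2 I); returns an unbiased estimate of
    the mean squared error without access to the clean signal x."""
    n = y.size
    fy = f(y)
    b = rng.standard_normal(y.shape)          # random probe vector
    # divergence of f at y, estimated by a finite-difference projection
    div = b.ravel() @ (f(y + eps * b) - fy).ravel() / eps
    return np.sum((y - fy) ** 2) - n * sigma**2 + 2 * sigma**2 * div
```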


One-Bit Compressive Sensing: Can We Go Deep and Blind?

arXiv.org Artificial Intelligence

One-bit compressive sensing is concerned with the accurate recovery of an underlying sparse signal of interest from its one-bit noisy measurements. Conventional signal recovery approaches for this problem are developed mainly under the assumption that exact knowledge of the sensing matrix is available. In this work, however, we present a novel data-driven and model-based methodology that achieves blind recovery, i.e., signal recovery without requiring knowledge of the sensing matrix. To this end, we make use of the deep unfolding technique and develop a model-driven deep neural architecture designed for this specific task. The proposed deep architecture is able to learn an alternative sensing matrix by taking advantage of the underlying unfolded algorithm, such that the resulting learned recovery algorithm can accurately and quickly (in terms of the number of iterations) recover the underlying compressed signal of interest from its one-bit noisy measurements. In addition, owing to the incorporation of domain knowledge and the mathematical model of the system into the proposed deep architecture, the resulting network benefits from enhanced interpretability, has a very small number of trainable parameters, and requires very few training samples compared to the commonly used black-box deep neural network alternatives for the problem at hand.
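
A rough sketch of the kind of architecture this suggests: binary-iterative-hard-thresholding-style iterations unfolded into layers, with the sensing matrix itself a trainable parameter to enable blind recovery. The class name, the tanh surrogate for the non-differentiable sign, the layer count, and all hyperparameters are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class UnfoldedOneBitNet(nn.Module):
    """Hypothetical unfolded one-bit recovery network: each layer is one
    gradient-plus-projection step, with a learned sensing matrix A and
    learned per-layer step sizes."""
    def __init__(self, m, n, layers=10, sparsity=20):
        super().__init__()
        self.A = nn.Parameter(torch.randn(m, n) / m**0.5)   # learned sensing matrix
        self.step = nn.Parameter(0.1 * torch.ones(layers))  # per-layer step sizes
        self.k = sparsity
        self.layers = layers

    def hard_threshold(self, x):
        # keep the k largest-magnitude entries (projection onto sparse vectors)
        idx = x.abs().topk(self.k, dim=-1).indices
        out = torch.zeros_like(x)
        return out.scatter(-1, idx, x.gather(-1, idx))

    def forward(self, y):                       # y: (batch, m) one-bit data
        x = torch.zeros(y.shape[0], self.A.shape[1], device=y.device)
        for t in range(self.layers):
            residual = y - torch.tanh(x @ self.A.T)   # smooth surrogate for sign
            x = self.hard_threshold(x + self.step[t] * (residual @ self.A))
        return x
```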


Unfolded Algorithms for Deep Phase Retrieval

arXiv.org Machine Learning

The idea of phase retrieval has intrigued researchers for decades, owing to its appearance in a wide range of applications. The task of a phase retrieval algorithm is typically to recover a signal from linear phaseless measurements. In this paper, we approach the problem by proposing a hybrid model-based, data-driven deep architecture, referred to as Unfolded Phase Retrieval (UPR), that exhibits significant potential for improving the performance of state-of-the-art data-driven and model-based phase retrieval algorithms. The proposed method benefits from the versatility and interpretability of well-established model-based algorithms while simultaneously drawing on the expressive power of deep neural networks. In particular, our model-based deep architecture is applied to the conventional phase retrieval problem (via the incremental reshaped Wirtinger flow algorithm) and the sparse phase retrieval problem (via the sparse truncated amplitude flow algorithm), showing immense promise in both cases. Furthermore, we consider a joint design of the sensing matrix and the signal processing algorithm, utilizing the deep unfolding technique in the process. Our numerical results illustrate the effectiveness of such hybrid model-based and data-driven frameworks and showcase the untapped potential of data-aided methodologies for enhancing existing phase retrieval algorithms.
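
As an illustration of the unfolding step, the sketch below turns plain Wirtinger-flow iterations into layers with learnable per-layer step sizes. It is real-valued and omits UPR's incremental/reshaped variants and the jointly designed sensing matrix, so treat it as a schematic of the technique rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class UnfoldedWF(nn.Module):
    """Wirtinger-flow gradient steps unfolded into layers; only the step
    sizes mu[t] are trained in this minimal sketch."""
    def __init__(self, layers=15):
        super().__init__()
        self.mu = nn.Parameter(torch.full((layers,), 0.1))

    def forward(self, A, y, z0):
        # A: (m, n) sensing matrix; y: (m,) phaseless data y_i = (a_i^T z)^2
        z = z0
        m = A.shape[0]
        for t in range(len(self.mu)):
            Az = A @ z
            grad = A.T @ ((Az**2 - y) * Az) / m   # gradient of the amplitude loss
            z = z - self.mu[t] * grad
        return z
```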


Deep-RLS: A Model-Inspired Deep Learning Approach to Nonlinear PCA

arXiv.org Machine Learning

In this work, we consider the application of model-based deep learning to nonlinear principal component analysis (PCA). Inspired by the deep unfolding methodology, we propose a task-based deep learning approach, referred to as Deep-RLS, that unfolds the iterations of the well-known recursive least squares (RLS) algorithm into the layers of a deep neural network in order to perform nonlinear PCA. In particular, we formulate nonlinear PCA for the blind source separation (BSS) problem and show through numerical analysis that Deep-RLS yields a significant improvement in the accuracy of recovering the source signals in BSS compared to the traditional RLS algorithm.
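
A single unfolded RLS layer might look like the cell below, with the forgetting factor made a trainable parameter. The nonlinearity used for nonlinear PCA and the BSS-specific wiring are omitted, so this is a sketch of the unfolding idea rather than Deep-RLS itself.

```python
import torch
import torch.nn as nn

class DeepRLSCell(nn.Module):
    """One RLS iteration as a network layer. The forgetting factor is
    learned (sigmoid keeps it in (0, 1)); u is the input vector, d the
    desired response, P the inverse-correlation estimate, w the weights."""
    def __init__(self):
        super().__init__()
        self.raw_lam = nn.Parameter(torch.tensor(2.0))

    def forward(self, P, w, u, d):
        lam = torch.sigmoid(self.raw_lam)
        Pu = P @ u
        k = Pu / (lam + u @ Pu)              # RLS gain vector
        w = w + k * (d - w @ u)              # weight update toward target d
        P = (P - torch.outer(k, Pu)) / lam   # inverse-correlation update
        return P, w
```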


Deep-URL: A Model-Aware Approach To Blind Deconvolution Based On Deep Unfolded Richardson-Lucy Network

arXiv.org Artificial Intelligence

The lack of interpretability in current deep learning models raises serious concerns, as these models are extensively used in life-critical applications. Hence, developing interpretable deep learning models is of paramount importance. In this paper, we consider the problem of blind deconvolution and propose a novel model-aware deep architecture that allows for the recovery of both the blur kernel and the sharp image from the blurred image. In particular, we propose the Deep Unfolded Richardson-Lucy (Deep-URL) framework, an interpretable deep learning architecture that can be seen as an amalgamation of a classical estimation technique and a deep neural network, and consequently leads to improved performance. Our numerical investigations demonstrate significant improvement over state-of-the-art algorithms.
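
For reference, the classical (non-blind) Richardson-Lucy multiplicative update that Deep-URL unfolds is sketched below. Deep-URL additionally estimates the blur kernel (the blind part) and inserts trainable components into each unfolded layer; neither appears in this plain sketch.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(y, k, iters=30, eps=1e-12):
    """Classical Richardson-Lucy deconvolution for a 2-D blurred image y
    and a known 2-D blur kernel k; each pass is one multiplicative update."""
    x = np.full_like(y, y.mean())        # flat-image initialization
    k_flip = k[::-1, ::-1]               # adjoint of convolution with k
    for _ in range(iters):
        conv = fftconvolve(x, k, mode="same") + eps
        x = x * fftconvolve(y / conv, k_flip, mode="same")
    return x
```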


Deep Signal Recovery with One-Bit Quantization

arXiv.org Machine Learning

Machine learning, and more specifically deep learning, has shown remarkable performance in sensing, communications, and inference. In this paper, we consider the application of the deep unfolding technique to the problem of signal reconstruction from one-bit noisy measurements. Namely, we propose a model-based machine learning method that unfolds the iterations of an inference optimization algorithm into the layers of a deep neural network for one-bit signal recovery. The resulting network, which we refer to as DeepRec, can efficiently handle the recovery of high-dimensional signals from acquired one-bit noisy measurements. As shown through numerical analysis, the proposed method improves both accuracy and computational efficiency relative to the original framework.
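
One concrete instance of such an inference optimization algorithm is gradient ascent on the one-bit Gaussian log-likelihood, sum_i log Phi(r_i a_i^T x / sigma), whose iterations can be unfolded into layers. The sketch below is the plain iterative algorithm under an assumed noise model, step size, and naming; DeepRec's learned, layered counterpart is not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def one_bit_ml_recovery(A, r, sigma, steps=200, lr=0.5):
    """Maximum-likelihood recovery from one-bit data r = sign(Ax + noise),
    noise ~ N(0, sigma^2), by plain gradient ascent on the log-likelihood."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(steps):
        z = r * (A @ x) / sigma
        # d/dz log Phi(z) = phi(z) / Phi(z)  (inverse Mills ratio)
        w = norm.pdf(z) / np.clip(norm.cdf(z), 1e-12, None)
        x = x + lr * (A.T @ (r * w)) / (sigma * m)
    return x
```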