
Collaborating Authors

 Soltanalian, Mojtaba


Predicting Through Generation: Why Generation Is Better for Prediction

arXiv.org Artificial Intelligence

This paper argues that generating output tokens is more effective than using pooled representations for prediction tasks because token-level generation retains more mutual information. Since LLMs are trained on massive text corpora using next-token prediction, generation aligns naturally with their learned behavior. Using the Data Processing Inequality (DPI), we provide both theoretical and empirical evidence supporting this claim. However, autoregressive models face two key challenges when used for prediction: (1) exposure bias, where the model sees ground-truth tokens during training but relies on its own predictions during inference, leading to compounding errors, and (2) format mismatch, where discrete tokens do not always align with the task's required output structure. To address these challenges, we introduce PredGen (Predicting Through Generation), an end-to-end framework that (i) uses scheduled sampling to reduce exposure bias, and (ii) introduces a task adapter to convert the generated tokens into structured outputs. Additionally, we introduce Writer-Director Alignment Loss (WDAL), which ensures consistency between token generation and final task predictions, improving both text coherence and numerical accuracy. We evaluate PredGen on multiple classification and regression benchmarks. Our results show that PredGen consistently outperforms standard baselines, demonstrating its effectiveness in structured prediction tasks.
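
As a rough illustration of the scheduled-sampling component, the sketch below mixes ground-truth tokens with the model's own predictions during training. The `decoder_step` and `embed` callables, the tensor shapes, and the mixing probability `epsilon` are hypothetical placeholders, not PredGen's actual implementation.

```python
import torch

def scheduled_sampling_decode(decoder_step, embed, targets, epsilon):
    """One training pass with scheduled sampling (Bengio et al., 2015).

    With probability epsilon the model's own previous prediction is fed
    at each step instead of the ground-truth token, shrinking the
    train/inference mismatch (exposure bias). decoder_step and embed are
    assumed callables; the real PredGen architecture may differ.
    """
    batch, seq_len = targets.shape
    hidden = None
    inputs = targets[:, 0]                # start from the first gold token
    logits_all = []
    for t in range(1, seq_len):
        logits, hidden = decoder_step(embed(inputs), hidden)
        logits_all.append(logits)
        use_model = torch.rand(batch, device=targets.device) < epsilon
        predicted = logits.argmax(dim=-1)
        inputs = torch.where(use_model, predicted, targets[:, t])
    # (batch, seq_len - 1, vocab); train with cross-entropy vs targets[:, 1:]
    return torch.stack(logits_all, dim=1)
```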


Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales

arXiv.org Artificial Intelligence

Recent advancements in Large Language Model (LLM) compression, such as BitNet and BitNet b1.58, have marked significant strides in reducing the computational demands of LLMs through innovative one-bit quantization techniques. We extend this frontier by looking at Large Inference Models (LIMs), which have become indispensable across various applications but whose scale and complexity often come at a significant computational cost. We introduce a novel approach that leverages one-bit algorithm unrolling, effectively integrating information from the physical world into the model architecture. Our method achieves a bit-per-link rate significantly lower than the 1.58 bits reported in prior work, thanks to the natural sparsity that emerges in our network architectures. We numerically demonstrate that the proposed one-bit algorithm unrolling scheme can improve both training and test outcomes by effortlessly increasing the number of layers while substantially compressing the network. Additionally, we provide theoretical results on the generalization gap, convergence rate, stability, and sensitivity of our proposed one-bit algorithm unrolling.
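
For context, the sketch below shows the absmean ternary quantizer popularized by BitNet b1.58, whose zero entries hint at why sparsity can push the effective bits per link below 1.58. This is background only; the unrolling-specific architecture of this paper is not reproduced here.

```python
import numpy as np

def absmean_ternary(W, eps=1e-8):
    """BitNet b1.58-style quantizer: scale by the mean magnitude, then
    round each weight to {-1, 0, +1}. Zero entries need no storage per
    link, which is one route to sub-1.58-bit rates on sparse weights."""
    gamma = np.mean(np.abs(W)) + eps      # per-matrix scale (absmean)
    Wq = np.clip(np.round(W / gamma), -1, 1)
    return gamma, Wq                      # reconstruct with gamma * Wq
```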


RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates

arXiv.org Artificial Intelligence

We propose RoCoFT, a parameter-efficient fine-tuning method for large-scale language models (LMs) based on updating only a few rows and columns of the weight matrices in transformers. Through extensive experiments with medium-size LMs like BERT and RoBERTa, and larger LMs like Bloom-7B, Llama2-7B, and Llama2-13B, we show that our method gives comparable or better accuracies than state-of-the-art PEFT methods while also being more memory- and computation-efficient. We also study the reason behind the effectiveness of our method with tools from neural tangent kernel theory. We empirically demonstrate that our kernel, constructed using a restricted set of row and column parameters, is numerically close to the full-parameter kernel and gives comparable classification performance. Ablation studies are conducted to investigate the impact of different algorithmic choices, including the selection strategy for rows and columns as well as the optimal rank for effective implementation of our method.
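
A minimal way to realize row-column updates in PyTorch is to mask gradients so that only the chosen rows and columns of each weight matrix are ever modified by the optimizer. The helper below, including its name and the hard-coded selection, is a hypothetical stand-in for the selection strategies the paper actually studies.

```python
import torch

def restrict_to_rows_cols(weight, rows, cols):
    """Zero out gradients everywhere except the selected rows/columns,
    so a standard optimizer only updates those entries. `weight` must be
    a leaf parameter with requires_grad=True."""
    mask = torch.zeros_like(weight)
    mask[rows, :] = 1.0
    mask[:, cols] = 1.0
    weight.register_hook(lambda grad: grad * mask)

# Example: for each transformer weight matrix W,
# restrict_to_rows_cols(W, rows=[0, 1, 2], cols=[0, 1, 2])
```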


Data-Aware Training Quality Monitoring and Certification for Reliable Deep Learning

arXiv.org Artificial Intelligence

Deep learning models have become crucial for tackling complex computational problems, owing to the rich representations they develop through their multi-layered structures and non-linear transformations [1, 2]. Despite their remarkable effectiveness, these models are often perceived as black boxes, raising concerns about their robustness, reliability, and safety. As neural networks become increasingly integral to critical applications, ensuring that they are properly trained and perform as intended is paramount. To evaluate training performance and a network's ability to memorize the training data (i.e., achieve zero loss), one approach is to statistically analyze neural networks under certain assumptions. This has been done for networks with thresholding activation functions like ReLU, where researchers have determined the number of parameters needed to achieve full memory capacity [3]. It is well known that once a ReLU-based neural network (NN) has a sufficient number of weights, it can achieve full memory capacity, or even zero loss in some cases. In [4], the authors theoretically demonstrate that in the over-parameterized regime, the stochastic gradient descent (SGD) algorithm can converge to the global minimum. However, these methods are statistical in nature and rely on specific assumptions about the input data and the model, which may limit their applicability.


Deep Learning Meets Adaptive Filtering: A Stein's Unbiased Risk Estimator Approach

arXiv.org Artificial Intelligence

This paper revisits two prominent adaptive filtering algorithms, namely recursive least squares (RLS) and equivariant adaptive source separation (EASI), through the lens of algorithm unrolling. Building upon the unrolling methodology, we introduce novel task-based deep learning frameworks, denoted Deep RLS and Deep EASI. These architectures transform the iterations of the original algorithms into the layers of a deep neural network, enabling efficient source-signal estimation through a training process. To further enhance performance, we propose training these deep unrolled networks with a surrogate loss function grounded in Stein's unbiased risk estimator (SURE). Our empirical evaluations demonstrate that the Deep RLS and Deep EASI networks outperform their underlying algorithms. Moreover, numerical experiments highlight the efficacy of SURE-based training in comparison to the conventional mean squared error loss. The potential of SURE-based training demonstrated here sets a benchmark for its future use, either for training or as an evaluation metric for the generalization performance of neural networks.
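
To make the SURE surrogate concrete: for an estimator f applied to y = x + N(0, sigma^2 I), Stein's identity yields an unbiased estimate of the risk that requires no ground truth, with the divergence term approximated by a Monte-Carlo probe (Ramani et al., 2008). The sketch below is the generic estimator under those assumptions; the exact surrogate used for Deep RLS/Deep EASI may differ.

```python
import numpy as np

def sure_loss(f, y, sigma, eps=1e-4, rng=np.random.default_rng(0)):
    """Monte-Carlo SURE for an estimator f, treated as a black box.
    Assumes y = x + N(0, sigma^2 I); returns an unbiased estimate of
    the mean squared error without access to the clean signal x."""
    n = y.size
    fy = f(y)
    b = rng.standard_normal(y.shape)          # random probe vector
    # divergence of f at y, estimated by a finite-difference projection
    div = b.ravel() @ (f(y + eps * b) - fy).ravel() / eps
    return np.sum((y - fy) ** 2) - n * sigma**2 + 2 * sigma**2 * div
```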


One-Bit Compressive Sensing: Can We Go Deep and Blind?

arXiv.org Artificial Intelligence

One-bit compressive sensing is concerned with the accurate recovery of an underlying sparse signal of interest from its one-bit noisy measurements. Conventional signal recovery approaches for this problem are developed mainly under the assumption that exact knowledge of the sensing matrix is available. In this work, however, we present a novel data-driven and model-based methodology that achieves blind recovery, i.e., signal recovery without requiring knowledge of the sensing matrix. To this end, we make use of the deep unfolding technique and develop a model-driven deep neural architecture designed for this specific task. The proposed deep architecture is able to learn an alternative sensing matrix by taking advantage of the underlying unfolded algorithm, such that the resulting learned recovery algorithm can accurately and quickly (in terms of the number of iterations) recover the underlying compressed signal of interest from its one-bit noisy measurements. In addition, owing to the incorporation of domain knowledge and the mathematical model of the system into the proposed deep architecture, the resulting network benefits from enhanced interpretability, has a very small number of trainable parameters, and requires very few training samples compared to the commonly used black-box deep neural network alternatives for the problem at hand.
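
A rough sketch of the kind of architecture this suggests: binary-iterative-hard-thresholding-style iterations unfolded into layers, with the sensing matrix itself a trainable parameter to enable blind recovery. The class name, the tanh surrogate for the non-differentiable sign, the layer count, and all hyperparameters are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class UnfoldedOneBitNet(nn.Module):
    """Hypothetical unfolded one-bit recovery network: each layer is one
    gradient-plus-projection step, with a learned sensing matrix A and
    learned per-layer step sizes."""
    def __init__(self, m, n, layers=10, sparsity=20):
        super().__init__()
        self.A = nn.Parameter(torch.randn(m, n) / m**0.5)   # learned sensing matrix
        self.step = nn.Parameter(0.1 * torch.ones(layers))  # per-layer step sizes
        self.k = sparsity
        self.layers = layers

    def hard_threshold(self, x):
        # keep the k largest-magnitude entries (projection onto sparse vectors)
        idx = x.abs().topk(self.k, dim=-1).indices
        out = torch.zeros_like(x)
        return out.scatter(-1, idx, x.gather(-1, idx))

    def forward(self, y):                       # y: (batch, m) one-bit data
        x = torch.zeros(y.shape[0], self.A.shape[1], device=y.device)
        for t in range(self.layers):
            residual = y - torch.tanh(x @ self.A.T)   # smooth surrogate for sign
            x = self.hard_threshold(x + self.step[t] * (residual @ self.A))
        return x
```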


Unfolded Algorithms for Deep Phase Retrieval

arXiv.org Machine Learning

The idea of phase retrieval has intrigued researchers for decades, owing to its appearance in a wide range of applications. The task of a phase retrieval algorithm is typically to recover a signal from linear phaseless measurements. In this paper, we approach the problem by proposing a hybrid model-based, data-driven deep architecture, referred to as Unfolded Phase Retrieval (UPR), that exhibits significant potential for improving the performance of state-of-the-art data-driven and model-based phase retrieval algorithms. The proposed method benefits from the versatility and interpretability of well-established model-based algorithms while simultaneously drawing on the expressive power of deep neural networks. In particular, our model-based deep architecture is applied to the conventional phase retrieval problem (via the incremental reshaped Wirtinger flow algorithm) and the sparse phase retrieval problem (via the sparse truncated amplitude flow algorithm), showing immense promise in both cases. Furthermore, we consider a joint design of the sensing matrix and the signal processing algorithm, utilizing the deep unfolding technique in the process. Our numerical results illustrate the effectiveness of such hybrid model-based and data-driven frameworks and showcase the untapped potential of data-aided methodologies for enhancing existing phase retrieval algorithms.
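
As an illustration of the unfolding step, the sketch below turns plain Wirtinger-flow iterations into layers with learnable per-layer step sizes. It is real-valued and omits UPR's incremental/reshaped variants and the jointly designed sensing matrix, so treat it as a schematic of the technique rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class UnfoldedWF(nn.Module):
    """Wirtinger-flow gradient steps unfolded into layers; only the step
    sizes mu[t] are trained in this minimal sketch."""
    def __init__(self, layers=15):
        super().__init__()
        self.mu = nn.Parameter(torch.full((layers,), 0.1))

    def forward(self, A, y, z0):
        # A: (m, n) sensing matrix; y: (m,) phaseless data y_i = (a_i^T z)^2
        z = z0
        m = A.shape[0]
        for t in range(len(self.mu)):
            Az = A @ z
            grad = A.T @ ((Az**2 - y) * Az) / m   # gradient of the amplitude loss
            z = z - self.mu[t] * grad
        return z
```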


Deep-RLS: A Model-Inspired Deep Learning Approach to Nonlinear PCA

arXiv.org Machine Learning

In this work, we consider the application of model-based deep learning to nonlinear principal component analysis (PCA). Inspired by the deep unfolding methodology, we propose a task-based deep learning approach, referred to as Deep-RLS, that unfolds the iterations of the well-known recursive least squares (RLS) algorithm into the layers of a deep neural network in order to perform nonlinear PCA. In particular, we formulate nonlinear PCA for the blind source separation (BSS) problem and show through numerical analysis that Deep-RLS yields a significant improvement in the accuracy of recovering the source signals in BSS compared to the traditional RLS algorithm.
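
A single unfolded RLS layer might look like the cell below, with the forgetting factor made a trainable parameter. The nonlinearity used for nonlinear PCA and the BSS-specific wiring are omitted, so this is a sketch of the unfolding idea rather than Deep-RLS itself.

```python
import torch
import torch.nn as nn

class DeepRLSCell(nn.Module):
    """One RLS iteration as a network layer. The forgetting factor is
    learned (sigmoid keeps it in (0, 1)); u is the input vector, d the
    desired response, P the inverse-correlation estimate, w the weights."""
    def __init__(self):
        super().__init__()
        self.raw_lam = nn.Parameter(torch.tensor(2.0))

    def forward(self, P, w, u, d):
        lam = torch.sigmoid(self.raw_lam)
        Pu = P @ u
        k = Pu / (lam + u @ Pu)              # RLS gain vector
        w = w + k * (d - w @ u)              # weight update toward target d
        P = (P - torch.outer(k, Pu)) / lam   # inverse-correlation update
        return P, w
```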


Deep-URL: A Model-Aware Approach To Blind Deconvolution Based On Deep Unfolded Richardson-Lucy Network

arXiv.org Artificial Intelligence

The lack of interpretability in current deep learning models raises serious concerns, as these models are extensively used in life-critical applications. Hence, developing interpretable deep learning models is of paramount importance. In this paper, we consider the problem of blind deconvolution and propose a novel model-aware deep architecture that allows for the recovery of both the blur kernel and the sharp image from the blurred image. In particular, we propose the Deep Unfolded Richardson-Lucy (Deep-URL) framework, an interpretable deep learning architecture that can be seen as an amalgamation of a classical estimation technique and a deep neural network, and consequently leads to improved performance. Our numerical investigations demonstrate significant improvement over state-of-the-art algorithms.
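
For reference, the classical (non-blind) Richardson-Lucy multiplicative update that Deep-URL unfolds is sketched below. Deep-URL additionally estimates the blur kernel (the blind part) and inserts trainable components into each unfolded layer; neither appears in this plain sketch.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(y, k, iters=30, eps=1e-12):
    """Classical Richardson-Lucy deconvolution for a 2-D blurred image y
    and a known 2-D blur kernel k; each pass is one multiplicative update."""
    x = np.full_like(y, y.mean())        # flat-image initialization
    k_flip = k[::-1, ::-1]               # adjoint of convolution with k
    for _ in range(iters):
        conv = fftconvolve(x, k, mode="same") + eps
        x = x * fftconvolve(y / conv, k_flip, mode="same")
    return x
```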


Deep Signal Recovery with One-Bit Quantization

arXiv.org Machine Learning

Machine learning, and more specifically deep learning, has shown remarkable performance in sensing, communications, and inference. In this paper, we consider the application of the deep unfolding technique to the problem of signal reconstruction from one-bit noisy measurements. Namely, we propose a model-based machine learning method that unfolds the iterations of an inference optimization algorithm into the layers of a deep neural network for one-bit signal recovery. The resulting network, which we refer to as DeepRec, can efficiently handle the recovery of high-dimensional signals from acquired one-bit noisy measurements. As shown through numerical analysis, the proposed method improves both accuracy and computational efficiency relative to the original framework.
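
One concrete instance of such an inference optimization algorithm is gradient ascent on the one-bit Gaussian log-likelihood, sum_i log Phi(r_i a_i^T x / sigma), whose iterations can be unfolded into layers. The sketch below is the plain iterative algorithm under an assumed noise model, step size, and naming; DeepRec's learned, layered counterpart is not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def one_bit_ml_recovery(A, r, sigma, steps=200, lr=0.5):
    """Maximum-likelihood recovery from one-bit data r = sign(Ax + noise),
    noise ~ N(0, sigma^2), by plain gradient ascent on the log-likelihood."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(steps):
        z = r * (A @ x) / sigma
        # d/dz log Phi(z) = phi(z) / Phi(z)  (inverse Mills ratio)
        w = norm.pdf(z) / np.clip(norm.cdf(z), 1e-12, None)
        x = x + lr * (A.T @ (r * w)) / (sigma * m)
    return x
```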