Samragh, Mohammad
SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models
Kim, Han-Byul, Hoang, Duc, Kundu, Arnav, Samragh, Mohammad, Cho, Minsik
With the rapid expansion in the scale of large language models (LLMs), enabling efficient distributed inference across multiple computing units has become increasingly critical. However, communication overheads from popular distributed inference techniques such as Tensor Parallelism pose a significant challenge to achieving scalability and low latency. We therefore introduce a novel optimization technique, Sync-Point Drop (SPD), which reduces communication overhead in tensor parallelism by selectively dropping synchronization on attention outputs. Specifically, we first propose a block design that allows execution to proceed through SPD without communication. Second, we apply different SPD strategies to attention blocks based on their sensitivity to model accuracy. The proposed methods effectively alleviate communication bottlenecks while minimizing accuracy degradation during LLM inference, offering a scalable solution for diverse distributed environments: SPD reduced overall inference latency by about 20% with < 1% accuracy regression for LLaMA2-70B inference over 8 GPUs.
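A minimal single-process sketch of the idea, with tensor parallelism simulated by explicit weight shards; the shapes, rank count, and use of plain NumPy are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ranks = 8, 2

# Megatron-style sharding: a column-parallel attention projection followed by
# a row-parallel output projection, split across two simulated TP ranks.
w_in  = [rng.standard_normal((d, d // n_ranks)) for _ in range(n_ranks)]
w_out = [rng.standard_normal((d // n_ranks, d)) for _ in range(n_ranks)]

x = rng.standard_normal((1, d))
partials = [(x @ w_in[r]) @ w_out[r] for r in range(n_ranks)]

# Baseline tensor parallelism: an all-reduce (sum) is the sync point that
# combines the ranks' partial attention outputs before the next block.
attn_out_synced = sum(partials)

# Sync-Point Drop (sketch): for blocks that tolerate it, skip the all-reduce;
# each rank continues with only its local partial output, so no communication
# happens at this point.
attn_out_local = partials  # one rank-local tensor per rank, no sync
```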
Weight subcloning: direct initialization of transformers using larger pretrained ones
Samragh, Mohammad, Farajtabar, Mehrdad, Mehta, Sachin, Vemulapalli, Raviteja, Faghri, Fartash, Naik, Devang, Tuzel, Oncel, Rastegari, Mohammad
Training large transformer models from scratch for a target task requires large amounts of data and is computationally demanding. The usual practice of transfer learning overcomes this challenge by initializing the model with the weights of a pretrained model of the same size and specification to speed up convergence and training. But what if no pretrained model of the required size is available? In this paper, we introduce a simple yet effective technique for transferring the knowledge of a pretrained model to smaller variants. Our approach, called weight subcloning, expedites the training of scaled-down transformers by initializing their weights from larger pretrained models. Weight subcloning operates directly on the pretrained model to obtain an equivalently initialized scaled-down model. It consists of two key steps: first, we introduce neuron importance ranking to decrease the embedding dimension per layer in the pretrained model; then, we remove blocks from the transformer to match the number of layers in the scaled-down network. The result is a network ready to undergo training, which trains significantly faster than with random initialization. For instance, we achieve 4x faster training for vision transformers on image classification and for language models designed for next-token prediction.
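A minimal PyTorch sketch of the two subcloning steps; the L1-norm importance score, the uniform layer-selection rule, and all dimensions are illustrative assumptions:

```python
import torch

def neuron_importance(weight: torch.Tensor) -> torch.Tensor:
    # Assumed importance score: L1 norm of each output neuron's weights.
    return weight.abs().sum(dim=1)

def subclone_linear(weight: torch.Tensor, keep_out, keep_in) -> torch.Tensor:
    # Step 1: shrink the embedding dimension by keeping only the rows and
    # columns that correspond to the most important neurons.
    return weight[keep_out][:, keep_in].clone()

pretrained = torch.randn(768, 768)  # one layer of a larger pretrained model
keep = neuron_importance(pretrained).topk(512).indices.sort().values
small = subclone_linear(pretrained, keep, keep)  # 512x512 initialization

# Step 2: reduce depth by keeping an evenly spaced subset of the pretrained
# blocks (here, 24 source layers -> 12 target layers).
n_src, n_dst = 24, 12
kept_layers = [round(i * (n_src - 1) / (n_dst - 1)) for i in range(n_dst)]
```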
CLEANN: Accelerated Trojan Shield for Embedded Neural Networks
Javaheripi, Mojan, Samragh, Mohammad, Fields, Gregory, Javidi, Tara, Koushanfar, Farinaz
We propose CLEANN, the first end-to-end framework that enables online mitigation of Trojans for embedded Deep Neural Network (DNN) applications. A Trojan attack works by injecting a backdoor into the DNN during training; at inference time, the Trojan can be activated by the specific backdoor trigger. What differentiates CLEANN from prior work is its lightweight methodology, which recovers the ground-truth class of Trojan samples without the need for labeled data, model retraining, or prior assumptions about the trigger or the attack. We leverage dictionary learning and sparse approximation to characterize the statistical behavior of benign data and identify Trojan triggers. CLEANN is devised based on algorithm/hardware co-design and is equipped with specialized hardware to enable efficient real-time execution on resource-constrained embedded platforms. Proof-of-concept evaluations of CLEANN against state-of-the-art neural Trojan attacks on visual benchmarks demonstrate its competitive advantage in terms of attack resiliency and execution overhead.
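A minimal sketch of the dictionary-learning/sparse-approximation idea using scikit-learn; the toy data, the reconstruction-error score, and the 99th-percentile threshold are assumptions standing in for the paper's patch- and feature-level analysis:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
benign = rng.standard_normal((500, 64))  # stand-in for flattened benign patches

# Learn a sparse dictionary that characterizes the statistics of benign data.
dl = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
codes = dl.fit_transform(benign)
errors = np.linalg.norm(benign - codes @ dl.components_, axis=1)
threshold = np.quantile(errors, 0.99)

def contains_trigger(x: np.ndarray) -> bool:
    # A sample whose sparse approximation error exceeds the benign threshold
    # deviates from benign statistics and is flagged as a potential trigger.
    code = dl.transform(x[None])
    return float(np.linalg.norm(x - code @ dl.components_)) > threshold
```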
RAPIDNN: In-Memory Deep Neural Network Acceleration Framework
Imani, Mohsen, Samragh, Mohammad, Kim, Yeseong, Gupta, Saransh, Koushanfar, Farinaz, Rosing, Tajana
Deep neural networks (DNNs) have demonstrated effectiveness in applications such as image processing, video segmentation, and speech recognition. Running state-of-the-art DNNs on current systems mostly relies on general-purpose processors, ASIC designs, or FPGA accelerators, all of which suffer from costly data movement due to limited on-chip memory and data-transfer bandwidth. In this work, we propose a novel framework, called RAPIDNN, which processes all DNN operations within memory to minimize the cost of data movement. To enable in-memory processing, RAPIDNN reinterprets a DNN model and maps it onto a specialized accelerator built from non-volatile memory blocks that implement the four fundamental DNN operations: multiplication, addition, activation functions, and pooling. The framework extracts representative operands of a DNN model, e.g., weights and input values, using clustering methods to optimize the model for in-memory processing. It then maps the extracted operands and their precomputed results into the accelerator's memory blocks. At runtime, the accelerator retrieves computation results via an efficient in-memory search capability, which also provides tunable approximation to further improve computation efficiency. Our evaluation shows that RAPIDNN achieves 68.4x and 49.5x energy-efficiency improvements and 48.1x and 10.9x speedups compared to ISAAC and PipeLayer, respectively, two state-of-the-art DNN accelerators, while ensuring less than 0.3% quality loss.
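A minimal software sketch of the cluster-and-lookup idea; the cluster counts and the scalar-multiplication example are assumptions, and the real framework builds such tables for all four operations inside non-volatile memory:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
weights = rng.standard_normal((1000, 1))
inputs  = rng.standard_normal((1000, 1))

# Cluster each operand stream down to a small set of representative values.
kw = KMeans(n_clusters=16, n_init=10, random_state=0).fit(weights)
ki = KMeans(n_clusters=16, n_init=10, random_state=0).fit(inputs)

# Precompute all centroid-pair products; this table is what the accelerator
# stores in its memory blocks instead of using a multiplier circuit.
table = kw.cluster_centers_ @ ki.cluster_centers_.T  # 16 x 16

def approx_multiply(w: float, x: float) -> float:
    # At runtime, a multiplication becomes two nearest-centroid searches plus
    # one table lookup, which map to in-memory search on the hardware.
    wi = kw.predict([[w]])[0]
    xi = ki.predict([[x]])[0]
    return table[wi, xi]
```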
CodeX: Bit-Flexible Encoding for Streaming-based FPGA Acceleration of DNNs
Samragh, Mohammad, Javaheripi, Mojan, Koushanfar, Farinaz
This paper proposes CodeX, an end-to-end framework that facilitates encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms. CodeX incorporates nonlinear encoding into the computation flow of neural networks to save memory. The encoded features demand significantly less storage than the raw full-precision activation values; therefore, the execution flow of the CodeX hardware engine is performed entirely within the FPGA using on-chip streaming buffers, with no access to off-chip DRAM. We further propose a fully automated algorithm, inspired by reinforcement learning, that determines the customized encoding bitwidth for each network layer. The CodeX full-stack framework comprises a compiler that takes a high-level Python description of an arbitrary neural network architecture and instantiates the corresponding elements from the CodeX hardware library for FPGA implementation. Proof-of-concept evaluations on the MNIST, SVHN, and CIFAR-10 datasets demonstrate an average 4.65x throughput improvement over stand-alone weight encoding. We further compare CodeX with six existing full-precision DNN accelerators on ImageNet, showing average improvements of 3.6x in throughput and 2.54x in performance-per-watt.
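A minimal sketch of nonlinear activation encoding; placing the codebook levels at activation quantiles and the 3-bit example are assumptions (the paper learns the encoding and picks per-layer bitwidths with an RL-inspired search):

```python
import numpy as np

def fit_codebook(activations: np.ndarray, bits: int) -> np.ndarray:
    # Nonlinear codebook: put the 2^bits levels at activation quantiles so
    # dense regions of the distribution get finer resolution.
    return np.quantile(activations, np.linspace(0.0, 1.0, 2 ** bits))

def encode(x: np.ndarray, levels: np.ndarray) -> np.ndarray:
    # Store only a (bits)-wide index per activation instead of a 32-bit float.
    return np.abs(x[:, None] - levels[None, :]).argmin(axis=1).astype(np.uint8)

def decode(idx: np.ndarray, levels: np.ndarray) -> np.ndarray:
    return levels[idx]

acts = np.random.default_rng(0).standard_normal(10_000).astype(np.float32)
levels = fit_codebook(acts, bits=3)  # per-layer bitwidth, here 3 bits
idx = encode(acts, levels)           # 3-bit codes in place of 32-bit floats
print("reconstruction mse:", float(np.mean((acts - decode(idx, levels)) ** 2)))
```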
CuRTAIL: ChaRacterizing and Thwarting AdversarIal deep Learning
Rouhani, Bita Darvish, Samragh, Mohammad, Javidi, Tara, Koushanfar, Farinaz
This paper proposes CuRTAIL, an end-to-end computing framework for characterizing and thwarting the adversarial space in the context of Deep Learning (DL). The framework protects deep neural networks against adversarial samples: perturbed inputs carefully crafted by malicious entities to mislead the underlying DL model. The precursor to the proposed methodology is a set of new quantitative metrics for assessing the vulnerability of various deep learning architectures to adversarial samples. CuRTAIL formalizes the goal of preventing adversarial samples as minimizing the space unexplored by the pertinent DL model, which is characterized in CuRTAIL's vulnerability-analysis step. To thwart adversarial machine learning attacks, CuRTAIL introduces Modular Robust Redundancy (MRR) as a viable solution to achieve the formalized minimization objective. The MRR methodology explicitly characterizes the geometry of the input data and the DL model parameters. It then learns a set of complementary but disjoint models that maximally cover the unexplored subspaces of the target DL model, thus reducing the risk of integrity attacks. We extensively evaluate CuRTAIL against state-of-the-art attacks including Fast Gradient Sign, Jacobian Saliency Map Attack, DeepFool, and Carlini & Wagner (L2). Proof-of-concept implementations on data collections including MNIST, CIFAR-10, and ImageNet corroborate CuRTAIL's effectiveness at detecting adversarial samples in different settings. Because the computations in each MRR module can be performed independently, the CuRTAIL detection algorithm can be fully parallelized across multiple hardware units to achieve maximum throughput. We further provide an accompanying API to facilitate adoption of the proposed framework in various applications.
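A minimal sketch of the MRR detection idea; the random projections and the per-module z-score threshold are stand-ins (assumptions) for the paper's learned complementary modules:

```python
import numpy as np

class MRRModule:
    # One redundancy module: models benign data in its own projection and
    # flags samples that land far outside the learned benign region.
    def __init__(self, dim: int, seed: int):
        self.proj = np.random.default_rng(seed).standard_normal((dim, dim))

    def fit(self, benign: np.ndarray) -> "MRRModule":
        z = benign @ self.proj
        self.mu, self.sigma = z.mean(axis=0), z.std(axis=0) + 1e-8
        return self

    def score(self, x: np.ndarray) -> float:
        return float(np.abs((x @ self.proj - self.mu) / self.sigma).max())

benign = np.random.default_rng(0).standard_normal((1000, 16))
modules = [MRRModule(16, seed).fit(benign) for seed in range(3)]

def is_adversarial(x: np.ndarray, thresh: float = 4.0) -> bool:
    # Modules are independent, so their checks can run fully in parallel in
    # deployment; any module rejecting the sample flags it as adversarial.
    return any(m.score(x) > thresh for m in modules)
```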