
 test vector


LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models

Tahmasivand, Ahmad, Zahran, Noureldin, Al-Sayouri, Saba, Fouda, Mohammed, Khasawneh, Khaled N.

arXiv.org Artificial Intelligence

Bit-flip attacks threaten the reliability and security of Language Models (LMs) by altering internal parameters and compromising output integrity. Recent studies show that flipping only a few bits in model parameters can bypass safety mechanisms and jailbreak the model. Existing detection approaches for DNNs and CNNs are not suitable for LMs: the massive number of parameters significantly increases timing and memory overhead for software-based methods and chip-area overhead for hardware-based methods. In this work, we present LM-Fix, a lightweight LM-driven detection and recovery framework that leverages the model's own capabilities to identify and recover from faults. Our method detects bit-flips by generating a single output token from a predefined test vector and auditing the output tensor of a target layer against stored reference data. The same mechanism enables rapid recovery without reloading the entire model. Experiments across various models show that LM-Fix detects more than 94% of single-bit flips and nearly 100% of multi-bit flips, with very low computational overhead (about 1% to 7.7% at TVL = 200 across models). Recovery achieves a more than 100× speedup compared to a full-model reload, which is critical on edge devices. LM-Fix can handle bit-flips affecting any part of the model's computation, including memory, cache, and arithmetic operations. Evaluation against recent LM-specific bit-flip attacks confirms its robustness and practical value for real-world deployment.
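The reference-audit idea can be illustrated with a minimal sketch. It assumes a toy dense layer in place of a real LM layer, a hypothetical fixed test vector, and comparison against a stored reference output; `flip_bit`, `layer_output`, and `audit` are illustrative names, not LM-Fix's actual interface.

```python
import struct
from typing import List, Sequence


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 31 = sign) of the float32 encoding of `value`."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def layer_output(weights: Sequence[Sequence[float]], test_vector: Sequence[float]) -> List[float]:
    """Toy stand-in for the audited layer: a dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, test_vector)) for row in weights]


def audit(weights, test_vector, reference, tol: float = 0.0) -> bool:
    """Return True if the layer output on the test vector matches the stored reference.

    NaN or inf outputs caused by exponent-bit flips also fail, because any
    comparison involving NaN (or an infinite difference) evaluates to False.
    """
    out = layer_output(weights, test_vector)
    return all(abs(o - r) <= tol for o, r in zip(out, reference))


if __name__ == "__main__":
    W = [[0.5, -0.25], [1.0, 0.75]]
    tv = [1.0, 2.0]                  # hypothetical fixed test vector
    ref = layer_output(W, tv)        # recorded once on known-good weights
    print(audit(W, tv, ref))         # True
    W[0][1] = flip_bit(W[0][1], 30)  # inject a high exponent-bit flip
    print(audit(W, tv, ref))         # False
```

With a non-zero `tol`, a flip in a low-order mantissa bit can change the output by less than the tolerance and pass the audit, so the tolerance trades robustness to benign numerical noise against sensitivity.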


Periodic Online Testing for Sparse Systolic Tensor Arrays

Peltekis, Christodoulos, Nicopoulos, Chrysostomos, Dimitrakopoulos, Giorgos

arXiv.org Artificial Intelligence

Modern Machine Learning (ML) applications often benefit from structured sparsity, a technique that efficiently reduces model complexity and simplifies the handling of sparse data in hardware. Sparse systolic tensor arrays, specifically designed to accelerate these structured-sparse ML models, play a pivotal role in enabling efficient computations. As ML is increasingly integrated into safety-critical systems, it is of paramount importance to ensure the reliability of these systems. This paper introduces an online error-checking technique capable of detecting and locating permanent faults within sparse systolic tensor arrays before computation begins. The new technique relies on merely four test vectors and exploits the weight values already loaded within the systolic array to comprehensively test the system. Fault-injection campaigns within the gate-level netlist, while executing three well-established Convolutional Neural Networks (CNNs), validate the efficiency of the proposed approach, which is shown to achieve very high fault coverage while incurring minimal performance and area overheads.


Hiding in Plain Sight: Reframing Hardware Trojan Benchmarking as a Hide&Seek Modification

Sarihi, Amin, Patooghy, Ahmad, Jamieson, Peter, Badawy, Abdel-Hameed A.

arXiv.org Artificial Intelligence

This work focuses on advancing security research in the hardware design space by formally defining the realistic problem of Hardware Trojan (HT) detection. The goal is to model HT detection more closely on real-world conditions, i.e., to describe the problem as the Seeker's Dilemma, in which a detecting agent does not know whether a circuit is infected by an HT. Using this theoretical problem formulation, we create a benchmark that consists of a mixture of HT-free and HT-infected restructured circuits that preserve their original functionalities. The restructured circuits are randomly infected by HTs, so the defender is uncertain whether a given circuit is infected. We believe that our innovative benchmark and our methodology for creating benchmarks will help the community judge the detection quality of different methods by comparing their success rates in circuit classification. We use the benchmark to evaluate three state-of-the-art HT detection tools and establish baseline results for this approach. We also use Principal Component Analysis to assess the strength of the benchmark and observe that some restructured HT-infected circuits are mapped close to HT-free circuits, leading to significant label misclassification by detectors.


Few-Shot Testing: Estimating Uncertainty of Memristive Deep Neural Networks Using One Bayesian Test Vector

Ahmed, Soyed Tuhin, Tahoori, Mehdi

arXiv.org Artificial Intelligence

The performance of deep learning algorithms such as neural networks (NNs) has increased tremendously in recent years, and they can achieve state-of-the-art performance in many domains. However, due to memory and computation resource constraints, implementing NNs on edge devices is challenging. Therefore, hardware accelerators such as computation-in-memory (CIM) with memristive devices have been developed to accelerate the most common operation, i.e., matrix-vector multiplication. However, due to inherent device properties, external environmental factors such as temperature, and an immature fabrication process, memristors suffer from various non-idealities, including defects and variations occurring during manufacturing and runtime. Consequently, the predictions made by the model cannot be fully trusted. To improve confidence in NN predictions made by hardware accelerators in the presence of device non-idealities, in this paper, we propose a Bayesian test vector generation framework that can estimate the model uncertainty of NNs implemented on memristor-based CIM hardware. Compared to the conventional point-estimate test vector generation method, our method generalizes better across different model dimensions and requires storing only one Bayesian test vector in the hardware. Our method is evaluated on different model dimensions, tasks, fault rates, and variation noise levels, and it consistently achieves $100\%$ coverage with only $0.024$ MB of memory overhead.


TrojanForge: Adversarial Hardware Trojan Examples with Reinforcement Learning

Sarihi, Amin, Jamieson, Peter, Patooghy, Ahmad, Badawy, Abdel-Hameed A.

arXiv.org Artificial Intelligence

The Hardware Trojan (HT) problem can be thought of as a continuous game between attackers and defenders, each striving to outsmart the other by leveraging any available means for an advantage. Machine Learning (ML) has recently been key to advancing HT research. Various novel techniques, such as Reinforcement Learning (RL) and Graph Neural Networks (GNNs), have demonstrated HT insertion and detection capabilities. HT insertion with ML techniques, specifically, has seen a spike in research activity due to the shortcomings of conventional HT benchmarks and the inherent human design bias introduced when creating them. This work continues this line of innovation by presenting a tool called "TrojanForge", capable of generating HT adversarial examples that defeat HT detectors, demonstrating the capabilities of GAN-like adversarial tools for automatic HT insertion. We introduce an RL environment in which the RL insertion agent interacts with HT detectors in an insertion-detection loop and collects rewards based on its success in bypassing them. Our results show that this process yields inserted HTs that evade various HT detectors, achieving high attack success rates. This tool provides insight into why HT insertion fails in some instances and how this knowledge can be leveraged in defense.


Scalable and Efficient Methods for Uncertainty Estimation and Reduction in Deep Learning

Ahmed, Soyed Tuhin

arXiv.org Artificial Intelligence

Neural networks (NNs) have achieved high performance in various fields such as computer vision and natural language processing. However, deploying NNs in resource-constrained safety-critical systems poses challenges due to prediction uncertainty caused by out-of-distribution (OOD) data and hardware non-idealities. To address these challenges, this paper summarizes fourth-year PhD thesis work that explores scalable and efficient methods for uncertainty estimation and reduction in deep learning, with a focus on Computation-in-Memory (CIM) using emerging resistive non-volatile memories. We tackle the inherent uncertainties arising from OOD inputs and hardware non-idealities, which is crucial for maintaining functional safety in automated decision-making systems. Our approach encompasses problem-aware training algorithms, novel NN topologies, and hardware co-design solutions, including dropout-based \emph{binary} Bayesian Neural Networks leveraging spintronic devices and variational inference techniques. These innovations significantly enhance OOD data detection, inference accuracy, and energy efficiency, thereby contributing to the reliability and robustness of NN implementations.


Testing Spintronics Implemented Monte Carlo Dropout-Based Bayesian Neural Networks

Ahmed, Soyed Tuhin, Hefenbrock, Michael, Prenat, Guillaume, Anghel, Lorena, Tahoori, Mehdi B.

arXiv.org Artificial Intelligence

Bayesian Neural Networks (BayNNs) can inherently estimate predictive uncertainty, facilitating informed decision-making. Dropout-based BayNNs are increasingly implemented in spintronics-based computation-in-memory architectures for resource-constrained yet high-performance safety-critical applications. Although uncertainty estimation is important, the reliability of Dropout generation and BayNN computation is equally important for target applications, yet it is overlooked in existing works. Moreover, testing BayNNs is significantly more challenging than testing conventional NNs due to their stochastic nature. In this paper, we present, for the first time, a model of the non-idealities of the spintronics-based Dropout module and analyze their impact on uncertainty estimates and accuracy. Furthermore, we propose a testing framework based on repeatability ranking for Dropout-based BayNNs that achieves up to 100% fault coverage while using only 0.2% of the training data as test vectors.

Bayesian Neural Networks (BayNNs) offer substantial benefits over conventional neural networks (NNs), particularly in safety-critical applications where reliability and confidence in predictions are paramount [1]. Unlike traditional NNs, BayNNs can inherently capture and estimate the uncertainty of their predictions, enhancing decision-making under uncertain conditions. However, their implementation faces significant computational bottlenecks, especially on edge devices. Spintronics-based computation-in-memory (Spintronics-CIM) architectures are a promising solution for the hardware realization of BayNNs, as they mitigate some of the inherent computational costs, balancing high-performance demands with the constraints of resource-limited devices.


Using Neural Networks for Novelty-based Test Selection to Accelerate Functional Coverage Closure

Zheng, Xuan, Eder, Kerstin, Blackmore, Tim

arXiv.org Artificial Intelligence

Novelty-based test selectors used in simulation-based verification have been shown to significantly accelerate coverage closure regardless of the number of coverage holes. This paper presents a configurable and highly automated framework for novelty-based test selection using neural networks. Three configurations of this framework are evaluated on a commercial signal processing unit. All three convincingly outperform random test selection, with the largest simulation saving being 49.37% to reach 99.5% coverage. The computational expense of the configurations is negligible compared to the reduction in simulation. We compare the experimental results and discuss important characteristics related to the performance of the configurations.
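The filtering idea behind novelty-based selection can be sketched as follows. The paper judges novelty with neural networks; this sketch swaps in a simpler nearest-neighbour distance score so that it stays self-contained. It assumes each generated test can be summarized as a numeric feature vector, and `novelty`, `select_novel_tests`, `budget`, and `threshold` are hypothetical names.

```python
import math
from typing import List, Sequence


def novelty(candidate: Sequence[float], simulated: List[Sequence[float]]) -> float:
    """Euclidean distance from `candidate` to its nearest already-simulated test (inf if none)."""
    if not simulated:
        return math.inf
    return min(math.dist(candidate, s) for s in simulated)


def select_novel_tests(candidates: Sequence[Sequence[float]], budget: int,
                       threshold: float) -> List[int]:
    """Scan generated tests in order and keep (i.e., simulate) only sufficiently novel ones.

    Returns the indices of the selected candidates, at most `budget` of them.
    """
    simulated: List[Sequence[float]] = []
    chosen: List[int] = []
    for idx, cand in enumerate(candidates):
        if len(chosen) >= budget:
            break
        if novelty(cand, simulated) > threshold:
            chosen.append(idx)
            simulated.append(cand)
    return chosen


if __name__ == "__main__":
    tests = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.05, 5.0), (10.0, 0.0)]
    print(select_novel_tests(tests, budget=10, threshold=1.0))  # [0, 2, 4]
```

Tests rejected by the filter are never simulated, which is where the saving over random selection comes from in this sketch; near-duplicates such as `(0.1, 0.0)` are skipped.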


One-Shot Online Testing of Deep Neural Networks Based on Distribution Shift Detection

Ahmed, Soyed Tuhin, Tahoori, Mehdi B.

arXiv.org Artificial Intelligence

Neural networks (NNs) are capable of learning complex patterns and relationships in data to make predictions with high accuracy, making them useful for various tasks. However, NNs are both computation-intensive and memory-intensive, making them challenging to deploy in edge applications. To accelerate the most common operation in NNs (matrix-vector multiplication), hardware accelerator architectures such as computation-in-memory (CiM) with non-volatile memristive crossbars are utilized. Although they offer benefits such as power efficiency, parallelism, and non-volatility, they suffer from various faults and variations, both during manufacturing and over their operational lifetime. This can lead to faulty computations and, in turn, degradation of post-mapping inference accuracy, which is unacceptable for many applications, including safety-critical ones. Therefore, proper testing of NN hardware accelerators is required. In this paper, we propose a \emph{one-shot} testing approach that can test NNs accelerated on memristive crossbars with only one test vector, making it very suitable for online testing applications. Our approach consistently achieves $100\%$ fault coverage across several large topologies with up to $201$ layers and challenging tasks like semantic segmentation. Moreover, compared to existing methods, the fault coverage is improved by up to $24\%$, the memory overhead is only $0.0123$ MB (a reduction of up to $19980\times$), and the number of test vectors is reduced by $10000\times$.
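One way to picture a single-test-vector check based on distribution shift: record the fault-free model's output for the one stored test vector, and at test time flag the accelerator as faulty when its output distribution drifts too far from that reference. This sketch assumes a softmax output compared with KL divergence and a hypothetical `threshold`; the shift statistic used in the paper may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; `eps` guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def one_shot_test(observed_logits: Sequence[float], golden_logits: Sequence[float],
                  threshold: float = 1e-3) -> bool:
    """Pass (True) if the output distribution on the single test vector has not shifted."""
    shift = kl_divergence(softmax(golden_logits), softmax(observed_logits))
    return shift <= threshold


if __name__ == "__main__":
    golden = [2.0, 1.0, 0.1]  # hypothetical fault-free logits, stored once
    print(one_shot_test(golden, golden))           # True
    print(one_shot_test([0.1, 1.0, 2.0], golden))  # False
```

In this scheme only the golden logits for one test vector need to be stored alongside the test vector itself.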


Multi-criteria Hardware Trojan Detection: A Reinforcement Learning Approach

Sarihi, Amin, Jamieson, Peter, Patooghy, Ahmad, Badawy, Abdel-Hameed A.

arXiv.org Artificial Intelligence

Hardware Trojans (HTs) are undesired design or manufacturing modifications that can severely alter the security and functionality of digital integrated circuits. HTs can be inserted according to various design criteria, e.g., net switching activity, observability, and controllability. However, to our knowledge, most HT detection methods are based on a single criterion, i.e., net switching activity. This paper proposes a multi-criteria reinforcement learning (RL) HT detection tool that features a tunable reward function for different HT detection scenarios. The tool allows for exploring existing detection strategies and can adapt to new detection scenarios with minimal effort. We also propose a generic methodology for comparing HT detection methods fairly. Our preliminary results show an average HT detection success rate of 84.2% on the ISCAS-85 benchmarks.
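A tunable multi-criteria reward can be pictured as a weighted combination of per-net criteria. In the sketch below, the normalized metrics, the `(1 - metric)` scoring (rarely switching, hard-to-observe, hard-to-control nets score higher), and the weight values are all assumptions for illustration, not the tool's actual reward; only the reward shaping is shown, not the RL agent. Setting only the switching-activity weight reduces it to a single-criterion score.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NetMetrics:
    """Per-net criteria, each assumed (hypothetically) to be normalized to [0, 1]."""
    switching_activity: float
    observability: float
    controllability: float


def reward(net: NetMetrics, weights: Dict[str, float]) -> float:
    """Hypothetical tunable reward: weighted average of (1 - metric) over the chosen criteria."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one criterion weight must be positive")
    score = sum(w * (1.0 - getattr(net, name)) for name, w in weights.items())
    return score / total_weight


def rank_nets(nets: Dict[str, NetMetrics], weights: Dict[str, float]) -> List[str]:
    """Return net names ordered from most to least suspicious under `weights`."""
    return sorted(nets, key=lambda n: reward(nets[n], weights), reverse=True)


if __name__ == "__main__":
    nets = {
        "n1": NetMetrics(switching_activity=0.1, observability=0.5, controllability=0.2),
        "n2": NetMetrics(switching_activity=0.4, observability=0.05, controllability=0.1),
    }
    print(rank_nets(nets, {"switching_activity": 1.0}))                    # ['n1', 'n2']
    print(rank_nets(nets, {"observability": 1.0, "controllability": 1.0}))  # ['n2', 'n1']
```

Changing only the weight dictionary changes which nets look most suspicious, which is the sense in which such a reward is "tunable" across detection scenarios.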