Goto

Collaborating Authors

 Tan, Benjamin


ALMOST: Adversarial Learning to Mitigate Oracle-less ML Attacks via Synthesis Tuning

arXiv.org Artificial Intelligence

Oracle-less machine learning (ML) attacks have broken various logic locking schemes. Regular synthesis, which is tailored for area-power-delay optimization, yields netlists where key-gate localities are vulnerable to learning. Thus, we call for security-aware logic synthesis. We propose ALMOST, a framework for adversarial learning to mitigate oracle-less ML attacks via synthesis tuning. ALMOST uses a simulated-annealing-based synthesis recipe generator, employing adversarially trained models that can predict state-of-the-art attacks' accuracies over wide ranges of recipes and key-gate localities. Experiments on ISCAS benchmarks confirm the attacks' accuracies drops to around 50\% for ALMOST-synthesized circuits, all while not undermining design optimization.


Benchmarking Large Language Models for Automated Verilog RTL Code Generation

arXiv.org Artificial Intelligence

Automating hardware design could obviate a significant amount of human error from the engineering process and lead to fewer errors. Verilog is a popular hardware description language to model and design digital systems, thus generating Verilog code is a critical first step. Emerging large language models (LLMs) are able to write high-quality code in other programming languages. In this paper, we characterize the ability of LLMs to generate useful Verilog. For this, we fine-tune pre-trained LLMs on Verilog datasets collected from GitHub and Verilog textbooks. We construct an evaluation framework comprising test-benches for functional analysis and a flow to test the syntax of Verilog code generated in response to problems of varying difficulty. Our findings show that across our problem scenarios, the fine-tuning results in LLMs more capable of producing syntactically correct code (25.9% overall). Further, when analyzing functional correctness, a fine-tuned open-source CodeGen LLM can outperform the state-of-the-art commercial Codex LLM (6.5% overall). Training/evaluation scripts and LLM checkpoints are available: https://github.com/shailja-thakur/VGen.


Can OpenAI Codex and Other Large Language Models Help Us Fix Security Bugs?

arXiv.org Artificial Intelligence

Human developers can produce code with cybersecurity weaknesses. Can emerging 'smart' code completion tools help repair those weaknesses? In this work, we examine the use of large language models (LLMs) for code (such as OpenAI's Codex and AI21's Jurassic J-1) for zero-shot vulnerability repair. We investigate challenges in the design of prompts that coax LLMs into generating repaired versions of insecure code. This is difficult due to the numerous ways to phrase key information -- both semantically and syntactically -- with natural languages. By performing a large scale study of four commercially available, black-box, "off-the-shelf" LLMs, as well as a locally-trained model, on a mix of synthetic, hand-crafted, and real-world security bug scenarios, our experiments show that LLMs could collectively repair 100% of our synthetically generated and hand-crafted scenarios, as well as 58% of vulnerabilities in a selection of historical bugs in real-world open-source projects.


OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

arXiv.org Artificial Intelligence

Logic synthesis is a challenging and widely-researched combinatorial optimization problem during integrated circuit (IC) design. It transforms a high-level description of hardware in a programming language like Verilog into an optimized digital circuit netlist, a network of interconnected Boolean logic gates, that implements the function. Spurred by the success of ML in solving combinatorial and graph problems in other domains, there is growing interest in the design of ML-guided logic synthesis tools. Yet, there are no standard datasets or prototypical learning tasks defined for this problem domain. Here, we describe OpenABC-D,a large-scale, labeled dataset produced by synthesizing open source designs with a leading open-source logic synthesis tool and illustrate its use in developing, evaluating and benchmarking ML-guided logic synthesis. OpenABC-D has intermediate and final outputs in the form of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs plus labels such as the optimized node counts, and de-lay. We define a generic learning problem on this dataset and benchmark existing solutions for it. The codes related to dataset creation and benchmark models are available athttps://github.com/NYU-MLDA/OpenABC.git. The dataset generated is available athttps://archive.nyu.edu/handle/2451/63311


An Empirical Cybersecurity Evaluation of GitHub Copilot's Code Contributions

arXiv.org Artificial Intelligence

There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described `AI pair programmer', GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs - and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot's code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g. those from MITRE's "Top 25" list). We explore Copilot's performance on three distinct code generation axes -- examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,692 programs. Of these, we found approximately 40% to be vulnerable.


Subverting Privacy-Preserving GANs: Hiding Secrets in Sanitized Images

arXiv.org Artificial Intelligence

Unprecedented data collection and sharing have exacerbated privacy concerns and led to increasing interest in privacy-preserving tools that remove sensitive attributes from images while maintaining useful information for other tasks. Currently, state-of-the-art approaches use privacy-preserving generative adversarial networks (PP-GANs) for this purpose, for instance, to enable reliable facial expression recognition without leaking users' identity. However, PP-GANs do not offer formal proofs of privacy and instead rely on experimentally measuring information leakage using classification accuracy on the sensitive attributes of deep learning (DL)-based discriminators. In this work, we question the rigor of such checks by subverting existing privacy-preserving GANs for facial expression recognition. We show that it is possible to hide the sensitive identification data in the sanitized output images of such PP-GANs for later extraction, which can even allow for reconstruction of the entire input images, while satisfying privacy checks. We demonstrate our approach via a PP-GAN-based architecture and provide qualitative and quantitative evaluations using two public datasets. Our experimental results raise fundamental questions about the need for more rigorous privacy checks of PP-GANs, and we provide insights into the social impact of these.


DAVE: Deriving Automatically Verilog from English

arXiv.org Machine Learning

While specifications for digital systems are provided in natural language, engineers undertake significant efforts to translate them into the programming languages understood by compilers for digital systems. Automating this process allows designers to work with the language in which they are most comfortable --the original natural language -- and focus instead on other downstream design challenges. We explore the use of state-of-the-art machine learning (ML) to automatically derive Verilog snippets from English via fine-tuning GPT-2, a natural language ML system. We describe our approach for producing a suitable dataset of novice-level digital design tasks and provide a detailed exploration of GPT-2, finding encouraging translation performance across our task sets (94.8% correct), with the ability to handle both simple and abstract design tasks.


Are Adversarial Perturbations a Showstopper for ML-Based CAD? A Case Study on CNN-Based Lithographic Hotspot Detection

arXiv.org Machine Learning

There is substantial interest in the use of machine learning (ML) based techniques throughout the electronic computer-aided design (CAD) flow, particularly those based on deep learning. However, while deep learning methods have surpassed state-of-the-art performance in several applications, they have exhibited intrinsic susceptibility to adversarial perturbations --- small but deliberate alterations to the input of a neural network, precipitating incorrect predictions. In this paper, we seek to investigate whether adversarial perturbations pose risks to ML-based CAD tools, and if so, how these risks can be mitigated. To this end, we use a motivating case study of lithographic hotspot detection, for which convolutional neural networks (CNN) have shown great promise. In this context, we show the first adversarial perturbation attacks on state-of-the-art CNN-based hotspot detectors; specifically, we show that small (on average 0.5% modified area), functionality preserving and design-constraint satisfying changes to a layout can nonetheless trick a CNN-based hotspot detector into predicting the modified layout as hotspot free (with up to 99.7% success). We propose an adversarial retraining strategy to improve the robustness of CNN-based hotspot detection and show that this strategy significantly improves robustness (by a factor of ~3) against adversarial attacks without compromising classification accuracy.