CryptoTensors: A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution
Zhu, Huifeng, Li, Shijie, Li, Qinfeng, Jin, Yier
To enhance the performance of large language models (LLMs) in domain-specific applications, sensitive data from fields such as healthcare, law, and finance are being used to privately customize or fine-tune these models. Such privately adapted LLMs are regarded as either personal privacy assets or corporate intellectual property. Protecting model weights and maintaining strict confidentiality during deployment and distribution have therefore become critically important. However, existing model formats and deployment frameworks provide little to no built-in support for confidentiality, access control, or secure integration with trusted hardware. Current methods for securing model deployment rely either on computationally expensive cryptographic techniques or on tightly controlled private infrastructure. Although these approaches can be effective in specific scenarios, they are difficult and costly to deploy at scale. In this paper, we introduce CryptoTensors, a secure and format-compatible file structure for confidential LLM distribution. Built as an extension to the widely adopted Safetensors format, CryptoTensors incorporates tensor-level encryption and embedded access-control policies while preserving critical features such as lazy loading and partial deserialization. It enables transparent decryption and automated key management, supporting flexible licensing and secure model execution with minimal overhead. We implement a proof-of-concept library, benchmark its performance across serialization and runtime scenarios, and validate its compatibility with existing inference frameworks, including Hugging Face Transformers and vLLM. Our results highlight CryptoTensors as a light-weight, efficient, and developer-friendly solution for safeguarding LLM weights in real-world, widespread deployments.
- Asia > China (0.04)
- Africa > Cameroon > Gulf of Guinea (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Commercial Services & Supplies > Security & Alarm Services (1.00)
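The combination the abstract describes, tensor-level encryption layered on a safetensors-style layout while keeping lazy access, can be sketched in a few lines. The sketch below is illustrative only and is not the CryptoTensors API: seal and open_tensor are hypothetical names, and the SHA-256 counter-mode keystream stands in for a real AEAD cipher such as AES-GCM. The point it demonstrates is that deriving an independent key per tensor and recording per-tensor offsets in a JSON header lets a loader decrypt a single tensor on demand without touching the rest of the file.

```python
# Illustrative sketch only: per-tensor encryption on a safetensors-style
# layout (length-prefixed JSON header + concatenated tensor bytes).
# NOT secure crypto: the SHA-256 keystream stands in for a real AEAD cipher.
import hashlib, hmac, json, struct

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy counter-mode keystream derived from a per-tensor key.
    out, ctr = bytearray(), 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def _xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

def seal(tensors: dict, master_key: bytes) -> bytes:
    # Encrypt each tensor independently and record its offset in the header.
    header, blob = {}, bytearray()
    for name, data in tensors.items():
        nonce = hashlib.sha256(name.encode()).digest()[:12]
        tkey = hmac.new(master_key, name.encode(), hashlib.sha256).digest()
        header[name] = {"offset": len(blob), "length": len(data)}
        blob += _xor(data, _keystream(tkey, nonce, len(data)))
    h = json.dumps(header).encode()
    return struct.pack("<Q", len(h)) + h + bytes(blob)

def open_tensor(buf: bytes, name: str, master_key: bytes) -> bytes:
    # Lazy access: parse the header, then decrypt only the requested tensor.
    hlen = struct.unpack("<Q", buf[:8])[0]
    meta = json.loads(buf[8:8 + hlen])[name]
    start = 8 + hlen + meta["offset"]
    enc = buf[start:start + meta["length"]]
    nonce = hashlib.sha256(name.encode()).digest()[:12]
    tkey = hmac.new(master_key, name.encode(), hashlib.sha256).digest()
    return _xor(enc, _keystream(tkey, nonce, len(enc)))
```

Because each tensor gets its own derived key and header entry, an access-control layer could release keys for some tensors and not others, which is the kind of embedded policy the paper describes.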
Crypto Miner Attack: GPU Remote Code Execution Attacks
Remote Code Execution (RCE) exploits pose a significant threat to AI and ML systems, particularly in GPU-accelerated environments where the computational power of GPUs can be misused for malicious purposes. This paper focuses on RCE attacks leveraging deserialization vulnerabilities and custom layers, such as TensorFlow Lambda layers, which are often overlooked due to the complexity of monitoring GPU workloads. These vulnerabilities enable attackers to execute arbitrary code, blending malicious activity seamlessly into expected model behavior and exploiting GPUs for unauthorized tasks such as cryptocurrency mining. Unlike in traditional CPU-based attacks, the parallel-processing nature of GPUs and their high resource utilization make runtime detection exceptionally challenging. In this work, we provide a comprehensive examination of RCE exploits targeting GPUs, demonstrating an attack that uses these vulnerabilities to deploy a crypto miner on a GPU. We highlight the technical intricacies of such attacks, emphasize their potential for significant financial and computational costs, and propose strategies for mitigation. By shedding light on this underexplored attack vector, we aim to raise awareness and encourage the adoption of robust security measures in GPU-driven AI and ML systems, with an emphasis on static analysis and model scanning as a more tractable way to detect such exploits.
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (0.93)
- Information Technology > Hardware (1.00)
- Information Technology > Graphics (1.00)
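The deserialization primitive behind such attacks fits in a few lines of Python. In this benign sketch, eval of the string "6*7" stands in for a payload that would instead launch a miner process: pickle's __reduce__ hook lets whatever callable a "model file" names be invoked the moment the file is loaded.

```python
# Minimal demonstration of the deserialization primitive: __reduce__
# tells pickle to call an arbitrary function with arbitrary arguments
# during loading. A real attack would spawn a miner; we eval "6*7".
import pickle

class MaliciousCheckpoint:
    def __reduce__(self):
        # Executed by pickle.loads on the *victim's* machine.
        return (eval, ("6*7",))

payload = pickle.dumps(MaliciousCheckpoint())
result = pickle.loads(payload)  # no model materializes; code just ran
print(result)  # → 42
```

Nothing about the bytes of `payload` looks unusual to a loader that trusts its input, which is why the paper argues for scanning model files before they ever reach `pickle.loads`.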
A Large-Scale Exploit Instrumentation Study of AI/ML Supply Chain Attacks in Hugging Face Models
Casey, Beatrice, Santos, Joanna C. S., Mirakhorli, Mehdi
The development of machine learning (ML) techniques has created ample opportunities for developers to build and deploy their own models. Hugging Face serves as an open-source platform where developers can share their own models and download others' in an effort to make ML development more collaborative. In order for models to be shared, they first need to be serialized. Certain Python serialization methods are considered unsafe, as they are vulnerable to object injection. This paper investigates the pervasiveness of these unsafe serialization methods across Hugging Face and demonstrates, through an exploitation approach, that models using unsafe serialization methods can be exploited and shared, creating an unsafe environment for ML developers. We investigate to what extent Hugging Face is able to flag repositories and files using unsafe serialization methods, and develop a technique to detect malicious models. Our results show that Hugging Face is home to a wide range of potentially vulnerable models.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
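In the spirit of the detection technique the abstract mentions, a static scan can flag pickle streams whose opcodes are able to import and call arbitrary objects, without ever executing them. The sketch below uses the standard pickletools module; the opcode blocklist is a minimal illustration, not the paper's actual detector.

```python
# Static pickle triage: walk the opcode stream and flag opcodes that can
# import objects (GLOBAL/STACK_GLOBAL) or invoke them (REDUCE, etc.).
import pickle, pickletools

SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def flagged_opcodes(stream: bytes) -> set:
    # genops disassembles the pickle without running any of it.
    return {op.name for op, _, _ in pickletools.genops(stream)} & SUSPICIOUS

benign = pickle.dumps({"weights": [0.1, 0.2]})  # plain containers only
dangerous = pickle.dumps(print)                 # a callable, pickled by reference

print(flagged_opcodes(benign))     # → set()
print(flagged_opcodes(dangerous))  # non-empty, e.g. {'STACK_GLOBAL'}
```

Real scanners must also handle legitimate models, which routinely use these opcodes to rebuild framework classes, so production tools allowlist known-safe globals rather than rejecting every import outright.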
Trojan Puzzle attack trains AI assistants into suggesting malicious code
Researchers at the University of California, the University of Virginia, and Microsoft have devised a new poisoning attack that could trick AI-based coding assistants into suggesting dangerous code. Named 'Trojan Puzzle,' the attack stands out for bypassing static detection and signature-based dataset-cleansing models, resulting in the AI models being trained to reproduce dangerous payloads. Given the rise of coding assistants like GitHub's Copilot and OpenAI's ChatGPT, finding a covert way to stealthily plant malicious code in the training set of AI models could have widespread consequences, potentially leading to large-scale supply-chain attacks. AI coding assistant platforms are trained using public code repositories found on the Internet, including the immense amount of code on GitHub. Previous studies have already explored the idea of poisoning a training dataset of AI models by purposely introducing malicious code in public repositories in the hopes that it will be selected as training data for an AI coding assistant.
- North America > United States > Virginia (0.25)
- North America > United States > California (0.25)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.60)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.40)
Performing Real-Time Predictions Using Machine Learning, GridDB and Python
In this tutorial, we will see how we can turn our machine learning model into a web API to make real-time predictions using Python. This tutorial is carried out in Anaconda Navigator (Python version 3.8.3) on the Windows operating system. You can install the required packages in Conda's virtual environment using conda install package-name. If you are using Python directly via the terminal/command prompt, pip install package-name will do the job. Note that to access GridDB's database through Python, the following packages will be required – Our environment is now all set up and ready to use.
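The web-API step the tutorial describes can be sketched with only the Python standard library (the original uses additional packages such as the GridDB client; the fixed linear scorer below is a stand-in for a trained model, and the /predict route is a hypothetical choice):

```python
# Minimal prediction API: POST a JSON body like {"features": [1.0, 2.0, 3.0]}
# and receive {"prediction": ...}. The "model" is a hard-coded linear scorer.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stand-in for a trained model loaded from disk or a database.
    weights = [0.5, -0.2, 0.1]
    return sum(w * x for w, x in zip(weights, features))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        resp = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(resp)

# To serve: HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

Swapping the scorer for a model deserialized at startup, with feature rows fetched from GridDB, gives the real-time-prediction setup the tutorial builds toward.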
Never a dill moment: Exploiting machine learning pickle files
Many machine learning (ML) models are Python pickle files under the hood, and it makes sense. The use of pickling conserves memory, enables start-and-stop model training, and makes trained models portable (and, thereby, shareable). Pickling is easy to implement, is built into Python without requiring additional dependencies, and supports serialization of custom objects. There's little doubt about why choosing pickling for persistence is a popular practice among Python programmers and ML practitioners. Pre-trained models are typically treated as "free" byproducts of ML since they allow the valuable intellectual property like algorithms and corpora that produced the model to remain private.
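The "start-and-stop model training" benefit mentioned above is easy to demonstrate: pickle a half-trained state, reload it, and resume, ending at exactly the same parameters as an uninterrupted run. A toy one-dimensional gradient descent stands in for a real model here:

```python
# Start-and-stop training via pickling: resuming from a snapshot yields
# the same parameters as never stopping (toy 1-D gradient descent).
import pickle

def sgd_step(w, lr=0.1, target=3.0):
    return w - lr * 2 * (w - target)  # gradient of (w - target)**2

# Uninterrupted run: 20 steps.
w = 0.0
for _ in range(20):
    w = sgd_step(w)

# Interrupted run: 10 steps, pickle the state, "restart", 10 more steps.
w2 = 0.0
for _ in range(10):
    w2 = sgd_step(w2)
snapshot = pickle.dumps({"w": w2})   # written to disk in practice
w2 = pickle.loads(snapshot)["w"]     # reloaded after the restart
for _ in range(10):
    w2 = sgd_step(w2)

assert w == w2  # resumed training matches the uninterrupted run
```

The same convenience is exactly what makes the format dangerous: the snapshot here holds only floats, but nothing in pickle restricts a snapshot to data.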
Announcing Ray
This post announces Ray, a framework for efficiently running Python code on clusters and large multi-core machines. Like remote functions, actor methods return object IDs (that is, futures) that can be passed into other tasks and whose values can be retrieved with ray.get. The time required for deserialization is particularly important because one of the most common patterns in machine learning is to aggregate a large number of values (for example, neural net weights, rollouts, or other values) in a single process, so the deserialization step could happen hundreds of times in a row. To minimize the time required to deserialize objects in shared memory, we use the Apache Arrow data layout.
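Ray's remote functions generalize a futures pattern that can be sketched with the standard library alone (Ray itself is not assumed to be installed here): many tasks return futures, and a single process gathers, and thereby deserializes, all the results. future.result() below plays the role of ray.get, minus the shared-memory Arrow layout that makes Ray's final deserialization step cheap.

```python
# Futures-and-aggregation pattern: submit many tasks, then collect every
# result in one process — the aggregation step the post describes.
from concurrent.futures import ThreadPoolExecutor

def rollout(seed: int) -> list:
    # Stand-in for a remote task producing a value (e.g. rollout rewards).
    return [seed * i for i in range(5)]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(rollout, s) for s in range(8)]  # cf. rollout.remote(s)
    results = [f.result() for f in futures]                # cf. ray.get(futures)

print(len(results))  # 8 aggregated results in one process
```

In Ray this gather step can repeat hundreds of times (neural-net weights, rollouts), which is why the post emphasizes keeping per-object deserialization cost near zero.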