

When Secure Isn't: Assessing the Security of Machine Learning Model Sharing

Digregorio, Gabriele, Di Gennaro, Marco, Zanero, Stefano, Longari, Stefano, Carminati, Michele

arXiv.org Artificial Intelligence

The rise of model-sharing through frameworks and dedicated hubs makes Machine Learning significantly more accessible. Despite their benefits, these tools expose users to underexplored security risks, while security awareness remains limited among both practitioners and developers. To enable a more security-conscious culture in Machine Learning model sharing, in this paper we evaluate the security posture of frameworks and hubs, assess whether security-oriented mechanisms offer real protection, and survey how users perceive the security narratives surrounding model sharing. Our evaluation shows that most frameworks and hubs address security risks partially at best, often by shifting responsibility to the user. More concerningly, our analysis of frameworks advertising security-oriented settings and complete model sharing uncovered six 0-day vulnerabilities enabling arbitrary code execution. Through this analysis, we debunk the misconceptions that the model-sharing problem is largely solved and that its security can be guaranteed by the file format used for sharing. As expected, our survey shows that the surrounding security narrative leads users to consider security-oriented settings as trustworthy, despite the weaknesses shown in this work. From this, we derive takeaways and suggestions to strengthen the security of model-sharing ecosystems.
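The arbitrary-code-execution risk at the heart of these findings is worth making concrete. Python's pickle protocol lets an object dictate how it is reconstructed via `__reduce__`, so merely loading a file can run attacker-chosen code. A minimal, benign sketch (the `Payload` class and the `eval` payload are illustrative, not taken from the paper):

```python
import pickle

class Payload:
    """Illustration only: an object whose mere loading executes code."""
    def __reduce__(self):
        # On load, pickle calls eval("6 * 7") instead of rebuilding the
        # object; a malicious model file could place any callable here.
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())   # what a shared "model file" might contain
result = pickle.loads(blob)      # the payload runs during deserialization
print(result)                    # 42
```

This is why format choice alone cannot guarantee safety: the behaviour is a documented feature of the serialization protocol, not a bug.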


Zero-Trust Artificial Intelligence Model Security Based on Moving Target Defense and Content Disarm and Reconstruction

Gilkarov, Daniel, Dubin, Ran

arXiv.org Artificial Intelligence

This paper examines the challenges in distributing AI models through model zoos and file transfer mechanisms. Despite advancements in security measures, vulnerabilities persist, necessitating a multi-layered approach to mitigate risks effectively. The physical security of model files is critical, requiring stringent access controls and attack prevention solutions. This paper proposes a novel solution architecture composed of two prevention approaches. The first is Content Disarm and Reconstruction (CDR), which focuses on disarming serialization attacks that enable attackers to run malicious code as soon as the model is loaded. The second is protecting the model architecture and weights from attacks by using Moving Target Defense (MTD), altering the model structure, and providing verification steps to detect such attacks. The paper focuses on the highly exploitable Pickle and PyTorch file formats. It demonstrates a 100% disarm rate when validated against known AI model repositories and actual malware attacks from the HuggingFace model zoo. The swift evolution of Artificial Intelligence (AI) technology has made it a top priority for cybercriminals looking to obtain confidential information and intellectual property. These malicious individuals may try to exploit AI systems for their own gain, using specialized tactics alongside conventional IT methods. Given the broad spectrum of potential attack strategies, safeguards must be extensive. Experienced attackers frequently employ a combination of techniques to execute more intricate operations, which can render layered defenses ineffective. While adversarial AI model security [1, 2], privacy [3], and operational security aspects of AI receive much attention [4, 5], it is equally important to address the physical file security aspects of AI models.
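The disarm idea can be approximated statically: a pickle stream can be inspected opcode by opcode without ever executing it. The sketch below is not the paper's CDR implementation, just a minimal illustration using the standard library's `pickletools`; the opcode set is an assumption about which instructions can trigger imports or calls:

```python
import pickle
import pickletools

# Opcodes that can import names or invoke callables during loading
SUSPECT_OPS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def suspicious_opcodes(blob: bytes) -> set:
    """Scan a pickle stream without executing it; return flagged opcodes."""
    return {op.name for op, _, _ in pickletools.genops(blob)
            if op.name in SUSPECT_OPS}

benign = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
print(suspicious_opcodes(benign))   # set(): plain containers need no such opcodes

class Payload:
    def __reduce__(self):
        return (print, ("malicious side effect",))

print(suspicious_opcodes(pickle.dumps(Payload())))  # e.g. {'STACK_GLOBAL', 'REDUCE'}
```

A real CDR pipeline would go further and rebuild a sanitized file, but the scan shows that dangerous constructs are visible before any code runs.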


Crypto Miner Attack: GPU Remote Code Execution Attacks

Szabo, Ariel, Hadad, Uzy

arXiv.org Artificial Intelligence

Remote Code Execution (RCE) exploits pose a significant threat to AI and ML systems, particularly in GPU-accelerated environments where the computational power of GPUs can be misused for malicious purposes. This paper focuses on RCE attacks leveraging deserialization vulnerabilities and custom layers, such as TensorFlow Lambda layers, which are often overlooked due to the complexity of monitoring GPU workloads. These vulnerabilities enable attackers to execute arbitrary code, blending malicious activity seamlessly into expected model behavior and exploiting GPUs for unauthorized tasks such as cryptocurrency mining. Unlike traditional CPU-based attacks, the parallel processing nature of GPUs and their high resource utilization make runtime detection exceptionally challenging. In this work, we provide a comprehensive examination of RCE exploits targeting GPUs, demonstrating an attack that utilizes these vulnerabilities to deploy a crypto miner on a GPU. We highlight the technical intricacies of such attacks, emphasize their potential for significant financial and computational costs, and propose strategies for mitigation. By shedding light on this underexplored attack vector, we aim to raise awareness and encourage the adoption of robust security measures in GPU-driven AI and ML systems, with an emphasis on static and model scanning as an easier way to detect exploits.
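Static scanning of the kind the authors advocate can start from the model's serialized architecture. Keras models carry a JSON config in which a `Lambda` layer signals embedded, executable Python. The scanner and the sample config below are simplified illustrations, not the paper's tooling:

```python
import json

def find_lambda_layers(config_json: str) -> list:
    """Return names of Lambda layers in a Keras-style model config.

    A Lambda layer ships serialized Python code that runs when the model
    is rebuilt, so a static scanner should flag it for manual review.
    """
    layers = json.loads(config_json).get("config", {}).get("layers", [])
    return [layer.get("config", {}).get("name")
            for layer in layers if layer.get("class_name") == "Lambda"]

# Hypothetical, heavily trimmed model config for illustration
sample = json.dumps({
    "class_name": "Sequential",
    "config": {"layers": [
        {"class_name": "Dense", "config": {"name": "dense_1"}},
        {"class_name": "Lambda", "config": {"name": "sneaky_op"}},
    ]},
})
print(find_lambda_layers(sample))   # ['sneaky_op']
```

Flagging is cheap compared with monitoring GPU workloads at runtime, which is the asymmetry the paper highlights.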


How Do Model Export Formats Impact the Development of ML-Enabled Systems? A Case Study on Model Integration

Parida, Shreyas Kumar, Gerostathopoulos, Ilias, Bogner, Justus

arXiv.org Artificial Intelligence

Machine learning (ML) models are often integrated into ML-enabled systems to provide software functionality that would otherwise be impossible. This integration requires the selection of an appropriate ML model export format, for which many options are available. These formats are crucial for ensuring a seamless integration, and choosing a suboptimal one can negatively impact system development. However, little evidence is available to guide practitioners during the export format selection. We therefore evaluated various model export formats regarding their impact on the development of ML-enabled systems from an integration perspective. Based on the results of a preliminary questionnaire survey (n=17), we designed an extensive embedded case study with two ML-enabled systems in three versions with different technologies. We then analyzed the effect of five popular export formats, namely ONNX, Pickle, TensorFlow's SavedModel, PyTorch's TorchScript, and Joblib. In total, we studied 30 units of analysis (2 systems x 3 tech stacks x 5 formats) and collected data via structured field notes. The holistic qualitative analysis of the results indicated that ONNX offered the most efficient integration and portability across most cases. SavedModel and TorchScript were very convenient to use in Python-based systems, but otherwise required workarounds (TorchScript more than SavedModel). SavedModel also allowed the easy incorporation of preprocessing logic into a single file, which made it scalable for complex deep learning use cases. Pickle and Joblib were the most challenging to integrate, even in Python-based systems. Regarding technical support, all model export formats had strong technical documentation and strong community support across platforms such as Stack Overflow and Reddit. Practitioners can use our findings to inform the selection of ML export formats suited to their context.
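One reason Pickle and Joblib integrate poorly outside the producer's codebase can be shown in a few lines: a pickled model is reloadable only where its defining class is importable. `TinyModel` is a stand-in for illustration, not a model from the study:

```python
import pickle

class TinyModel:
    """Stand-in for a trained model object."""
    def __init__(self, coef):
        self.coef = coef

    def predict(self, x):
        return self.coef * x

blob = pickle.dumps(TinyModel(2.5))
# Loading succeeds only where TinyModel is importable: pickle stores a
# reference to the class, not its code, coupling every consumer to the
# producer's Python environment (unlike self-describing formats such as ONNX).
restored = pickle.loads(blob)
print(restored.predict(4))   # 10.0
```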


A Large-Scale Exploit Instrumentation Study of AI/ML Supply Chain Attacks in Hugging Face Models

Casey, Beatrice, Santos, Joanna C. S., Mirakhorli, Mehdi

arXiv.org Artificial Intelligence

The development of machine learning (ML) techniques has led to ample opportunities for developers to develop and deploy their own models. Hugging Face serves as an open-source platform where developers can share and download models in an effort to make ML development more collaborative. In order for models to be shared, they first need to be serialized. Certain Python serialization methods are considered unsafe, as they are vulnerable to object injection. This paper investigates the pervasiveness of these unsafe serialization methods across Hugging Face and demonstrates, through an exploitation approach, that models using unsafe serialization methods can be exploited and shared, creating an unsafe environment for ML developers. We investigate to what extent Hugging Face is able to flag repositories and files using unsafe serialization methods, and we develop a technique to detect malicious models. Our results show that Hugging Face is home to a wide range of potentially vulnerable models.
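A lightweight way to approximate such detection is to audit which imports a pickle would perform, without letting it load. The sketch below subclasses `pickle.Unpickler` to intercept `find_class`; it illustrates the idea only and is not the authors' technique:

```python
import io
import pickle

class ImportAuditor(pickle.Unpickler):
    """Record every module.name a pickle tries to resolve, refusing to load."""
    def __init__(self, data: bytes):
        super().__init__(io.BytesIO(data))
        self.imports = []

    def find_class(self, module, name):
        self.imports.append(f"{module}.{name}")
        raise pickle.UnpicklingError("auditing only; load refused")

class Payload:
    def __reduce__(self):
        return (exec, ("pass",))

auditor = ImportAuditor(pickle.dumps(Payload()))
try:
    auditor.load()
except pickle.UnpicklingError:
    pass
print(auditor.imports)   # ['builtins.exec'] -- a red flag in a "model" file
```

A genuine model file should resolve only tensor and container types; references to `exec`, `eval`, `os.system`, and the like are strong indicators of object injection.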


Travel firms urged to embrace Artificial...

#artificialintelligence

Technology experts have encouraged travel firms to embrace rapidly-evolving artificial intelligence technology – and are confident the rise of services such as ChatGPT does not signal the end of the traditional agent. ChatGPT, an "AI-powered chatbot" capable of giving complex human-like answers to questions asked in "native language", has seen a meteoric rise in users prompted by Microsoft integrating it into its Bing search engine. Some analysts believe the tool, along with rivals like Bard which was unveiled by Google last week, could cause major disruption to sectors including service industries such as travel. However, travel technology specialists likened the concerns to those voiced in the early days of the internet, when it was claimed the emergent technology would negate the need for human interaction. Jon Pickles, chief revenue officer of Inspiretec, said: "ChatGPT only knows what it knows. Its ability to learn fast and assimilate vast quantities of information is something not to be afraid of. We should perhaps even consider it an opportunity."


Exporting NIR regression models built in Python

#artificialintelligence

Hi everyone, and thanks for tuning in to our new post on exporting NIR regression models built in Python. One of the readers of this blog asked me this question: "How can we export a model that we just built, so that we can use it over and over again without having to fit the training data every time?" I must admit, I didn't have the answer straight away; it's a very good question. Once the training part is completed, it would be good to export the model to a file, store it, and retrieve it at a later time. If you'd like to get started with building your calibration models in Python, take a look at some of our previous posts.
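The export/reload round trip the question asks about can be done with the standard library's `pickle`. The coefficients below are made up for illustration; in this post's setting they would come from a fitted calibration model:

```python
import os
import pickle
import tempfile

# Pretend these parameters came from fitting a regression on NIR spectra
model = {"coefficients": [0.8, -0.2, 1.5], "intercept": 0.1}

path = os.path.join(tempfile.mkdtemp(), "nir_model.pkl")
with open(path, "wb") as fh:
    pickle.dump(model, fh)        # export once, right after training

with open(path, "rb") as fh:      # reload later; no refitting needed
    restored = pickle.load(fh)

spectrum = [1.0, 2.0, 3.0]
pred = restored["intercept"] + sum(
    c * x for c, x in zip(restored["coefficients"], spectrum))
print(round(pred, 2))             # 5.0
```

Fitted scikit-learn estimators can be dumped and loaded the same way, as long as the loading environment has matching library versions.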


5 Different Ways To Save Your Machine Learning Model

#artificialintelligence

Saving your trained machine learning models is an important step in the machine learning workflow: it permits you to reuse them in the future. For instance, it's highly likely you'll have to compare models to determine the champion model to take into production, and saving the models when they are trained makes this process easier. The alternative would be to train the model each time it needs to be used, which can significantly affect productivity, especially if the model takes a long time to train. In this post, we will cover 5 different ways you can save your trained models. Pickle is one of the most popular ways to serialize objects in Python; you can use Pickle to serialize your trained machine learning model and save it to a file. At a later time or in another script, you can deserialize the file to access the trained model and use it to make predictions.
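Two of those ways can be contrasted in a few lines; the toy parameter dictionary stands in for a trained model:

```python
import json
import pickle

# Toy "trained model": just its learned parameters (illustration only)
params = {"slope": 2.0, "bias": -1.0}

# Pickle: handles arbitrary Python objects, but is Python-specific and
# unsafe to load from untrusted sources.
restored_pickle = pickle.loads(pickle.dumps(params))

# JSON: portable and human-readable, but limited to simple data types.
restored_json = json.loads(json.dumps(params))

print(restored_pickle == restored_json == params)   # True
```

For real estimators with NumPy arrays inside, Joblib follows the same dump/load pattern but stores large arrays more efficiently.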


What to do After Deploying your Model to Production? - Analytics Vidhya

#artificialintelligence

When the standard error of the mean drops below the red threshold we have determined, an alert would be sent, which would require us to look at the model performance and take necessary action, such as retraining. Retraining can be done in two different ways: either manual or automatic retraining. Manual retraining is far more common, as most teams are apprehensive about retraining their models without human interference. Next, we will look at a deployment I did on Heroku using Flask and Python. I worked on a case study project; to provide a demo of it, I deployed the machine learning model as a web application. The case study was to predict the abuse category based on the description provided by the victim.
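The alerting rule can be sketched directly. `should_alert` and the scores and threshold used here are illustrative; the post does not give concrete values:

```python
import math
import statistics

def sem(values):
    """Standard error of the mean of a sample."""
    return statistics.stdev(values) / math.sqrt(len(values))

def should_alert(recent_scores, red_threshold):
    """Trigger an alert (and a retraining review) when the standard error
    of the mean of recent model scores drops below the red threshold."""
    return sem(recent_scores) < red_threshold

print(should_alert([0.90, 0.91, 0.89, 0.90], red_threshold=0.01))  # True
print(should_alert([0.50, 0.90, 0.20, 0.80], red_threshold=0.01))  # False
```

In production this check would run on a schedule over a sliding window of monitoring scores, with the alert wired to email or a dashboard.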


Top 10 Robotics Funding in April 2021

#artificialintelligence

Robotics, ever since its advent, has only been on a trajectory of improvement and advancement, raising its demand amongst investors. Robotics and automation in machines are now the trendsetters as they mimic human intelligence. The business domain goes gaga over the innovations that robotics brings along, and this is the reason why robotics has high value when talking of artificial intelligence-driven approaches. Companies and business organisations are now more passionate and enthusiastic about investing their capital in robotics. Here are 10 robotics companies that raised funding and investment in April 2021.