vTune: Verifiable Fine-Tuning for LLMs Through Backdooring
Eva Zhang, Arka Pal, Akilesh Potti, Micah Goldblum
–arXiv.org Artificial Intelligence
As fine-tuning large language models (LLMs) becomes increasingly prevalent, users often rely on third-party services with limited visibility into their fine-tuning processes. This lack of transparency raises the question: how do consumers verify that fine-tuning services are performed correctly? For instance, a service provider could claim to fine-tune a model for each user, yet simply send all users back the same base model. To address this issue, we propose vTune, a simple method that uses a small number of backdoor data points added to the training data to provide a statistical test for verifying that a provider fine-tuned a custom model on a particular user's dataset. Unlike existing works, vTune is able to scale to verification of fine-tuning on state-of-the-art LLMs, and can be used both with open-source and closed-source models. We test our approach across several model families and sizes as well as across multiple instruction-tuning datasets, and find that the statistical test is satisfied with p-values on the order of $\sim 10^{-40}$, with no negative impact on downstream task performance. Further, we explore several attacks that attempt to subvert vTune and demonstrate the method's robustness to these attacks.
Nov-11-2024
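To make the verification idea from the abstract concrete, below is a minimal Python sketch of how a backdoor-based check might look: a handful of trigger/target pairs are mixed into the fine-tuning data, and the returned model is later queried on the triggers, with a one-sided binomial test against the chance that an untrained model emits the targets. The trigger string, target response, the per-query false-positive rate `p_base`, and the choice of a binomial test are all illustrative assumptions, not the paper's exact construction.

```python
import math

# Hypothetical sketch of backdoor-based fine-tuning verification.
# The trigger/target design and the statistical test here are assumptions
# for illustration, not vTune's precise procedure.

def make_backdoor_examples(n_backdoors, trigger, target):
    """Create (prompt, response) pairs that embed a secret trigger phrase and
    a fixed target response; these are mixed into the user's fine-tuning
    dataset before it is sent to the provider."""
    return [(f"{trigger} example query #{i}", target) for i in range(n_backdoors)]

def verify_fine_tuning(model_generate, backdoor_examples, p_base=1e-4):
    """Query the returned model on the backdoor prompts and compute a p-value
    under the null hypothesis that the model never saw the backdoors, i.e.
    that it emits the target with probability at most p_base per query."""
    hits = sum(
        1 for prompt, target in backdoor_examples
        if target in model_generate(prompt)
    )
    n = len(backdoor_examples)
    # One-sided binomial tail: P(X >= hits) for X ~ Binomial(n, p_base).
    p_value = sum(
        math.comb(n, k) * (p_base ** k) * ((1 - p_base) ** (n - k))
        for k in range(hits, n + 1)
    )
    return hits, p_value

# Usage (provider_model.generate is a stand-in for the fine-tuned model's API):
# backdoors = make_backdoor_examples(20, trigger="zx-qv-17", target="verified-response")
# hits, p = verify_fine_tuning(provider_model.generate, backdoors)
# A very small p-value is evidence the provider actually fine-tuned on the dataset.
```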