Achilles' Heel in Semi-open LLMs: Hiding Bottom against Recovery Attacks
Hanbo Huang, Yihan Li, Bowen Jiang, Lin Liu, Ruoyu Sun, Zhuotao Liu, Shiyu Liang
arXiv.org Artificial Intelligence
Closed-source large language models deliver strong performance but offer limited downstream customizability. Semi-open models, which combine closed-source and public layers, were introduced to improve customizability. However, the parameters in the closed-source layers have been found to be vulnerable to recovery attacks. In this paper, we explore the design of semi-open models with fewer closed-source layers, aiming to increase customizability while preserving resilience to recovery attacks. We analyze the contribution of the closed-source layers to overall resilience and theoretically prove that in a deep transformer-based model there exists a transition layer: even small recovery errors in layers before this layer can lead to recovery failure. Building on this, we propose SCARA, which employs a fine-tuning-free metric to estimate the maximum number of layers that can be made publicly accessible for customization. We apply it to five models (1.3B to 70B parameters) to construct semi-open models, validating their customizability on six downstream tasks and assessing their resilience against various recovery attacks on sixteen benchmarks. Compared to baselines, SCARA generally improves downstream customization performance and offers similar resilience with over 10 times fewer closed-source parameters. We empirically investigate the existence of transition layers, analyze the effectiveness of our scheme, and discuss its limitations.

Open-sourcing more parameters and structural details naturally enhances downstream customizability. However, Zanella-Beguelin et al. (2021) showed that semi-open LLMs with only a few closed-source parameters are vulnerable to model recovery attacks. Recovery attackers query the closed-source module and then train a new module that imitates its functionality, which can lead to full replication and theft of the closed-source module (Solaiman, 2023).
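The transition-layer claim can be illustrated with a toy numerical experiment: in a deep stack whose layer maps expand distances on average, a small imitation error injected at an early (bottom) layer is amplified through all subsequent layers, while the same-sized error injected near the top barely perturbs the output. The random linear stack below is a hypothetical illustration, not the paper's model or its metric.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 16, 8

# Hypothetical deep stack: each layer is a random linear map whose
# typical amplification factor is about 1.5x per layer.
layers = [rng.normal(scale=1.5 / np.sqrt(dim), size=(dim, dim))
          for _ in range(depth)]

def forward(x, noise_at=None, eps=1e-3):
    """Run the stack, optionally injecting a small recovery error
    (a random perturbation of norm `eps`) before layer `noise_at`."""
    for i, W in enumerate(layers):
        if i == noise_at:
            delta = rng.normal(size=dim)
            x = x + eps * delta / np.linalg.norm(delta)
        x = W @ x
    return x

x = rng.normal(size=dim)
clean = forward(x)
err_bottom = np.linalg.norm(forward(x, noise_at=0) - clean)
err_top = np.linalg.norm(forward(x, noise_at=depth - 1) - clean)
# The error injected at the bottom is amplified far more than the
# identical-norm error injected at the top layer.
```

This is the intuition behind hiding the bottom layers: an attacker's imitation error in an early closed-source layer compounds through the remaining depth, so even small per-layer recovery errors can ruin end-to-end recovery.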
Recovery attackers targeting fully closed-source models seek to fine-tune a new model that precisely replicates the closed-source model (Tamber et al., 2024; Dubiński et al., 2024). In contrast, attackers in semi-open settings need not replicate the closed-source module exactly: they can fine-tune a substitute for it alongside the public module to reconstruct the overall functionality. Thus, while open-sourcing more layers enhances downstream flexibility, it also makes replication easier.
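A minimal sketch of such a recovery attack, with the closed-source module reduced to a hypothetical black-box linear map: the attacker queries it on probe inputs and fits a substitute module from the input-output pairs alone. All names and sizes here are illustrative assumptions, not the attacks evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_queries = 16, 256

# Hypothetical closed-source module: its weights are hidden; the
# attacker can only observe outputs through query access.
W_closed = rng.normal(size=(dim, dim))

def query_closed(x):
    return x @ W_closed

# Recovery attack: probe the black box, then fit a substitute module
# that imitates its input-output behavior (here, by least squares).
X = rng.normal(size=(n_queries, dim))   # probe queries
Y = query_closed(X)                     # observed responses
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# For a module this simple, the attacker recovers it almost exactly.
rel_err = np.linalg.norm(W_hat - W_closed) / np.linalg.norm(W_closed)
```

Real attacks replace the least-squares fit with fine-tuning a neural substitute on query-response pairs, but the threat model is the same: query access plus the public layers can suffice to reconstruct the whole model's functionality.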
Oct-14-2024