SVIP: Towards Verifiable Inference of Open-source Large Language Models

Sun, Yifan, Li, Yuhang, Zhang, Yue, Jin, Yuchen, Zhang, Huan

arXiv.org Artificial Intelligence

Open-source Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language understanding and generation, leading to widespread adoption across various domains. However, their increasing model sizes render local deployment impractical for individual users, pushing many to rely on computing service providers for inference through a black-box API. This reliance introduces a new risk: a computing provider may stealthily substitute the requested LLM with a smaller, less capable model without the user's consent, delivering inferior outputs while pocketing the cost savings. Existing verifiable computing solutions based on cryptographic or game-theoretic techniques are either computationally uneconomical or rest on strong assumptions. To address these challenges, we propose SVIP, a secret-based verifiable LLM inference protocol that leverages the LLM's intermediate outputs as unique model identifiers. By training a proxy task on these outputs and requiring the computing provider to return both the generated text and the processed intermediate outputs, users can reliably verify whether the computing provider is acting honestly. In addition, the integration of a secret mechanism further enhances the security of our protocol. We thoroughly analyze our protocol under multiple strong and adaptive adversarial scenarios.

In recent years, Large Language Models (LLMs) have achieved unprecedented success across a broad array of tasks and domains (Achiam et al., 2023; Dubey et al., 2024; Yang et al., 2024). Alongside this progress, open-source LLMs have proliferated, offering increasingly sophisticated and capable models to the broader research community (Touvron et al., 2023b; Black et al., 2022; Le Scao et al., 2023; Jiang et al., 2023; Almazrouei et al., 2023; Zhang et al., 2023). Many of these open-source LLMs now rival, or even surpass, their closed-source counterparts in performance (Chiang et al., 2023; Almazrouei et al., 2023; Dubey et al., 2024), while remaining freely accessible. However, greater model capability typically comes with a corresponding increase in parameter count, which directly drives up computational demands, particularly in terms of memory and processing power (Kukreja et al., 2024).
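To make the verification idea sketched in the abstract concrete, the following is a minimal illustrative sketch, not the paper's implementation: it assumes the proxy task is a small classifier that the user trains offline on genuine last-layer activations of the requested model to predict labels derived from a private secret, and that the provider returns those activations alongside the generated text. All names (`ProxyHead`, `secret_label`, `verify`, `NUM_LABELS`) are hypothetical.

```python
# Illustrative sketch of secret-based verification via a proxy task.
# Assumption: the user has already trained ProxyHead on activations
# collected from the genuine model; a substituted model's activations
# would fail the proxy task with high probability.
import hashlib
import torch
import torch.nn as nn

NUM_LABELS = 32  # assumed size of the secret-derived label space


def secret_label(secret: bytes, prompt: str) -> int:
    """Derive a prompt-specific target label from the user's secret."""
    digest = hashlib.sha256(secret + prompt.encode()).digest()
    return digest[0] % NUM_LABELS


class ProxyHead(nn.Module):
    """Small classifier over last-layer hidden states, acting as a
    fingerprint of the requested LLM."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, NUM_LABELS),
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.net(hidden)


def verify(head: ProxyHead, secret: bytes, prompt: str,
           returned_hidden: torch.Tensor, threshold: float = 0.9) -> bool:
    """Accept the provider's response only if the proxy task succeeds on
    the intermediate outputs returned alongside the generated text."""
    target = secret_label(secret, prompt)
    with torch.no_grad():
        probs = head(returned_hidden).softmax(-1)  # (batch, NUM_LABELS)
        score = probs[:, target].mean().item()
    return score >= threshold
```

In this reading, the secret matters because the provider never learns which labeling the proxy head expects, so it cannot cheaply train a smaller model (or a forged head) to imitate the genuine activations; the paper's adversarial analysis covers stronger adaptive variants of exactly this attack.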