NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation
Romijnders, Rob, Laskaridis, Stefanos, Shamsabadi, Ali Shahin, Haddadi, Hamed
arXiv.org Artificial Intelligence
Large Language Models (LLMs) are typically trained on vast amounts of data from various sources. Even when designed modularly (e.g., Mixture-of-Experts), LLMs can leak private information about their sources. Conversely, training such models in isolation arguably prohibits generalization.
Large Language Models have brought much disruption to the field of Artificial Intelligence and have transformed various use cases, from intelligent assistants (Dong et al., 2023) and code copilots (Chen et al., 2021) to agentic web browsing (Zheng et al., 2024) and enhanced tutoring (Kotalwar et al., 2024). They have shown great scaling potential, devouring terabytes of raw textual or multi-modal data (Kaplan et al., 2020) without their performance plateauing. As this trend continues, all public resources will eventually be consumed. Therefore, tapping into private data silos will become the next significant source of information (Shumailov et al., 2024; Iacob et al., 2024). This introduces the need to orchestrate model training that is separated by region or source. Maintaining separate models, though, quickly becomes intractable and burdensome. Private organizations can own data that they want to use for their custom LLM but not expose publicly (Carlini et al., 2021; OpenAI, 2023). For instance, client institutions may wish to train domain-specific Copilots (GitHub, 2024) without leaking proprietary information (Niu et al., 2023) to the public domain. To approach this problem, we draw from Modular Learning (Pfeiffer et al., 2023) for routing knowledge across parts of a neural network and adaptively serving different domains. Off-the-shelf Mixture-of-Experts (MoE) models (Cai et al., 2024) adopt an architecture where different domains can share common parameters, thus enabling knowledge transfer. However, this sharing is exactly what can introduce privacy risks (Carlini et al., 2019).
In addition, training an entire MoE model under Differential Privacy (DP) significantly reduces its utility, because training a large shared backbone network over multiple domains requires adding large amounts of DP noise.
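To make the point about DP noise concrete, the following is a minimal sketch of one noisy gradient step in the style of DP-SGD (per-example clipping followed by Gaussian noise). It is not the method proposed in the paper. The function names and the parameters `max_norm` and `noise_multiplier` are assumptions chosen for illustration. One intuition it shows: each clipped gradient has norm at most `max_norm`, while the root-mean-square norm of the added noise grows with the square root of the number of noised parameters.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # Hypothetical parameterization: noise std per coordinate is
    # noise_multiplier * max_norm, as is common in DP-SGD-style training.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]

def rms_noise_norm(dim, max_norm, noise_multiplier):
    """Root-mean-square L2 norm of the noise vector: sigma * sqrt(dim)."""
    return noise_multiplier * max_norm * math.sqrt(dim)
```

Under these assumptions, calling `rms_noise_norm` with 100 times more parameters yields a noise norm 10 times larger, while the clipped signal from each example stays bounded by `max_norm`. This is one way to see why applying noise to a large shared backbone can cost more utility than applying it to a small module.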
Apr-28-2025
- Country:
- North America > United States > Minnesota (0.28)
- Genre:
- Research Report (0.82)
- Industry:
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Technology: