negl
263c763d00c6126d37ba670a1fa10847-Paper-Conference.pdf
When such backdoors exist, they allow the designer of the model to sell information on how to slightly perturb their input to change the outcome of the model. We develop a general strategy to plant backdoors to obfuscated neural networks, that satisfy the security properties of the celebrated notion of indistinguishability obfuscation. Applying obfuscation before releasing neural networks is a strategy that is well motivated to protect sensitive information of the external expert firm.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California (0.04)
- Europe > Germany (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Germany > Berlin (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (4 more...)
A Supplementary material
A.1 Learning Our proof also relies on the following theorem from Bubeck et al. [2019]. The following facts hold about the Reed-Solomon code. We summarize this result as the following theorem. Theorem 25 (Inner product is a good randomness extractor) . Next, we need the following notion and results from Fourier analysis.
Multi-Designated Detector Watermarking for Language Models
Huang, Zhengan, Zeng, Gongxian, Mu, Xin, Wang, Yu, Yu, Yue
In this paper, we initiate the study of \emph{multi-designated detector watermarking (MDDW)} for large language models (LLMs). This technique allows model providers to generate watermarked outputs from LLMs with two key properties: (i) only specific, possibly multiple, designated detectors can identify the watermarks, and (ii) there is no perceptible degradation in the output quality for ordinary users. We formalize the security definitions for MDDW and present a framework for constructing MDDW for any LLM using multi-designated verifier signatures (MDVS). Recognizing the significant economic value of LLM outputs, we introduce claimability as an optional security feature for MDDW, enabling model providers to assert ownership of LLM outputs within designated-detector settings. To support claimable MDDW, we propose a generic transformation converting any MDVS to a claimable MDVS. Our implementation of the MDDW scheme highlights its advanced functionalities and flexibility over existing methods, with satisfactory performance metrics.
- North America > United States (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
Edit Distance Robust Watermarks for Language Models
Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees. We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model's actual output distribution; and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions to the watermarked text. Earlier schemes could only handle stochastic substitutions and deletions, and thus we are aiming for a more natural and appealing robustness guarantee that holds with respect to edit distance. Our main result is a watermarking scheme which achieves both undetectability and robustness to edits when the alphabet size for the language model is allowed to grow as a polynomial in the security parameter. To derive such a scheme, we follow an approach introduced by Christ & Gunn (2024), which proceeds via first constructing pseudorandom codes satisfying undetectability and robustness properties analogous to those above; our key idea is to handle adversarial insertions and deletions by interpreting the symbols as indices into the codeword, which we call indexing pseudorandom codes. Additionally, our codes rely on weaker computational assumptions than used in previous work. Then we show that there is a generic transformation from such codes over large alphabets to watermarking schemes for arbitrary language models.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (5 more...)
Pseudorandom Error-Correcting Codes
We construct pseudorandom error-correcting codes (or simply pseudorandom codes), which are error-correcting codes with the property that any polynomial number of codewords are pseudorandom to any computationally-bounded adversary. Efficient decoding of corrupted codewords is possible with the help of a decoding key. We build pseudorandom codes that are robust to substitution and deletion errors, where pseudorandomness rests on standard cryptographic assumptions. Specifically, pseudorandomness is based on either $2^{O(\sqrt{n})}$-hardness of LPN, or polynomial hardness of LPN and the planted XOR problem at low density. As our primary application of pseudorandom codes, we present an undetectable watermarking scheme for outputs of language models that is robust to cropping and a constant rate of random substitutions and deletions. The watermark is undetectable in the sense that any number of samples of watermarked text are computationally indistinguishable from text output by the original model. This is the first undetectable watermarking scheme that can tolerate a constant rate of errors. Our second application is to steganography, where a secret message is hidden in innocent-looking content. We present a constant-rate stateless steganography scheme with robustness to a constant rate of substitutions. Ours is the first stateless steganography scheme with provable steganographic security and any robustness to errors.
- North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- (5 more...)