d2b752ed4726286a4b488ae16e091d64-Supplemental-Conference.pdf

Neural Information Processing Systems

Table 3 presents comprehensive details of the TrojAI dataset. PICCOLO is a backdoor scanning tool that detects whether a language model is backdoored. It cannot reverse-engineer the exact triggers; instead, it optimizes a list of surrogate triggers that can still induce attack success, as measured by the attack success rate (ASR). The surrogate triggers produced by PICCOLO cannot be used directly. Table 4 documents the optimal prompts identified via fuzzing for each model.



Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models

Guillermo Ortiz-Jimenez

Neural Information Processing Systems

We present a comprehensive study of task arithmetic in vision-language models and show that weight disentanglement is the crucial factor that makes it effective. This property arises during pre-training and manifests when distinct directions in weight space govern separate, localized regions in function space associated with the tasks.
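
For readers unfamiliar with the term, the basic task-arithmetic recipe can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the usual formulation in which a task vector is the element-wise difference between fine-tuned and pre-trained weights, and an edited model is the pre-trained weights plus a scaled sum of task vectors. The names `task_vector`, `apply_task_vectors`, the toy weight dictionaries, and the scaling coefficients are all hypothetical, and plain Python lists stand in for real tensors.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Return tau = theta_finetuned - theta_pretrained, parameter by parameter."""
    return {
        name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def apply_task_vectors(
    pretrained: Weights, vectors: List[Weights], alphas: List[float]
) -> Weights:
    """Return theta_0 + sum_t alpha_t * tau_t (a negative alpha negates a task)."""
    edited = {name: list(values) for name, values in pretrained.items()}
    for tau, alpha in zip(vectors, alphas):
        for name in edited:
            edited[name] = [w + alpha * t for w, t in zip(edited[name], tau[name])]
    return edited


# Hypothetical checkpoints: one pre-trained model and two fine-tuned variants.
theta_0 = {"layer.weight": [0.0, 1.0, 2.0]}
theta_a = {"layer.weight": [1.0, 1.0, 2.0]}
theta_b = {"layer.weight": [0.0, 1.0, 4.0]}

tau_a = task_vector(theta_0, theta_a)
tau_b = task_vector(theta_0, theta_b)

# Add task A at full strength and task B at half strength.
merged = apply_task_vectors(theta_0, [tau_a, tau_b], [1.0, 0.5])

# Negate task A by applying its task vector with a negative coefficient.
forgot_a = apply_task_vectors(theta_0, [tau_a], [-1.0])
```

In this toy case the two task vectors touch different coordinates, so adding one does not disturb the other. That is only a loose weight-space analogy for the disentanglement the abstract describes, which is a statement about separate regions of function space rather than about non-overlapping weights.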