Understanding Model Selection for Learning in Strategic Environments
The deployment of ever-larger machine learning models reflects a growing consensus that the more expressive the model class one optimizes over, and the more data one has access to, the more one can improve performance. As models are deployed in a variety of real-world scenarios, they inevitably face strategic environments.
d2b752ed4726286a4b488ae16e091d64-Supplemental-Conference.pdf
Table 3 presents comprehensive details of the TrojAI dataset. PICCOLO is a backdoor scanning tool that aims to detect whether a language model is backdoored. It cannot reverse engineer exact triggers; instead, it optimizes a list of surrogate triggers that can induce a high attack success rate (ASR). The surrogate triggers produced by PICCOLO cannot be used directly. Table 4 documents the optimal prompts identified via fuzzing for each model.