8eb88844dafefa92a26aaec9f3acad93-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems

Ideally, language models would reflect the cultural norms of various regions around the world and generate culturally appropriate content when responding in local languages of the regions, unless otherwise specified.









Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Neural Information Processing Systems

However, does achieving a low attack success rate (ASR) through current safety purification methods truly eliminate learned backdoor features from the pretraining phase? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods.