Mission Impossible: A Statistical Perspective on Jailbreaking LLMs

Neural Information Processing Systems 

Large language models (LLMs) are trained on a deluge of text data with limited quality control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as leaking information or generating fake news and hate speech.
