Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
–Neural Information Processing Systems
Large language models (LLMs) are trained on a deluge of text data with limited quality control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as leaking information, fake news or hate speech.
Neural Information Processing Systems
Oct-10-2025, 00:44:44 GMT
- Country:
- North America > United States (0.27)
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.67)
- Research Report
- Industry:
- Information Technology > Security & Privacy (1.00)
- Government (0.68)
- Media (0.66)
- Technology: