UK's AI Safety Institute easily jailbreaks major LLMs

Engadget 

In a shocking turn of events, AI systems might not be as safe as their creators claim. Who saw that coming, right? In a new report, the UK government's AI Safety Institute (AISI) found that the four undisclosed LLMs it tested were "highly vulnerable to basic jailbreaks." Some models even produced "harmful outputs" before researchers made any deliberate attempt to elicit them. Most publicly available LLMs have safeguards built in to prevent them from generating harmful or illegal responses; jailbreaking means tricking the model into ignoring those safeguards. AISI did this using prompts from a recent standardized evaluation framework as well as prompts it developed in-house.
