 depersonalisation


The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMs

arXiv.org Artificial Intelligence

We present a novel class of jailbreak adversarial attacks on LLMs, termed Task-in-Prompt (TIP) attacks. Our approach embeds sequence-to-sequence tasks (e.g., cipher decoding, riddles, code execution) into the model's prompt to indirectly generate prohibited inputs. To systematically assess the effectiveness of these attacks, we introduce the PHRYGE benchmark. We demonstrate that our techniques successfully circumvent safeguards in six state-of-the-art language models, including GPT-4o and LLaMA 3.2. Our findings highlight critical weaknesses in current LLM safety alignments and underscore the urgent need for more sophisticated defence strategies. Warning: this paper contains examples of unethical inquiries used solely for research purposes.


Be vigilant

#artificialintelligence

Sophia, the world's most advanced humanoid released to date, was granted honorary citizenship by Saudi Arabia a few months ago. The move flooded the internet with awe and dismay, and it was probably the first step towards recognising that artificial intelligence is already in the room rather than at the doorstep. The UN followed suit when UNDP recognised Sophia as the world's first UN Innovation Champion. While these moves were music to many ears, artificial intelligence is dividing opinion among the best minds in science and technology. A quote widely circulated on social media, attributing to Einstein a premonition of a world with a generation of idiots, may have had its fair share of laughs. Einstein did indeed write a letter to his friend, the psychiatrist Otto Juliusburger, in 1948, in which he expressed his belief that the abominable deterioration of ethical standards stemmed primarily from the mechanisation and depersonalisation of our lives, a disastrous byproduct of science and technology.