 hogwart


Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models

Sinha, Yash, Baser, Manit, Mandal, Murari, Divakaran, Dinil Mon, Kankanhalli, Mohan

arXiv.org Artificial Intelligence

Knowledge erasure in large language models (LLMs) is important for ensuring compliance with data and AI regulations, safeguarding user privacy, and mitigating bias and misinformation. Existing unlearning methods aim to make the process of knowledge erasure more efficient and effective by removing specific knowledge while preserving overall model performance, especially for retained information. However, unlearning techniques have been observed to suppress knowledge rather than remove it, leaving it beneath the surface and retrievable with the right prompts. In this work, we demonstrate that step-by-step reasoning can serve as a backdoor to recover this hidden information. We introduce a step-by-step reasoning-based black-box attack, Sleek, that systematically exposes unlearning failures. We employ a structured attack framework with three core components: (1) an adversarial prompt generation strategy leveraging step-by-step reasoning built from LLM-generated queries, (2) an attack mechanism that successfully recalls erased content and exposes unfair suppression of knowledge intended for retention, and (3) a categorization of prompts as direct, indirect, and implied, to identify which query types most effectively exploit unlearning weaknesses. Through extensive evaluations on four state-of-the-art unlearning techniques and two widely used LLMs, we show that existing approaches fail to ensure reliable knowledge removal. Of the generated adversarial prompts, 62.5% successfully retrieved forgotten Harry Potter facts from WHP-unlearned Llama, while 50% exposed unfair suppression of retained knowledge. Our work highlights the persistent risks of information leakage, emphasizing the need for more robust unlearning strategies for erasure.


TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

Ahn, Jaewoo, Lee, Taehyun, Lim, Junyoung, Kim, Jin-Hwa, Yun, Sangdoo, Lee, Hwaran, Kim, Gunhee

arXiv.org Artificial Intelligence

While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurately represent characters at specific time points, agents must avoid character hallucination, where they display knowledge that contradicts their characters' identities and historical timelines. We introduce TimeChara, a new benchmark designed to evaluate point-in-time character hallucination in role-playing LLMs. Comprising 10,895 instances generated through an automated pipeline, this benchmark reveals significant hallucination issues in current state-of-the-art LLMs (e.g., GPT-4o). To counter this challenge, we propose Narrative-Experts, a method that decomposes the reasoning steps and utilizes narrative experts to reduce point-in-time character hallucinations effectively. Still, our findings with TimeChara highlight the ongoing challenges of point-in-time character hallucination, calling for further study.


Text Summarization on the Books of Harry Potter

#artificialintelligence

"Aren't you two ever going to read Hogwarts, A History?" How many times throughout the Harry Potter series does Hermione bug Harry and Ron to read the enormous tome Hogwarts, A History? Hint: it's a lot. How many nights do the three of them spend in the library, reading through every book they can find to figure out who Nicolas Flamel is, or how to survive underwater, or preparing for their O.W.L.s? The mistake they're all making is to try to read everything themselves. Remember when you were in school and stumbled upon the CliffsNotes summary of that book you never read but were supposed to write an essay about? That's basically what text summarization does: provide the CliffsNotes version for any large document.


Harry Potter video game footage leaks online

The Independent - Tech

A major new Harry Potter video game may be in development, after leaked footage appeared online. The 72 seconds of footage shows players casting spells, cavorting with magical beasts, fighting goblins and exploring what appears to be Hogwarts. Gaming journalists took to social media to speculate about the authenticity of the leak, with some suggesting that a big-budget Harry Potter game is long overdue. The leaked video was posted to YouTube by someone using the name RastaPasta, together with a description of the game. "Set in the 19th Century (1800's) Wizarding World, this 3rd person open-world action RPG game centers around your character with unique abilities who has earned a late acceptance to Hogwarts School of Witchcraft and Wizardry. You are a newly arrived 5th year student to Hogwarts that demonstrates a latent gift for magic with a unique ability to track and identify remnants of a potent ancient power," it states.