suspicion
'I didn't have anything to prove': what Traitors finalist Jade Scott learned about survival from video games
'Minecraft was my way in' The Traitors 2026 finalist Jade. The latest series of The Traitors, which ended last week on a nail-biting finale, featured some of the usual characters - from guileless extroverts to wannabe Columbos endlessly observing fellow contestants for the slightest flicker of treachery. But one faithful stood out for her quiet determination, despite a ceaseless onslaught of suspicion and accusation. That person was Jade Scott, and I wasn't at all surprised when, quite early on in the series, she revealed she was a keen gamer.
Interactive Label Cleaning with Example-based Explanations
We tackle sequential learning under label noise in applications where a human supervisor can be queried to relabel suspicious examples. Existing approaches are flawed in that they only relabel incoming examples that look suspicious to the model. As a consequence, mislabeled examples that elude (or never undergo) this cleaning step end up tainting the training data and the model, with no further chance of being cleaned. We propose CINCER, a novel approach that cleans both new and past data by identifying pairs of mutually incompatible examples. Whenever it detects a suspicious example, CINCER identifies a counter-example in the training set that - according to the model - is maximally incompatible with the suspicious example, and asks the annotator to relabel either or both examples, resolving this possible inconsistency. The counter-examples are chosen to be maximally incompatible, so as to serve as explanations of the model's suspicion, and highly influential, so as to convey as much information as possible if relabeled. CINCER achieves this by leveraging an efficient and robust approximation of influence functions based on the Fisher information matrix (FIM). Our extensive empirical evaluation shows that clarifying the reasons behind the model's suspicions by cleaning the counter-examples helps in acquiring substantially better data and models, especially when paired with our FIM approximation.
WOLF: Werewolf-based Observations for LLM Deception and Falsehoods
Agarwal, Mrinal, Rana, Saad, Sundoro, Theo, Berhe, Hermela, Kim, Spencer, Sharma, Vasu, O'Brien, Sean, Zhu, Kevin
Deception is a fundamental challenge for multi-agent reasoning: effective systems must strategically conceal information while detecting misleading behavior in others. Yet most evaluations reduce deception to static classification, ignoring the interactive, adversarial, and longitudinal nature of real deceptive dynamics. Large language models (LLMs) can deceive convincingly but remain weak at detecting deception in peers. We present WOLF, a multi-agent social deduction benchmark based on Werewolf that enables separable measurement of deception production and detection. WOLF embeds role-grounded agents (Villager, Werewolf, Seer, Doctor) in a programmable LangGraph state machine with strict night-day cycles, debate turns, and majority voting. Every statement is a distinct analysis unit, with self-assessed honesty from speakers and peer-rated deceptiveness from others. Deception is categorized via a standardized taxonomy (omission, distortion, fabrication, misdirection), while suspicion scores are longitudinally smoothed to capture both immediate judgments and evolving trust dynamics. Structured logs preserve prompts, outputs, and state transitions for full reproducibility. Across 7,320 statements and 100 runs, Werewolves produce deceptive statements in 31% of turns, while peer detection achieves 71-73% precision with ~52% overall accuracy. Precision is higher for identifying Werewolves, though false positives occur against Villagers. Suspicion toward Werewolves rises from ~52% to over 60% across rounds, while suspicion toward Villagers and the Doctor stabilizes near 44-46%. This divergence shows that extended interaction improves recall against liars without compounding errors against truthful roles. WOLF moves deception evaluation beyond static datasets, offering a dynamic, controlled testbed for measuring deceptive and detective capacity in adversarial multi-agent interaction.
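The abstract does not say how suspicion scores are smoothed across rounds. As a minimal sketch, assuming a simple exponential moving average (the function name `smooth_suspicion` and the weight `alpha` are hypothetical and not taken from WOLF), blending each round's immediate judgment with the accumulated trust estimate might look like this:

```python
def smooth_suspicion(raw_scores, alpha=0.5):
    """Exponentially smooth per-round suspicion scores (each in [0, 1]).

    `alpha` is a hypothetical weight on the newest round; the smoothing
    rule WOLF actually uses is not given in the abstract.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    current = None
    for score in raw_scores:
        if current is None:
            # First round: nothing to blend with, so take the raw score.
            current = score
        else:
            # Later rounds: mix the new judgment with the running estimate.
            current = alpha * score + (1 - alpha) * current
        smoothed.append(current)
    return smoothed
```

With a smaller `alpha`, a single alarming statement moves the score less, so the smoothed value tracks longer-term trust; with `alpha=1.0` the function returns the raw per-round judgments unchanged.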
Man held in Japan on suspicion of creating female celeb deepfakes made with AI
Tokyo police believe the man made about 20,000 sexually explicit images of 262 women, such as actors and idols, and amassed sales of ¥1.2 million between October last year and September this year. Tokyo police have arrested a 31-year-old man for allegedly creating fake sexual images of female celebrities with generative artificial intelligence technology and displaying them online, it was learned Thursday. It is the first time that police in Japan have cracked down on sexual deepfake images of celebrities created with generative AI. The suspect, Hiroya Yokoi of the city of Akita, has admitted he began making deepfakes to earn a small amount of money, which he used to cover living expenses and repay a student loan. Authorities believe Yokoi made a total of about 20,000 sexually explicit images of 262 women, such as actors, television personalities and idols, and amassed sales of ¥1.2 million between October last year and September this year.
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
Terekhov, Mikhail, Panfilov, Alexander, Dzenhaliou, Daniil, Gulcehre, Caglar, Andriushchenko, Maksym, Prabhu, Ameya, Geiping, Jonas
AI control protocols serve as a defense mechanism to stop untrusted LLM agents from causing harm in autonomous settings. Prior work treats this as a security problem, stress testing with exploits that use the deployment context to subtly complete harmful side tasks, such as backdoor insertion. In practice, most AI control protocols are fundamentally based on LLM monitors, which can become a central point of failure. We study adaptive attacks by an untrusted model that knows the protocol and the monitor model, which is plausible if the untrusted model was trained with a later knowledge cutoff or can search for this information autonomously. We instantiate a simple adaptive attack vector by which the attacker embeds publicly known or zero-shot prompt injections in the model outputs. Using this tactic, frontier models consistently evade diverse monitors and complete malicious tasks on two main AI control benchmarks. The attack works universally against current protocols that rely on a monitor. Furthermore, the recent Defer-to-Resample protocol even backfires, as its resampling amplifies the prompt injection and effectively reframes it as a best-of-$n$ attack. In general, adaptive attacks on monitor models represent a major blind spot in current control protocols and should become a standard component of evaluations for future AI control mechanisms.
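A back-of-the-envelope sketch shows why resampling can favor the attacker. It assumes, purely for illustration, that each sampled output evades the monitor independently with the same probability `p`, and that the attack succeeds if any one of `n` samples evades. Neither assumption comes from the paper, and `best_of_n_evasion` is a hypothetical helper:

```python
def best_of_n_evasion(p, n):
    """Chance that at least one of n independent samples evades the monitor.

    Assumes a fixed per-sample evasion probability p (an illustrative
    simplification, not a measurement from the paper).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Complement of "every one of the n samples is caught".
    return 1.0 - (1.0 - p) ** n
```

Under these assumptions the evasion chance can only grow with `n`: a per-sample probability of 0.2 becomes roughly 0.49 with three samples.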
Man arrested on suspicion of starting Pacific Palisades fire
A man has been arrested as a suspect in setting the Pacific Palisades fire in Los Angeles that killed 12 people and destroyed more than 6,000 homes in January. Justice department officials announced at a news conference that 29-year-old Jonathan Rinderknecht had been detained. They said evidence collected from his digital devices showed an image he generated on ChatGPT depicting a burning city. The fire was sparked on 7 January near a popular hiking trail overlooking the wealthy coastal neighbourhood. The Eaton Fire, ignited the same day in the Los Angeles area, killed another 19 people and destroyed about 9,400 structures, officials said.
Captain of tanker linked to Russian 'shadow fleet' charged in France
The captain of an oil tanker believed to be part of Russia's shadow fleet of vessels used to evade sanctions has been charged by French authorities. The Chinese national was handed one count of refusing to follow instructions from the French navy and told to attend a court hearing in the northern coastal city of Brest next February. The Boracay left Russia last month and was off the coast of Denmark when unidentified drones forced the temporary closure of several airports last week. The tanker was earlier boarded by French soldiers because it was on a list of vessels subject to EU sanctions for carrying Russian oil exports. Russian President Vladimir Putin called France's actions piracy.
The Strange Ways Writers Are Proving That Their Writing Isn't ChatGPT
The other week, I was reading an email I'd written when a strange notion occurred to me. Would it perhaps be better, an unsettling new voice suddenly whispered, to leave it in? This is a thought that would've appalled me a year ago. As a professional writer, I have long prided myself on impeccable grammar, judiciously wielded punctuation, and (at times indulgent) verbosity.
Strategy Adaptation in Large Language Model Werewolf Agents
Nakamori, Fuya, Huang, Yin Jou, Cheng, Fei
This study proposes a method to improve the performance of Werewolf agents by switching between predefined strategies based on the attitudes of other players and the context of conversations. Prior work on Werewolf agents has used prompt engineering in which effective strategies are only implicitly defined, so those agents cannot adapt to changing situations. In this research, we propose a method that explicitly selects an appropriate strategy based on the game context and the estimated roles of other players. We compare the strategy-adaptation Werewolf agents with baseline agents using implicit or fixed strategies and verify the effectiveness of our proposed method.