 gpt-5


Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

arXiv.org Machine Learning

Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, where perturbations optimized on open-source surrogate encoders can generalize to closed-source MLLMs. A key challenge for improving adversarial transferability is to effectively capture the intrinsic visual focus shared across different models, such that perturbations align with transferable semantic cues rather than surrogate-specific behaviors. However, existing methods suffer from spatial-domain feature redundancy and surrogate-specific gradient signals, which hinder cross-model transferability. In this paper, we propose FRA-Attack, which addresses both challenges from a unified frequency-domain regularization perspective. For feature alignment, a high-pass DCT objective on patch features suppresses redundant global structures and concentrates the loss on the high-frequency band that carries the MLLMs' intrinsic visual focus. For gradient optimization, we introduce Frequency-domain Gradient Regularization (FGR), a low-pass regularizer that modulates the surrogate gradient using only the geometric frequency coordinate. Because no surrogate-derived statistic is involved, FGR is model-agnostic by construction: it removes surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions. Together, the two components form a unified frequency-domain treatment of transferability. Extensive experiments on 15 flagship MLLMs across 7 vendors show that FRA-Attack achieves superior cross-model transferability, with state-of-the-art performance on GPT-5.4, Claude-Opus-4.6 and Gemini-3-flash.
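The low-pass gradient step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian mask shape, the `sigma` parameter, and the function names `dct_matrix` and `low_pass_gradient` are assumptions made for illustration. The only property carried over from the abstract is that the mask depends on the frequency coordinates alone, not on any statistic of the surrogate model.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # spatial index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different normalization
    return m

def low_pass_gradient(grad: np.ndarray, sigma: float = 0.25) -> np.ndarray:
    """Attenuate high-frequency components of a 2-D gradient map.

    Assumption: a Gaussian mask over normalized frequency coordinates (u, v).
    The mask is built from coordinates only, with no model-derived statistics.
    """
    h, w = grad.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    coeffs = dh @ grad @ dw.T                 # forward 2-D DCT
    u = np.arange(h)[:, None] / h             # normalized vertical frequency
    v = np.arange(w)[None, :] / w             # normalized horizontal frequency
    mask = np.exp(-(u ** 2 + v ** 2) / (2 * sigma ** 2))
    return dh.T @ (coeffs * mask) @ dw        # inverse 2-D DCT

# Hypothetical usage: filter a random stand-in for a surrogate gradient.
rng = np.random.default_rng(0)
filtered = low_pass_gradient(rng.standard_normal((16, 16)))
```

Because the transform is orthonormal and the mask never exceeds 1, the filtered gradient cannot have a larger norm than the input, and a constant (pure DC) gradient passes through unchanged.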


HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

arXiv.org Machine Learning

Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across tasks such as summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigation that works in one setting actually reduces hallucinations across contexts. Current hallucination benchmarks either require human annotation and fixed references that may eventually be memorized, or rely on naturalistic observations often recorded in settings that are difficult to reproduce or test systematically. To enable further research on the root causes of hallucination, we introduce HALLUWORLD, an extensible benchmark framework grounded in an explicit reference-world formulation: a model hallucinates when it produces an observable claim that is false with respect to this reference world. Building on this view, we construct a family of synthetic and semi-synthetic benchmark environments in which the reference world is fully specified, the model's observable view is controlled, and hallucination labels can be generated automatically by construction. HALLUWORLD spans multiple settings that are classically representative for AI, i.e., gridworlds, chess, and realistic terminal tasks. This enables controlled variation of key factors such as world complexity, observability, temporal change, and source-conflict policy, allowing us to disentangle hallucinations into more fine-grained error categories. We evaluate frontier and open-weight language models across these settings and find consistent patterns across domains: perceptual hallucination on directly observed information is near-solved for frontier models, while multi-step state tracking and causal forward simulation are still difficult for frontier models, and are not generally solved by extended thinking.
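The reference-world definition above lends itself to a small sketch of automatic labeling. It assumes a toy gridworld stored as a dictionary from cell coordinates to contents; the `Claim` type, the `is_hallucination` function, and the `"empty"` default are hypothetical names chosen for illustration, not part of the benchmark.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    """An observable claim: 'cell (row, col) contains <content>'."""
    cell: tuple[int, int]
    content: str

def is_hallucination(world: dict[tuple[int, int], str], claim: Claim) -> bool:
    """Label a claim as hallucinated when it is false in the reference world.

    Assumption: cells absent from the dictionary are treated as "empty".
    """
    return world.get(claim.cell, "empty") != claim.content

# Hypothetical reference world with two occupied cells.
world = {(0, 0): "key", (1, 2): "door"}
labels = [
    is_hallucination(world, Claim((0, 0), "key")),   # true claim
    is_hallucination(world, Claim((0, 0), "door")),  # false claim
]
```

Since the world is fully specified, the label follows from a lookup and needs no human annotation, which is the property the abstract attributes to its benchmark environments.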


ChatGPT's new default model is dialing back the annoying emojis

PCWorld

PCWorld reports the update delivers 52.5% fewer hallucinations and 37.3% fewer inaccurate claims while providing more concise answers. Enhanced features include improved context integration from previous chats, files, and Gmail, plus transparency showing which memory sources influenced responses. One reason I took a break from ChatGPT a few months ago (I'm back now) was how sick to death I got of its constant emojis, especially when it came to lists. The brain emoji was my least favorite, along with the green checkmarks, the pointy fingers, and the yellow "hazard" signs. Well, I'll believe it when I see it, but with its latest "instant" model, OpenAI promises that we'll be getting way less of those "gratuitous" emojis in ChatGPT's responses.


ChatGPT developed a goblin obsession after OpenAI tried to make it nerdy

Engadget

Following the release of GPT-5.5 last week, people noticed something funny about OpenAI's latest model. In its Codex coding app, the company left a system prompt instructing GPT-5.5 to avoid mention of goblins, gremlins and other creatures. Yes, you read that right. "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query," the prompt reads. Apparently, enough people started talking about ChatGPT's creature obsession that OpenAI felt the need to provide an accounting of where the goblins came from.


ChatGPT has a 'goblin' obsession. Now we know why

PCWorld

PCWorld reports that OpenAI's GPT models, including GPT-5.5, developed an unusual obsession with mentioning goblins and similar creatures in responses. This quirky behavior stemmed from a "Nerdy" personality instruction encouraging playful language use, which became reinforced through AI training processes. The goblin references became so prevalent that OpenAI implemented a direct ban in its Codex app, illustrating the unpredictable nature of large language model training. I've seen some odd AI system instructions in my day, but this one takes the cake: a prompt in OpenAI's Codex command-line app that demands models "never talk about goblins, gremlins, trolls, ogres, pigeons, or other animals or creatures."


Your old prompts won't work with GPT-5.5. Try these instead

PCWorld

If you're using long and overly specific prompts with ChatGPT's latest model, you're doing it wrong. OpenAI's latest and most powerful model, GPT-5.5, has been topping benchmark charts and impressing users with its coding and reasoning abilities, not to mention the sheer quantity of facts at its fingertips. But while ChatGPT's latest model doesn't require the hand-holding that older models did, it also gets fussy with the longer, highly detailed prompts that might have worked well in the past. If you're seeing worse performance with GPT-5.5 than you had with previous models, it might be your prompt constructions.


OpenAI's GPT-5.5 is faster, smarter, and a step toward its 'super app'

PCWorld

PCWorld reports that OpenAI has launched GPT-5.5, its most advanced AI model, exclusively for paying ChatGPT subscribers on Plus, Pro, Business, and Enterprise plans. The new model delivers faster, more efficient performance in coding, research, and math while outperforming competitors like Google's Gemini 3.1 Pro and Anthropic's Claude Opus 4.7. GPT-5.5 represents a significant step toward OpenAI's 'super app' vision, integrating various AI services into one comprehensive platform. OpenAI recently launched GPT-5.5, which the company describes as its most advanced and intuitive AI model to date. The new model is said to be both faster and more efficient, with specific improvements in areas including coding, research, and math. At the same time, it's said to perform better compared to competing models like Google's Gemini 3.1 Pro and Anthropic's Claude Opus 4.7. According to OpenAI co-founder Greg Brockman, GPT-5.5 is also a step towards the company's vision of a future "super app," where services such as ChatGPT, Codex, and an AI-driven web browser are integrated into a single platform, reports TechCrunch. GPT-5.5 is currently rolling out to paying ChatGPT users, which includes those on Plus, Pro, Business, and Enterprise plans. This article originally appeared on our sister publication PC för Alla and was translated and localized from Swedish.


OpenAI is throwing everything into building a fully automated researcher

MIT Technology Review

OpenAI is refocusing its research efforts and throwing its resources into a new grand challenge. The San Francisco firm has set its sights on building what it calls an AI researcher, a fully automated agent-based system that will be able to go off and tackle large, complex problems by itself. OpenAI says that this new research goal will be its "North Star" for the next few years, pulling together multiple research strands, including work on reasoning models, agents, and interpretability.


ChatGPT is dialing back its 'if you want' end-response teasers

PCWorld

PCWorld reports that OpenAI is updating GPT-5.3 Instant to reduce annoying "if you want" and teaser-style phrasing that users found intrusive. This change addresses widespread user complaints about persistent, clickbait-like follow-up prompts that negatively impacted the AI interaction experience. The update aims to create more natural, direct conversations by making ChatGPT less chatty and eliminating the bothersome response teasers. It wasn't all that long ago that ChatGPT was a constant nag, persistently dropping "Would you like me to?"-style questions at the end of its responses. OpenAI eventually tweaked the phrasing, dropping the question marks and going for "if you want"-style teasers that invited users to extend their chat sessions. Now, OpenAI has acknowledged that it went too far with the clickbaity follow-ups, noting in a recent update for one of its newest models that it's now cutting back on the teasers. "We're rolling out an update to GPT-5.3 Instant that improves follow-up tone and reduces teaser-style phrasing," reads a recent ChatGPT release note, which adds that users should soon see fewer follow-ups like "if you want," "you'll never believe," and "I can tell you three things that " Those teasers are, of course, a way for ChatGPT to keep subscribers chatting, but users have been complaining that the persistent follow-ups are more annoying than they are intriguing. "I hated it with a passion and hope it's completely gone," wrote one user on Reddit.


GPT-5.4 mini brings some of the smarts of OpenAI's latest model to ChatGPT Free and Go users

Engadget

The new model offers performance improvements in reasoning, multimodal understanding and more. When OpenAI released GPT-5.4 at the start of March, the company said the new model was designed primarily for professional work like programming and data analysis. Now OpenAI is launching GPT-5.4 mini and nano, and while it is once again highlighting the usefulness of these new systems for tasks like coding, one of the new models is available to Free and Go users. What's more, that model, GPT-5.4 mini, even offers performance that approaches GPT-5.4 in a handful of areas.