Large Language Model
Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs
Yuan, Leitao, Mao, Qinghua, Liu, Daizong, Wang, Kun, Wang, Wenjie, Teng, Yan, Shao, Jing, Liu, Dongrui
Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, where perturbations optimized on open-source surrogate encoders can generalize to closed-source MLLMs. A key challenge for improving adversarial transferability is to effectively capture the intrinsic visual focus shared across different models, such that perturbations align with transferable semantic cues rather than surrogate-specific behaviors. However, existing methods suffer from spatial-domain feature redundancy and surrogate-specific gradient signals, both of which hinder cross-model transferability. In this paper, we propose FRA-Attack, which addresses both challenges from a unified frequency-domain regularization perspective. For feature alignment, a high-pass DCT objective on patch features suppresses redundant global structures and concentrates the loss on the high-frequency band that carries the MLLMs' intrinsic visual focus. For gradient optimization, we introduce Frequency-domain Gradient Regularization (FGR), a low-pass regularizer that modulates the surrogate gradient using only the geometric frequency coordinate. Because no surrogate-derived statistic is involved, FGR is \textit{model-agnostic} by construction: it removes surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions. Together, the two components form a unified frequency-domain treatment of transferability. Extensive experiments on $15$ flagship MLLMs across $7$ vendors show that FRA-Attack achieves superior cross-model transferability, with state-of-the-art performance on GPT-5.4, Claude-Opus-4.6 and Gemini-3-flash in particular.
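To make the idea of a low-pass filter that depends only on the frequency coordinate concrete, here is a minimal sketch in plain Python. It is not the authors' FGR implementation: the Gaussian mask, the `sigma` parameter, the square single-channel gradient, and the function names are all assumptions made for illustration. The mask is computed from the DCT indices `(u, v)` alone, so nothing in it is derived from the surrogate model.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def low_pass_gradient(grad, sigma=2.0):
    """Attenuate high-frequency DCT components of a square 2D gradient.

    Hypothetical mask: exp(-(u^2 + v^2) / (2 * sigma^2)), a function of the
    frequency indices (u, v) only. The paper's actual regularizer may differ.
    """
    n = len(grad)
    c = dct_matrix(n)
    ct = transpose(c)
    coeffs = matmul(matmul(c, grad), ct)  # forward 2D DCT: C X C^T
    for u in range(n):
        for v in range(n):
            coeffs[u][v] *= math.exp(-(u * u + v * v) / (2.0 * sigma ** 2))
    return matmul(matmul(ct, coeffs), c)  # inverse 2D DCT: C^T Y C
```

In this sketch a constant gradient passes through unchanged, because only the DC coefficient is non-zero and its mask value is 1, while a checkerboard pattern, whose energy sits at odd frequencies, is strongly attenuated.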
Meta Is in Crisis, Google Search's Makeover, and AI Gets Booed by Graduates
This week, the team discusses Meta's recent layoffs and what they've been hearing from employees about the increasingly grim vibes at the company. They also talk about Elon Musk losing his lawsuit against OpenAI and share highlights from Google's annual conference--including an ambitious AI vision to change how people search the web. Finally, what do recent college graduates and women whose spouses work in AI have in common? We spoke to more than a dozen employees and it turns out the job cuts are far from the only reason why Meta employees are really going through it. He lost his lawsuit against Sam Altman and OpenAI in really as full a way as you can, as dramatically as possible. I know, Zoë, you're looking forward to talking about that. We're going to get into why young adults might be using AI, but they have very complicated feelings about it. And later in the show, we're going to hear about why women married to AI bros have had enough. This week, the company is letting go of roughly 10 percent of its workforce, which is about 8,000 employees total. It's the latest round of job cuts, adding to the roughly 25,000 jobs that have been cut in the past few years as part of Mark Zuckerberg's Year of Efficiency that started in 2023 and now the latest AI-forward workplace, which he is trying to develop and impose.
And while these latest cuts are not as big as some of the rounds of layoffs that have already happened, they're getting a ton of attention because Mark Zuckerberg, the CEO, has said that the reason they're happening, in part at least, in large part, is because the company is spending so much money on AI and data centers.
Roundtables: Can AI Learn to Understand the World?
Watch a subscriber-only discussion exploring how AI might enter the physical world. AI companies want to build systems that understand the external world and overcome the limitations of LLMs. Recent developments have brought world models to the forefront of the AI discussion. Watch a conversation with editor in chief Mat Honan, senior AI editor Will Douglas Heaven, and AI reporter Grace Huckins exploring how AI might enter the physical world.
OpenAI makes breakthrough on 80-year-old maths problem
If you take a sheet of paper and add some dots, how many pairs can be the same distance apart? OpenAI has claimed a further advance in AI reasoning after its technology successfully tackled an 80-year-old maths problem. The company behind ChatGPT said it had made a breakthrough with a challenge first posed by Hungarian mathematician Paul Erdős in 1946: the planar unit distance problem. The question posed by Erdős is simple to explain.
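The quantity in question can be illustrated with a few lines of Python: given a set of dots, count the pairs that are exactly one unit apart. This is only a brute-force counter for a fixed arrangement, not anything related to OpenAI's result; the function name and the grid example are chosen here for illustration, and integer coordinates are used so that the distance test is exact.

```python
from itertools import combinations


def unit_distance_pairs(points):
    """Count pairs of integer-coordinate points at distance exactly 1."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == 1
    )


# Nine dots on a 3 x 3 grid: 6 horizontal and 6 vertical neighbouring pairs.
grid = [(x, y) for x in range(3) for y in range(3)]
print(unit_distance_pairs(grid))  # prints 12
```

The problem asks how large this count can be made, over all possible arrangements, as the number of dots grows.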
Anthropic's Code with Claude showed off coding's future--whether you like it or not
As tools like Claude Code get better, more and more developers are happy to hand off coding tasks to them. The way software gets built has changed for good. The vibes were strong at Code with Claude, Anthropic's two-day event for software developers in London that kicked off on May 19, the same day as Google's I/O in Palo Alto. "Who here has shipped a pull request in the last week that was completely written by Claude?" Jeremy Hadfield, an engineer at Anthropic, asked from the main stage. Almost half the people in the packed room--many sitting with laptops on their knees, coding or prompting as they watched the talks--raised their hands. Pull requests are fixes or updates to existing software that are submitted for review before they go live.
The Download: online safety's future and climate tech's big pivot
Plus: SpaceX has filed for an IPO expected to be the largest ever. For months, the Trump administration has been going after researchers who study and try to counter hate speech, harassment, propaganda, and disinformation online. Now, some of those researchers are fighting back. In a new lawsuit, they're seeking to strike down a visa restriction policy against "foreign officials and other persons" announced last year by US Secretary of State Marco Rubio. They say the policy violates the speech and due process rights of foreign-born workers whose "work supports greater moderation of content on the [tech] platforms." Find out how the case could impact online safety and free speech.
Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs
Khosravi, Hamed, Huo, Xiaoming
A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $α$. The operator needs a safety certificate for this deployment's stream at every round: no pooling across deployments, no waiting for a long-run average. Existing wrappers cannot deliver this on adaptive, online-updated streams: offline conformal-risk methods require exchangeability; online-conformal methods bound only long-run averages; non-exchangeable extensions are marginally valid; and the closest anytime wrapper, A-RCPS, controls marginal rather than selective risk. Using a (test statistic, validity guarantee, deployment rule) framework, we identify one empty cell forced by deployment requirements: e-process per threshold, selective risk, anytime-pathwise validity, max-certified-threshold rule. Conformal Selective Acting (CSA) fills it as a per-round wrapper maintaining a Ville-type e-process per threshold on a Bonferroni grid, evaluated against the RLVR filtration. Under predictable updates and isotonic-calibrated monotone risk we prove (i) an anytime-pathwise selective-risk bound $R_T^{\mathrm{act}} \le α + O(N_T^{-1/2})$, (ii) rate-optimal certification matching $Θ(\bar{η}^{-2}\log(1/δ))$, and (iii) a horizon-independent release-rate gap. Across eight specialist benchmarks ($480$ streams), sixteen adversarial distribution-shift cells ($160$ streams), and five live Expert-Iteration RLVR cells with online LoRA over four base models in three architecture families ($10{,}300$ rounds), CSA is the only method among ten compared that satisfies pathwise validity and non-refusing deployment on every cell. We do not propose a new LLM, training algorithm, or policy class; CSA is the deployment-side complement, orthogonal to the model, for operators who cannot use a frontier API.
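The phrase "a Ville-type e-process per threshold on a Bonferroni grid" can be illustrated with a generic betting construction. The sketch below is not the CSA algorithm: the fixed bet size, the input format (a hypothetical dict mapping each threshold to the 0/1 errors observed on the rounds where the model acted at that threshold), and the function name are all assumptions. For each threshold it tracks a wealth process that is a non-negative supermartingale whenever the conditional error rate is at least $α$, so by Ville's inequality the chance that it ever reaches $K/δ$ under that null is at most $δ/K$, and a union bound over the $K$ thresholds gives overall level $δ$.

```python
def certify_thresholds(losses_by_threshold, alpha, delta, bet=0.5):
    """Return {threshold: first round at which risk < alpha is certified, or None}.

    losses_by_threshold: hypothetical input, {threshold: [0/1 losses on acted rounds]}.
    bet: fixed betting fraction; must keep every wealth factor non-negative.
    """
    if not 0.0 < bet <= 1.0 / (1.0 - alpha):
        raise ValueError("bet must lie in (0, 1 / (1 - alpha)]")
    k = len(losses_by_threshold)
    level = k / delta  # Bonferroni: each threshold is tested at level delta / k
    certified = {}
    for threshold, losses in losses_by_threshold.items():
        wealth = 1.0
        certified[threshold] = None
        for t, loss in enumerate(losses, start=1):
            # Wealth grows when the observed loss is below the budget alpha.
            wealth *= 1.0 + bet * (alpha - loss)
            if wealth >= level:
                certified[threshold] = t  # valid at any stopping time (Ville)
                break
    return certified
```

Because the guarantee holds at every round simultaneously, the wealth can be checked after each new observation without any correction for repeated looks, which is what distinguishes this style of certificate from a fixed-sample test.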
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
Wang, Tong, Xu, Yiqing, Yang, Leo Yang
Interpretable text representations should expose coordinates that are not only predictive, but also meaningful enough for independent auditors to apply. Existing discriminative representations often use anonymous embedding directions, while concept-bottleneck and LLM-assisted methods attach natural-language names to features without ensuring that those definitions are reproducible or distinct from the target label. We propose an operational criterion for interpretable discriminative text representations: each coordinate should satisfy conceptual clarity, measured by chance-adjusted agreement between independent annotators applying the feature definition, and label disentanglement, meaning the feature should not merely paraphrase the prediction target. We instantiate this criterion in LLM-assisted Feature Discovery (LFD), an iterative method that proposes lexical and semantic features from contrastive outcome-opposed text pairs, screens candidates using cross-LLM Cohen's $κ$, and selects features by residual held-out predictive gain. A stylized analysis connects the $κ$ screen to a per-feature annotation-noise bound, formalizing agreement as a reliability check. Across ten text-classification tasks spanning seven corpora, LFD matches the predictive performance of a strong text bottleneck baseline while producing substantially clearer and less label-entangled features. Human audits with 232 raters show that LFD features achieve higher human--human and human--LLM agreement than baseline concepts, and raters consistently judge them as less label-leaking. These results suggest that agreement-tested, label-disentangled coordinates provide a practical auditability standard for interpretable text classification.
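For reference, the chance-adjusted agreement statistic named above, Cohen's $κ$, is $(p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement rate and $p_e$ is the agreement expected if the two annotators labeled independently with their own marginal frequencies. The sketch below computes it for two annotators; the function name and the handling of the degenerate case are choices made here, and how LFD sets its screening cutoff is not specified in the abstract.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's marginal label rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        # Kappa is undefined here; this sketch returns 1.0 as a convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Two annotators apply a binary feature definition to four texts.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # prints 0.5
```

In the example the annotators agree on three of four items (0.75), but half of that agreement is expected by chance, which is why $κ$ is 0.5 rather than 0.75.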
Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment
He, Shuaida, Chen, Liwen, Feng, Long
Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in which clients share only partial structure and a substantial subset may be contaminated. We propose Collaborative Low-rank Alignment and Identifiable Recovery (CLAIR), a contamination-aware framework that relies only on preliminary local estimators. Its formulation applies broadly, from linear regression to neural network and LLM modules, whenever local adaptation can be represented by matrix-valued updates. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement: it reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace, and thus improves over local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity. Empirically, we demonstrate the benefits of CLAIR by fine-tuning a Transformer architecture on a text-copying task. The results show accurate contamination detection and improved benign-client performance compared with local fine-tuning and non-robust federated averaging.
Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate
Kalra, Dayal Singh, Barkeshli, Maissam
Hyperparameter transfer allows extrapolating optimal optimization hyperparameters from small to large scales, making it critical for training large language models (LLMs). This is done either by fitting a scaling law to the hyperparameters or by a judicious choice of parameterization, such as Maximal Update ($μ$P), that renders optimal hyperparameters approximately scale invariant. In this paper, we first develop a framework to quantify hyperparameter transfer through three metrics: (1) the quality of the scaling law fit, (2) the robustness to extrapolation errors, and (3) the asymptotic loss penalty due to choice of parameterization. Next, we investigate through a comprehensive series of ablations why $μ$P appears to offer high-quality learning rate transfer relative to standard parameterization (SP), as existing theory is inadequate. We find that the overwhelming benefit of $μ$P relative to SP when training with AdamW arises simply from maximizing the learning rate of the embedding layer. In SP, the embedding layer learning rate acts as a bottleneck that induces training instabilities; increasing it by a factor of width to match $μ$P dramatically smooths out training while improving hyperparameter transfer. We also find that weight decay improves the scaling law fits, while, in the fixed token-per-parameter setting, it hurts the robustness of the extrapolation.
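The first route to hyperparameter transfer mentioned above, fitting a scaling law to the hyperparameters, can be sketched as a log-log least-squares fit followed by extrapolation. This is an illustrative sketch, not the paper's procedure: the power-law form `lr = a * size**b`, the use of $R^2$ in log space as the fit-quality measure, the function names, and the synthetic numbers in the usage note are all assumptions.

```python
import math


def fit_power_law(sizes, best_lrs):
    """Fit best_lr = a * size**b by least squares in log-log space.

    Returns (a, b, r_squared), with r_squared computed on the log values.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(lr) for lr in best_lrs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(intercept), slope, r_squared


def extrapolate(a, b, size):
    """Predict the optimal learning rate at a larger scale from the fit."""
    return a * size ** b
```

As a usage example with synthetic data, calling `fit_power_law([256, 512, 1024], [0.02, 0.02 / 2 ** 0.5, 0.01])` recovers an exponent close to -0.5, and `extrapolate(a, b, 4096)` then gives the learning rate the fitted law predicts at width 4096. A parameterization under which optimal hyperparameters are approximately scale invariant would correspond to a fitted exponent near zero.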