 Large Language Model


CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency

arXiv.org Machine Learning

Large language models often improve reasoning by sampling multiple outputs and aggregating their final answers, but precise and efficient control of error levels remains a challenging task. In particular, deciding when to stop sampling remains difficult when the stopping rule is data-dependent and the set of possible response labels is not known in advance. We study anytime-valid certification of a prespecified target answer as the unique mode of the model's response distribution, a guarantee distinct from answer correctness. We propose the Certification by Intersection-union Testing with E-processes (CITE) algorithm, which provably controls false certification at any prescribed level under arbitrary data-driven stopping, without requiring prior knowledge of the answer category set. We also prove a category-set-size-free stopping-time rate, establish matching minimax lower bounds up to constants in the main regime, and extend the construction to confidence-weighted voting. Simulations and LLM self-consistency experiments show empirical error control and improved certification in diffuse-tail settings.
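
To make the idea of anytime-valid certification concrete, here is a minimal sketch of a betting-style test in Python. It is not the CITE algorithm from the paper: the fixed bet size, the function name `certify_mode`, and the way unseen labels are handled are all assumptions chosen for illustration. For each rival label r, the wealth is multiplied by (1 + bet) when the target is drawn and by (1 - bet) when r is drawn, so its expected growth factor is at most 1 whenever r is at least as likely as the target. Certification requires every rival's wealth, including that of any label not yet observed, to reach 1/alpha.

```python
def certify_mode(samples, target, alpha=0.05, bet=0.5):
    """Return the 1-based index at which `target` is certified, or None.

    Illustrative sketch only (fixed bet size is an assumption).
    One wealth process is kept per rival label:
      * a draw equal to the target multiplies every rival's wealth by (1 + bet)
      * a draw equal to rival r multiplies only r's wealth by (1 - bet)
    A label that has never been drawn has wealth (1 + bet) ** target_hits.
    """
    threshold = 1.0 / alpha
    wealth = {}       # observed rival label -> current wealth
    target_hits = 0   # number of draws equal to the target so far
    for t, label in enumerate(samples, start=1):
        if label == target:
            target_hits += 1
            for rival in wealth:
                wealth[rival] *= 1.0 + bet
        else:
            if label not in wealth:
                # First sighting: credit the target draws seen before it.
                wealth[label] = (1.0 + bet) ** target_hits
            wealth[label] *= 1.0 - bet
        unseen_wealth = (1.0 + bet) ** target_hits
        # Intersection-union step: every rival must be rejected.
        if min([unseen_wealth, *wealth.values()]) >= threshold:
            return t
    return None


print(certify_mode(["A"] * 20, "A"))        # unanimous samples
print(certify_mode(["A", "B"] * 10, "A"))   # tied rival, no certification
```

Because the stopping rule only ever compares wealth to 1/alpha, the sampler may stop at any data-dependent time; a tuned or adaptive bet would stop sooner than the fixed value used here.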


Towards Reliable LLM Evaluation: Correcting the Winner's Curse in Adaptive Benchmarking

arXiv.org Machine Learning

Adaptive prompt and program search makes LLM evaluation selection-sensitive. Once benchmark items are reused inside tuning, the observed winner's score need not estimate the fresh-data performance of the full tune-then-deploy procedure. We study inference for this procedure-level target under explicit tuning budgets. We propose SIREN, a selection-aware repeated-split reporting protocol that freezes the post-search shortlist, separates splitwise selection from held-out evaluation, and uses an item-level Gaussian multiplier bootstrap for uncertainty quantification. In a fixed-shortlist regime with smooth stabilized selection, the estimator admits a first-order item-level representation, and the bootstrap yields valid simultaneous inference on a finite budget grid. This supports confidence intervals for procedure-performance curves and pre-specified equal-budget and cross-budget comparisons. Controlled simulations and MMLU-Pro tuning experiments show that winner-based reporting can be optimistic and can change deployment conclusions, while SIREN remains close to the finite-sample reporting target. Code is available at https://github.com/jznmsl/siren.
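
The item-level Gaussian multiplier bootstrap mentioned above can be sketched generically as follows. This is not the SIREN protocol: it omits the repeated splits and the selection step entirely, and the item-by-budget score matrix, the function name `simultaneous_band`, and the default settings are hypothetical. It only shows how Gaussian multipliers applied to centered per-item scores give one critical value that covers a whole grid of budgets at once.

```python
import math
import random
import statistics


def simultaneous_band(scores, alpha=0.05, draws=2000, seed=0):
    """Simultaneous confidence band for column means of `scores`.

    `scores[i][b]` is the (hypothetical) held-out score of item i at
    budget b. Assumes every column has nonzero sample variance.
    Returns (band, crit) where band[b] = (lower, mean, upper).
    """
    rng = random.Random(seed)
    n, k = len(scores), len(scores[0])
    cols = [[row[b] for row in scores] for b in range(k)]
    means = [statistics.fmean(col) for col in cols]
    sds = [statistics.stdev(col) for col in cols]
    # Centered and standardized item-level contributions.
    z = [[(scores[i][b] - means[b]) / sds[b] for b in range(k)]
         for i in range(n)]
    max_stats = []
    for _ in range(draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one multiplier per item
        max_stats.append(max(
            abs(sum(g[i] * z[i][b] for i in range(n))) / math.sqrt(n)
            for b in range(k)
        ))
    max_stats.sort()
    # Empirical (1 - alpha) quantile of the max statistic over budgets.
    crit = max_stats[min(draws - 1, math.ceil((1 - alpha) * draws) - 1)]
    band = []
    for b in range(k):
        half = crit * sds[b] / math.sqrt(n)
        band.append((means[b] - half, means[b], means[b] + half))
    return band, crit


demo_rng = random.Random(1)
demo_scores = [[demo_rng.random() for _ in range(3)] for _ in range(40)]
demo_band, demo_crit = simultaneous_band(demo_scores)
```

Sharing one multiplier per item across all budgets preserves the correlation between columns, which is what lets the maximum statistic calibrate the band jointly rather than budget by budget.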


Attributions All the Way Down? The Metagame of Interpretability

arXiv.org Machine Learning

We introduce the metagame, a conceptual framework for quantifying second-order interaction effects of model explanations. For any first-order attribution $\phi(f)$ explaining a model $f$, we measure the directional influence of feature $j$ on the attribution of feature $i$, denoted as meta-attribution $\phi_{j \to i}(f)$, by treating the attribution method itself as a cooperative game and computing its Shapley value. Theoretically, we prove that attributions hierarchically decompose into meta-attributions, and establish these as directional extensions of existing interaction indices. Empirically, we demonstrate that the metagame delivers insights across diverse interpretability applications: (i) quantifying token interactions in instruction-tuned language models, (ii) explaining cross-modal similarity in vision-language encoders, and (iii) interpreting text-to-image concepts in multimodal diffusion transformers.
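
Since the construction rests on Shapley values of a cooperative game, a small exact computation may help. The sketch below is generic and is not the paper's metagame: the function `shapley_values` and the three-player `toy_game` are hypothetical. Under the abstract's description, one would presumably plug in a value function that returns the attribution of feature i when only a coalition of features is present, but that wiring is an assumption and is not shown.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for players 0..n-1 of the set function `value`.

    `value` maps a frozenset of players to a number. Cost is exponential
    in n, so this is only suitable for small games.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for size in range(n):
            # Probability that exactly this coalition precedes player i
            # in a uniformly random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_game(coalition):
    # Hypothetical game: players 0 and 1 only pay off together,
    # player 2 contributes on its own.
    pair_bonus = 1.0 if {0, 1} <= coalition else 0.0
    solo_bonus = 2.0 if 2 in coalition else 0.0
    return pair_bonus + solo_bonus


phi = shapley_values(3, toy_game)
print(phi)
```

The values sum to the payoff of the full coalition minus that of the empty one (the efficiency property), which is the kind of additive decomposition the abstract's hierarchical result builds on.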


The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity

arXiv.org Machine Learning

Despite the prevalence of the attention sink phenomenon in Large Language Models (LLMs), where initial tokens disproportionately monopolize attention scores, its structural origins remain elusive. This work provides a mechanistic explanation for this phenomenon. First, we trace its root to the value aggregation process inherent in self-attention, which induces a systematic variance discrepancy. We further demonstrate that this discrepancy is drastically amplified by the activation of super neurons within Feed-Forward Network (FFN) layers. Specifically, the channel-sparse down-projections trigger a dimension disparity of the first-token representation, necessitating the formation of attention sinks as a structural anchor. Then, we validate this causal chain through two controlled interventions: (i) isolating the aggregation effect via attention mask modifications and (ii) amplifying the variance of targeted token representations. Both interventions can replicate attention sinks at arbitrary positions. Our mechanistic understanding offers a foundation for the systematic control of sink formation. Finally, as a proof of concept, we propose head-wise RMSNorm, an architectural modification that stabilizes value aggregation outputs during pre-training. Our experiments demonstrate that restoring statistical parity across positions significantly accelerates convergence.


OpenAI debuts a Codex plugin for Chrome

Engadget

We're seeing coding be one of the leading applications of artificial intelligence tools, and OpenAI is continuing to expand on its offerings in that space. The company has launched a Chrome extension for its Codex platform. The new browser-based capabilities of the plugin include testing web apps, collecting context from across open tabs and using Chrome DevTools in parallel while the user performs other tasks. This extension could also help Codex be more appealing to casual users and additional professions beyond developers since so many computing tasks happen in browsers. Codex can now take on more of your browser dev work.


ChatGPT Has 'Goblin' Mania in the US. In China It Will 'Catch You Steadily'

WIRED

OpenAI's chatbot has some weird linguistic tics in Chinese that are driving users crazy. Are you even online in 2026 if you haven't experienced the verbal tics of ChatGPT? It loves goblins, em dashes, and "it's not A; it's B" sentence constructions. But what you might not know is that the chatbot also has plenty of strange phrases it loves to say in Chinese, and they are driving Chinese users crazy. ChatGPT does a decent job answering questions in Chinese, which is why it's widely used in China despite being blocked by the government.


This 'anti-goal' prompt trick keeps ChatGPT from going rogue

PCWorld

A simple prompt structure using XML tags can stop ChatGPT, Claude, and Gemini from doing things you never asked for. All too often, ChatGPT, Claude, and Gemini overstep their instructions because they're so focused on making you happy. For example, an AI may jump ahead and completely rewrite a document when all you wanted was some focused feedback, or it may draft a brand-new recipe when you just wanted help substituting an ingredient. You might think the solution is to tell the AI chatbot what to do in your prompt.


The Download: the tech reshaping IVF and the rise of balcony solar

MIT Technology Review

Plus: After years of insults, Anthropic and SpaceX have teamed up. IVF has brought millions of babies into the world over the last four decades. But the process can still be slow, painful, and expensive--and far from guaranteed to work. Now, a wave of new technologies aims to change that. Researchers are using AI to identify promising sperm and embryos, developing robotic systems that could automate parts of the IVF process, and even exploring controversial genetic editing techniques designed to prevent inherited disease. The technologies could make IVF more effective and accessible.


Perturbation is All You Need for Extrapolating Language Models

arXiv.org Machine Learning

We introduce a simple yet powerful framework for training large language models. In contrast to the standard autoregressive next-token prediction based on an exact prefix, we propose a perturbation-based procedure that first transforms the prefix into a semantic neighbor and then conditions on this perturbed variant for next-token prediction. This yields a hierarchical model with a pre-post-additive noise structure. Within this framework, we develop a rigorous theory of extrapolability, namely, the capacity of a model class to make reliable predictions for token sequences that lie outside the empirical support of the training corpus. We evaluate the finite-sample performance of the proposed procedure using both synthetic and real-world language data. Results show that the proposed method consistently improves out-of-support prediction while maintaining competitive in-support performance, demonstrating that perturbation offers a practical route to language modeling.


Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

arXiv.org Machine Learning

Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a quantitative converse establishing the asymmetry coefficient $G$ as the unique control parameter for direction. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\phi \ge 1/5$ with worst cut at $t^\ast/n \approx 0.32$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. The resulting two-axis diagnostic ($\phi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (LC-AUROC from 0.62 to 0.84) on tested models up to 8B parameters, with polarity reversing as predicted between HaluEval and MedHallu.
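
The orientation-blindness claim can be illustrated with a toy calculation. The sketch below skips degree normalization and does not compute the paper's asymmetry coefficient $G$; the quantity `signed_flow` is a hypothetical stand-in chosen only because it changes sign under transposition. The point is that the symmetric part of an attention matrix is identical for the matrix and its transpose, so anything computed from it alone cannot tell forward flow from backward flow.

```python
def uniform_causal_attention(n):
    """Row i attends uniformly to positions 0..i (causal mask)."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def symmetric_part(a):
    """(A + A^T) / 2, the component that symmetric diagnostics see."""
    n = len(a)
    return [[(a[i][j] + a[j][i]) / 2 for j in range(n)] for i in range(n)]


def signed_flow(a):
    """Hypothetical direction measure: mass below the diagonal minus
    mass above it. Not the paper's G; used here only for illustration."""
    n = len(a)
    return sum(a[i][j] - a[j][i] for i in range(n) for j in range(i))


attn = uniform_causal_attention(4)
attn_t = transpose(attn)

# Same symmetric part, so any statistic of it (including its spectrum)
# is identical for the two operators.
print(symmetric_part(attn) == symmetric_part(attn_t))
# The signed quantity flips sign, exposing the direction.
print(signed_flow(attn), signed_flow(attn_t))
```

A second, direction-sensitive axis is therefore needed alongside any symmetric spectral feature, which is the motivation the abstract gives for its two-axis diagnostic.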