Large Language Model
Mind the Gap: Structure-Aware Consistency in Preference Learning
Abstract
Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate [...]. However, we demonstrate that for the equicontinuous hypothesis sets typical of neural networks, these standard surrogates are theoretically inconsistent, yielding vacuous generalization [...]. To resolve this, we formulate LLM alignment within a margin-shifted ranking framework. [...] bounds that depend on enforcing a separation margin γ. Crucially, we extend this to Structure-Aware H-consistency, introducing a novel objective (SA-DPO) that adapts the margin based on the semantic distance between responses to handle synonyms and hard pairs. Finally, we analyze the trade-off between consistency and [...] the Polynomial Hinge family) offer superior consistency guarantees for capacity-bounded models compared to the standard logistic loss used in DPO.
1. Introduction
The alignment of Large Language Models (LLMs) has shifted from explicit Reward Modeling (Stiennon et al., [...]
[...] surrogate loss (e.g., the logistic loss) as a proxy for the true objective: the non-convex, discontinuous 0-1 ranking loss. This reliance raises a fundamental theoretical question that remains largely unanswered for deep networks: Does minimizing these surrogate losses actually guarantee the minimization of the true ranking error?
In this work, we investigate this question through the lens of H-consistency (Mao, Mohri, and Zhong, 2023e). We formulate LLM preference learning as a pairwise ranking problem and derive a series of results that bridge the gap between learning theory and practical fine-tuning.
[...] we identify a fundamental theoretical deficiency in standard [...]. We demonstrate that for equicontinuous hypothesis sets, a property satisfied by neural networks, standard surrogate minimization yields vacuous consistency guarantees. Specifically, without explicit constraints, a model can achieve arbitrarily low surrogate risk while maintaining a high ranking error, effectively "cheating" the objective by shrinking score differences rather than learning the correct ordering. We prove that enforcing a confidence gap γ is not merely a heuristic, but a strict requirement for H-consistency in the deep learning regime.
However, while a uniform margin restores consistency, it is a blunt instrument. We show that demanding a large, fixed margin on semantically identical pairs (synonyms) forces the model to hallucinate differences where none exist, introducing bias and instability. To address this, we propose Structure-Aware H-consistency and a corresponding objective, Structure-Aware DPO (SA-DPO).
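The margin idea in this abstract can be illustrated with a small sketch. It is not the paper's SA-DPO objective, whose exact form does not appear in this excerpt. It assumes a pairwise logistic loss on the score difference between the preferred and rejected response, shifted by a margin, and it assumes (purely for illustration) that the structure-aware margin grows linearly with a semantic distance in [0, 1]. The names `margin_shifted_loss`, `structure_aware_margin`, `gamma_max`, and `semantic_distance` are hypothetical.

```python
import math


def logistic(z: float) -> float:
    """Numerically stable log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def margin_shifted_loss(score_diff: float, gamma: float) -> float:
    """Logistic loss on the score difference after subtracting a margin.

    score_diff is score(preferred) - score(rejected). The loss only becomes
    small once score_diff clearly exceeds gamma.
    """
    return logistic(score_diff - gamma)


def structure_aware_margin(gamma_max: float, semantic_distance: float) -> float:
    """Assumed linear schedule: no margin for synonyms, full margin for distant pairs."""
    d = min(max(semantic_distance, 0.0), 1.0)  # clamp to [0, 1]
    return gamma_max * d


def structure_aware_loss(score_diff: float, gamma_max: float,
                         semantic_distance: float) -> float:
    """Margin-shifted logistic loss with a distance-dependent margin (illustrative only)."""
    gamma = structure_aware_margin(gamma_max, semantic_distance)
    return margin_shifted_loss(score_diff, gamma)


if __name__ == "__main__":
    # Same score gap, different semantic distances: the near-synonym pair
    # is penalized less than the clearly different pair.
    print(structure_aware_loss(0.5, gamma_max=2.0, semantic_distance=0.05))
    print(structure_aware_loss(0.5, gamma_max=2.0, semantic_distance=0.9))
```

Under these assumptions, a pair of near-synonyms is not pushed toward a large score gap, while a semantically distant pair still has to clear the full margin.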
Elon Musk Says He's Suing OpenAI Because They Abandoned Their Mission. I Think His Real Reason Is Much More Embarrassing.
A new scale of humiliation ritual kicked off this week as Elon Musk's lawsuit against OpenAI went to trial in Silicon Valley. The Tesla CEO, who co-founded OpenAI, is suing the artificial intelligence firm and two of its other co-founders, Sam Altman and Greg Brockman, for diverting from its original nonprofit goal of developing A.I. for the public good in favor of for-profit motives. "This lawsuit is very simple: It is not OK to steal a charity," Musk said on the witness stand on Tuesday. The trial is big by every conceivable measure. Both Musk and OpenAI have mustered high-dollar legal armies who are prepared to wage potentially years of litigation, including this federal trial.
Elon Musk Seemingly Admits xAI Has Used OpenAI's Models to Train Its Own
While answering questions under oath, Musk argued it's standard practice for AI labs to use their competitors' models. While testifying on Thursday in federal court, Elon Musk seemed to indicate that his AI lab may have used OpenAI's models to train xAI's own. He touched upon the topic while sitting on the witness stand answering cross-examination questions from an OpenAI attorney amid his ongoing legal battle against the ChatGPT-maker. Do you know what distillation is? It means to use one AI model to train another AI model.
OpenAI Rolls Out 'Advanced' Security Mode for At-Risk Accounts
OpenAI is rolling out Advanced Account Security for people concerned that their ChatGPT or Codex accounts could be potential targets of phishing attacks. For anyone who fears their ChatGPT and Codex accounts might be targeted by attackers, OpenAI announced on Thursday that it is adding an optional new level of account protection. Dubbed Advanced Account Security, the feature enforces strict access controls that would make account takeover attacks very difficult. Such measures are not a new idea in the realm of account security. Google, for example, has offered its Advanced Protection account security tier for nearly a decade. But as mainstream AI services rapidly proliferate around the world, there is a pressing need for an array of basic protections to be put in place.
Sam Altman's ChatGPT Couldn't Stop Obsessing Over Goblins
OpenAI desires less regulation, but it still doesn't know how its chatbot works. OpenAI admitted it had to develop a specific instruction in the code of its latest model of ChatGPT to stop it from repeatedly referencing "goblins, gremlins, and other creatures." In an explanation posted Wednesday, the company said the "strange habit" came from its chatbot personality feature, specifically for users who chose the "Nerdy" personality. "You are an unapologetically nerdy, playful and wise AI mentor to a human."
LLMDFA: Analyzing Dataflow in Code with Large Language Models
Dataflow analysis is a fundamental code analysis technique that identifies dependencies between program values. Traditional approaches typically necessitate successful compilation and expert customization, hindering their applicability and usability for analyzing uncompilable programs with evolving analysis needs in real-world scenarios. This paper presents LLMDFA, an LLM-powered compilation-free and customizable dataflow analysis framework. To address hallucinations for reliable results, we decompose the problem into several subtasks and introduce a series of novel strategies. Specifically, we leverage LLMs to synthesize code that outsources delicate reasoning to external expert tools, such as using a parsing library to extract program values of interest and invoking an automated theorem prover to validate path feasibility. Additionally, we adopt a few-shot chain-of-thought prompting to summarize dataflow facts in individual functions, aligning the LLMs with the program semantics of small code snippets to mitigate hallucinations. We evaluate LLMDFA on synthetic programs to detect three representative types of bugs and on real-world Android applications for customized bug detection. On average, LLMDFA achieves 87.10% precision and 80.77% recall, surpassing existing techniques with F1 score improvements of up to 0.35.
This startup's new mechanistic interpretability tool lets you debug LLMs
Goodfire wants to make training AI models more like good old-fashioned software engineering. The San Francisco-based startup Goodfire just released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters (the settings that determine a model's behavior) during training. This could give model makers more fine-grained control over how this technology is built than was once thought possible. Goodfire claims Silico is the first off-the-shelf tool of its kind that can help developers debug all stages of the development process, from building a data set to training a model. LLMs contain a LOT of parameters. The company says its mission is to make building AI models less like alchemy and more like a science.
ChatGPT developed a goblin obsession after OpenAI tried to make it nerdy
Following the release of GPT-5.5 last week, people noticed something funny about OpenAI's latest model. In its Codex coding app, the company left a system prompt instructing GPT-5.5 to avoid mention of goblins, gremlins and other creatures. Yes, you read that right. "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query," the prompt reads. Apparently, enough people started talking about ChatGPT's creature obsession that OpenAI felt the need to provide an accounting of where the goblins came from.
Google is quietly moving toward ads in Gemini
PCWorld reports that Google is exploring adding advertisements to its Gemini AI app, following OpenAI's implementation of sponsored ads in ChatGPT's free and budget plans. Google's business chief Philipp Schindler views ads as potentially valuable commercial information if properly integrated, while the company has already tested ads in AI Mode and AI Overviews. This move could make AI services more accessible but raises important concerns about maintaining transparency and ensuring ads don't influence AI responses. Putting ads in AI replies is a controversial but lucrative practice, and it's one that OpenAI has already embraced with its free and budget-priced ChatGPT plans. But while Google hasn't gone there yet with Gemini, company execs admitted they're mulling the idea.