Large Language Model
Anthropic's Daniela Amodei Believes the Market Will Reward Safe AI
Anthropic's Daniela Amodei Believes the Market Will Reward Safe AI The Trump administration might think regulation is killing the AI industry, but Anthropic president Daniela Amodei disagrees. The Trump administration may think regulation is crippling the AI industry, but one of the industry's biggest players doesn't agree. At WIRED's Big Interview event on Thursday, Anthropic president and cofounder Daniela Amodei told WIRED editor at large Steven Levy that even though Trump's AI and crypto czar, David Sacks, may have tweeted that her company is "running a sophisticated regulatory capture strategy based on fear-mongering," she's convinced her company's commitment to calling out the potential dangers of AI is making the industry stronger. WIRED's iconic series returned to San Francisco with a series of unforgettable, in-depth live conversations. Check out more highlights here .
ByteDance and DeepSeek Are Placing Very Different AI Bets
The diverging path of China's two leading AI players shows where the country's artificial intelligence industry is headed. DeepSeek and ByteDance, the two leaders of China's AI industry, are adopting vastly different strategies. On Monday, DeepSeek released DeepSeek V3.2, another open-weight model that anyone can tinker with. The startup says it performs on par with the latest models from OpenAI and Google, and it even beats them on some key mathematics benchmarks. That same day, ByteDance, whose dominance in AI applications we covered previously, introduced ways for people to use its chatbot, Doubao.
The Chatbot-Delusion Crisis
Researchers are scrambling to figure out why generative AI appears to lead some people to a state of "psychosis." Listen to more stories on the Noa app. Chatbots are marketed as great companions, able to answer any question at any time. They're not just tools, but confidants; they do your homework, write love notes, and, as one recent lawsuit against OpenAI details, might readily answer 1,460 messages from the same manic user in a 48-hour period. Jacob Irwin, a 30-year-old cybersecurity professional who says he has no previous history of psychiatric incidents, is suing the tech company, alleging that ChatGPT sparked a "delusional disorder" that led to his extended hospitalization.
Delivering securely on data and AI strategy
Most organizations feel the imperative to keep pace with continuing advances in AI capabilities, as highlighted in a recent MIT Technology Review Insights report . That clearly has security implications, particularly as organizations navigate a surge in the volume, velocity, and variety of security data. This explosion of data, coupled with fragmented toolchains, is making it increasingly difficult for security and data teams to maintain a proactive and unified security posture. Data and AI teams must move rapidly to deliver the desired business results, but they must do so without compromising security and governance. As they deploy more intelligent and powerful AI capabilities, proactive threat detection and response against the expanded attack surface, insider threats, and supply chain vulnerabilities must remain paramount. "I'm passionate about cybersecurity not slowing us down," says Melody Hildebrandt, chief technology officer at Fox Corporation, "but I also own cybersecurity strategy.
The Download: LLM confessions, and tapping into geothermal hot spots
OpenAI is testing a new way to expose the complicated processes at work inside large language models. Researchers at the company can make an LLM produce what they call a confession, in which the model explains how it carried out a task and (most of the time) own up to any bad behavior. Figuring out why large language models do what they do--and in particular why they sometimes appear to lie, cheat, and deceive--is one of the hottest topics in AI right now. If this multitrillion-dollar technology is to be deployed as widely as its makers hope it will be, it must be made more trustworthy. OpenAI sees confessions as one step toward that goal. Sometimes geothermal hot spots are obvious, marked by geysers and hot springs on Earth's surface.
How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy
Ponomareva, Natalia, Xu, Zheng, McMahan, H. Brendan, Kairouz, Peter, Rosenblatt, Lucas, Cohen-Addad, Vincent, Guzmán, Cristóbal, McKenna, Ryan, Andrew, Galen, Bie, Alex, Yu, Da, Kurakin, Alex, Zadimoghaddam, Morteza, Vassilvitskii, Sergei, Terzis, Andreas
High quality data is needed to unlock the full potential of AI for end users. However finding new sources of such data is getting harder: most publicly-available human generated data will soon have been used. Additionally, publicly available data often is not representative of users of a particular system -- for example, a research speech dataset of contractors interacting with an AI assistant will likely be more homogeneous, well articulated and self-censored than real world commands that end users will issue. Therefore unlocking high-quality data grounded in real user interactions is of vital interest. However, the direct use of user data comes with significant privacy risks. Differential Privacy (DP) is a well established framework for reasoning about and limiting information leakage, and is a gold standard for protecting user privacy. The focus of this work, \emph{Differentially Private Synthetic data}, refers to synthetic data that preserves the overall trends of source data,, while providing strong privacy guarantees to individuals that contributed to the source dataset. DP synthetic data can unlock the value of datasets that have previously been inaccessible due to privacy concerns and can replace the use of sensitive datasets that previously have only had rudimentary protections like ad-hoc rule-based anonymization. In this paper we explore the full suite of techniques surrounding DP synthetic data, the types of privacy protections they offer and the state-of-the-art for various modalities (image, tabular, text and decentralized). We outline all the components needed in a system that generates DP synthetic data, from sensitive data handling and preparation, to tracking the use and empirical privacy testing. We hope that work will result in increased adoption of DP synthetic data, spur additional research and increase trust in DP synthetic data approaches.
Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback
Liu, Pangpang, Lu, Junwei, Sun, Will Wei
We study estimation and statistical inference for reward models used in aligning large language models (LLMs). A key component of LLM alignment is reinforcement learning from human feedback (RLHF), where humans compare pairs of model-generated answers and their preferences are used to train a reward model. However, human feedback is inherently heterogeneous, creating significant challenges for reliable reward learning. To address this, we adopt a heterogeneous preference framework that jointly models the latent reward of answers and human rationality. This leads to a challenging biconvex optimization problem, which we solve via an alternating gradient descent algorithm. We establish theoretical guarantees for the resulting estimator, including its convergence and asymptotic distribution. These results enable the construction of confidence intervals for reward estimates. Leveraging these uncertainty quantification results, we conduct valid statistical comparisons between rewards and incorporate uncertainty into the best-of-$N$ (BoN) policy framework. Extensive simulations demonstrate the effectiveness of our method, and applications to real LLM data highlight the practical value of accounting for uncertainty in reward modeling for LLM alignment.
E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing
Sadhuka, Shuvom, Prinster, Drew, Fannjiang, Clara, Scalia, Gabriele, Regev, Aviv, Wang, Hanchen
Agentic AI systems execute a sequence of actions, such as reasoning steps or tool calls, in response to a user prompt. To evaluate the success of their trajectories, researchers have developed verifiers, such as LLM judges and process-reward models, to score the quality of each action in an agent's trajectory. Although these heuristic scores can be informative, there are no guarantees of correctness when used to decide whether an agent will yield a successful output. Here, we introduce e-valuator, a method to convert any black-box verifier score into a decision rule with provable control of false alarm rates. We frame the problem of distinguishing successful trajectories (that is, a sequence of actions that will lead to a correct response to the user's prompt) and unsuccessful trajectories as a sequential hypothesis testing problem. E-valuator builds on tools from e-processes to develop a sequential hypothesis test that remains statistically valid at every step of an agent's trajectory, enabling online monitoring of agents over arbitrarily long sequences of actions. Empirically, we demonstrate that e-valuator provides greater statistical power and better false alarm rate control than other strategies across six datasets and three agents. We additionally show that e-valuator can be used for to quickly terminate problematic trajectories and save tokens. Together, e-valuator provides a lightweight, model-agnostic framework that converts verifier heuristics into decisions rules with statistical guarantees, enabling the deployment of more reliable agentic systems.
Detecting AI Hallucinations in Finance: An Information-Theoretic Method Cuts Hallucination Rate by 92%
Large language models (LLMs) produce fluent but unsupported answers - hallucinations - limiting safe deployment in high-stakes domains. We propose ECLIPSE, a framework that treats hallucination as a mismatch between a model's semantic entropy and the capacity of available evidence. We combine entropy estimation via multi-sample clustering with a novel perplexity decomposition that measures how models use retrieved evidence. We prove that under mild conditions, the resulting entropy-capacity objective is strictly convex with a unique stable optimum. We evaluate on a controlled financial question answering dataset with GPT-3.5-turbo (n=200 balanced samples with synthetic hallucinations), where ECLIPSE achieves ROC AUC of 0.89 and average precision of 0.90, substantially outperforming a semantic entropy-only baseline (AUC 0.50). A controlled ablation with Claude-3-Haiku, which lacks token-level log probabilities, shows AUC dropping to 0.59 with coefficient magnitudes decreasing by 95% - demonstrating that ECLIPSE is a logprob-native mechanism whose effectiveness depends on calibrated token-level uncertainties. The perplexity decomposition features exhibit the largest learned coefficients, confirming that evidence utilization is central to hallucination detection. We position this work as a controlled mechanism study; broader validation across domains and naturally occurring hallucinations remains future work.
Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing
Rausch, Roman, Jansen, David, Singh, Sukhbinder, Orús, Román
Large Language Models (LLMs) are very demanding in terms of their computational resources. Low-rank decompositions of LLM weights, e.g. via Singular Value Decomposition (SVD), is a promising approach for LLM compression, but presents several practical hurdles, e.g. selecting appropriate layer-wise ranks and getting rid of its parameter redundancy. In this work, we present two physics-inspired improvements to SVD LLM compression: (1) \textbf{FermiGrad}, a gradient-descent algorithm that determines globally optimal layer-wise ranks by relaxing the discrete singular-value truncation into a continuous optimization using the Fermi function; (2) \textbf{PivGa}, an additional \textit{lossless} compression of the low-rank factors that exploits the intrinsic gauge freedom in their parametrization.