 Large Language Model


Triangulation as an Acceptance Rule for Multilingual Mechanistic Interpretability

arXiv.org Machine Learning

Multilingual language models achieve strong aggregate performance yet often behave unpredictably across languages, scripts, and cultures. We argue that mechanistic explanations for such models should satisfy a causal standard: claims must survive causal interventions and must cross-reference across environments that perturb surface form while preserving meaning. We formalize reference families as predicate-preserving variants and introduce triangulation, an acceptance rule requiring necessity (ablating the circuit degrades the target behavior), sufficiency (patching activations transfers the behavior), and invariance (both effects remain directionally stable and of sufficient magnitude across the reference family). To supply candidate subgraphs, we adopt automatic circuit discovery and accept or reject those candidates by triangulation. We ground triangulation in causal abstraction by casting it as an approximate transformation score over a distribution of interchange interventions, connect it to the pragmatic interpretability agenda, and present a comparative experimental protocol across multiple model families, language pairs, and tasks. Triangulation provides a falsifiable standard for mechanistic claims that filters spurious circuits passing single-environment tests but failing cross-lingual invariance.
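
The acceptance rule described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `EnvEffect` record, the `triangulate` function, and the `min_effect` threshold are hypothetical names and values chosen for illustration, and the effect sizes are assumed to have been measured elsewhere (one ablation effect and one patching effect per member of the reference family).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EnvEffect:
    """Measured effects for one environment (hypothetical record)."""
    env: str              # e.g. a language or script variant of the prompt
    ablation_drop: float  # drop in the target metric when the circuit is ablated
    patch_gain: float     # gain in the target metric when activations are patched in


def triangulate(effects: Sequence[EnvEffect], min_effect: float = 0.1) -> bool:
    """Accept a candidate circuit only if it passes in every environment.

    Requiring each effect to be at least `min_effect` (an assumed, positive
    threshold) in every environment enforces both a stable direction and a
    sufficient magnitude across the reference family.
    """
    if not effects:
        return False  # nothing measured, nothing to accept
    necessity = all(e.ablation_drop >= min_effect for e in effects)
    sufficiency = all(e.patch_gain >= min_effect for e in effects)
    return necessity and sufficiency


family = [
    EnvEffect("en", ablation_drop=0.42, patch_gain=0.35),
    EnvEffect("de", ablation_drop=0.38, patch_gain=0.31),
    EnvEffect("hi", ablation_drop=0.02, patch_gain=0.29),  # fails necessity
]
accepted = triangulate(family)
```

In this example the candidate passes in two environments but is rejected overall, which is the kind of single-environment artifact the abstract says triangulation is meant to filter out.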


MultiRisk: Multiple Risk Control via Iterative Score Thresholding

arXiv.org Machine Learning

As generative AI systems are increasingly deployed in real-world applications, regulating multiple dimensions of model behavior has become essential. We focus on test-time filtering: a lightweight mechanism for behavior control that compares performance scores to estimated thresholds, and modifies outputs when these bounds are violated. We formalize the problem of enforcing multiple risk constraints with user-defined priorities, and introduce two efficient dynamic programming algorithms that leverage this sequential structure. The first, MULTIRISK-BASE, provides a direct finite-sample procedure for selecting thresholds, while the second, MULTIRISK, leverages data exchangeability to guarantee simultaneous control of the risks. Under mild assumptions, we show that MULTIRISK achieves nearly tight control of all constraint risks. The analysis requires an intricate iterative argument, upper bounding the risks by introducing several forms of intermediate symmetrized risk functions, and carefully lower bounding the risks by recursively counting jumps in symmetrized risk functions between appropriate risk levels. We evaluate our framework on a three-constraint Large Language Model alignment task using the PKU-SafeRLHF dataset, where the goal is to maximize helpfulness subject to multiple safety constraints, and where scores are generated by a Large Language Model judge and a perplexity filter. Our experimental results show that our algorithm can control each individual risk at close to the target level.
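
The test-time filtering step mentioned in this abstract can be sketched as follows. This is only an illustration of comparing scores to thresholds in priority order; it does not reproduce the MULTIRISK threshold-selection algorithms. The function name `filter_output`, the fallback text, and the convention that a higher score means higher risk are all assumptions.

```python
from typing import Dict, Optional, Sequence, Tuple


def filter_output(
    output: str,
    scores: Dict[str, float],
    thresholds: Sequence[Tuple[str, float]],
    fallback: str = "[response withheld]",
) -> Tuple[str, Optional[str]]:
    """Return the output unchanged, or a fallback if any threshold is exceeded.

    `thresholds` is listed in priority order (highest priority first).
    Assumes higher scores indicate higher risk; the thresholds themselves
    are taken as given rather than estimated here.
    """
    for name, limit in thresholds:
        if scores[name] > limit:
            return fallback, name  # report the first violated constraint
    return output, None


# Hypothetical constraint names and threshold values.
limits = [("harm", 0.5), ("perplexity", 0.8)]
kept = filter_output("Here is an answer.", {"harm": 0.1, "perplexity": 0.3}, limits)
blocked = filter_output("Here is an answer.", {"harm": 0.9, "perplexity": 0.95}, limits)
```

When several constraints are violated at once, the sketch reports the highest-priority one, mirroring the user-defined priorities the abstract describes.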


More Than Bits: Multi-Envelope Double Binary Factorization for Extreme Quantization

arXiv.org Machine Learning

For extreme low-bit quantization of large language models (LLMs), Double Binary Factorization (DBF) is attractive as it enables efficient inference without sacrificing accuracy. However, the scaling parameters of DBF are too restrictive; after factoring out signs, all rank components share the same magnitude profile, resulting in performance saturation. We propose Multi-envelope DBF (MDBF), which retains a shared pair of 1-bit sign bases but replaces the single envelope with a rank-$l$ envelope. By sharing sign matrices among envelope components, MDBF effectively maintains a binary carrier and utilizes the limited memory budget for magnitude expressiveness. We also introduce a closed-form initialization and an alternating refinement method to optimize MDBF. Across the LLaMA and Qwen families, MDBF improves perplexity and zero-shot accuracy over previous binary formats at matched bits per weight while preserving the same deployment-friendly inference primitive.


AI-Powered Dating Is All Hype. IRL Cruising Is the Future

WIRED

Dating apps and AI companies have been touting bot wingmen for months. But the future might just be good old-fashioned meet-cutes. I am, admittedly, a big flirt. I love everything about the exchange of getting to know another person.


SoftBank lifts OpenAI stake to 11% with $41 billion investment

The Japan Times

Japanese tech investor SoftBank said Wednesday that its stake in OpenAI is now around 11% after completing the second stage of a $41-billion investment in the maker of ChatGPT. Having made colossal profits as well as losses on previous investments, flamboyant founder Masayoshi Son has pivoted SoftBank toward artificial intelligence. SoftBank had announced in April its planned investment of up to $40 billion in OpenAI, and on Wednesday it said that the second tranche of $22.5 billion was completed. The final investment reached $41 billion and includes $30 billion from SoftBank's Vision Fund plus $11 billion from other third-party co-investors, it said.


Meta buys startup known for its AI task automation agents

Engadget

Its acquisition of Manus is one of the highest-profile yet from Asia's AI startup ecosystem. Meta has acquired an AI startup called Manus, known for its custom research and website-building agents, in a deal reportedly valued at more than $2 billion. It's reportedly one of the largest acquisitions yet involving a startup nurtured in China's AI ecosystem. Manus arrived in March 2025, shortly after another Chinese AI startup, DeepSeek, appeared on the scene. The company (called Butterfly Effect at the time) originally described it as the first general AI agent to perform complex tasks autonomously, rather than just generating ideas.


The ascent of the AI therapist

MIT Technology Review

Four new books grapple with a global mental-health crisis and the dawn of algorithmic therapy. A technician adjusts the wiring inside the Mark I Perceptron. This early AI system was designed not by a mathematician but by a psychologist. More than a billion people worldwide suffer from a mental-health condition, according to the World Health Organization. The prevalence of anxiety and depression is growing in many demographics, particularly young people, and suicide is claiming hundreds of thousands of lives globally each year. Given the clear demand for accessible and affordable mental-health services, it's no wonder that people have looked to artificial intelligence for possible relief.


2025 digest of digests

AIHub

Throughout the year we've reported on some of the larger stories, and some of the lesser-covered happenings, in our regular monthly digests. We look back through the archives and pick out one or two stories from each of our digests. This month, AI startup DeepSeek released DeepSeek R1, a reasoning model designed for good performance on logic, maths, and pattern-finding tasks. The company has also launched six smaller versions of R1 that are tiny enough to run locally on laptops. In Wired, Zeyi Yang reported on who is behind the startup, whilst Tongliang Liu (in The Conversation) looked at how DeepSeek achieved its results with a fraction of the cash and computing power of its competitors.


An AI prompt engineer's dream: unlimited tokens, side-by-side responses, and lifetime access

PCWorld

A ChatPlayground AI Unlimited Plan lifetime subscription gives you unrestricted access to 25+ top models, including ChatGPT, Claude, Gemini, and Llama, for one upfront price of $79 with zero usage caps (MSRP $619). ChatPlayground AI provides a unified dashboard where you can access all of today's top models without limits. That's the kind of time savings anyone can appreciate, especially heading into a new year with tons of work ahead of all of us. Get started by entering a single prompt and get multiple responses each time.


Trustworthy Machine Learning under Distribution Shifts

arXiv.org Machine Learning

Machine Learning (ML) has been a foundational topic in artificial intelligence (AI), providing both theoretical groundwork and practical tools for its exciting advancements. From ResNet for visual recognition to Transformer for vision-language alignment, AI models have achieved capabilities superior to those of humans. Furthermore, the scaling law has enabled AI to begin developing general intelligence, as demonstrated by Large Language Models (LLMs). At this stage, AI has had an enormous influence on society and continues to shape the future of humanity. However, distribution shift remains a persistent "Achilles' heel", fundamentally limiting the reliability and general usefulness of ML systems. Moreover, poor generalization under distribution shift also causes trust issues for AI systems. Motivated by these challenges, my research focuses on Trustworthy Machine Learning under Distribution Shifts, with the goal of expanding AI's robustness and versatility, as well as its responsibility and reliability. We categorize common distribution shifts into three types: (1) Perturbation Shift, (2) Domain Shift, and (3) Modality Shift. For all scenarios, we also rigorously investigate trustworthiness through three aspects: (1) Robustness, (2) Explainability, and (3) Adaptability. Based on these dimensions, we propose effective solutions and fundamental insights, while also aiming to improve critical aspects of ML such as efficiency, adaptability, and safety.