 Large Language Model


Demis Hassabis Thinks AI Job Cuts Are Dumb

WIRED

The CEO of Google DeepMind tells WIRED that companies should use the productivity gains of AI to do more, not lay people off. Demis Hassabis, the CEO of Google DeepMind, is keen to talk about the coding skills of his company's newest model, Gemini 3.5 Flash. The model has been trained to perform complex agentic coding tasks: translate large code bases from one language to another; find and fix bugs lurking deep in knotty code; and even write entire operating systems from scratch. Hassabis does not, however, think this spells doom for software developers. "I have no idea why people are going around talking with certainty about that," Hassabis tells WIRED ahead of the new model reveal at today's Google I/O event.


Google Search is turning into an AI assistant--and it doesn't want you to leave

PCWorld

Google is transforming its search engine into an AI-powered assistant called Spark, featuring conversational interactions and a personalized 'daily brief' for task management. PCWorld reports the company is expanding mobile search capabilities to handle complex queries using text, images, and video while integrating restaurant reservations and payments. This evolution blurs the line between traditional search and AI assistance, keeping users within Google's ecosystem through proactive monitoring and personalized results.


Musk v Altman: tech bros at war over OpenAI – The Latest

The Guardian

A long and bitter legal battle between tech billionaires Elon Musk and Sam Altman has culminated in victory for the OpenAI boss. Musk has vowed to appeal the verdict. But what did the trial reveal about big tech and the global AI race?


Former OpenAI Staffers Warn xAI's Poor Safety Record Could Complicate SpaceX's IPO

WIRED

The ex-employees, who cofounded a new AI watchdog group, say investors deserve more information about xAI's safety practices before SpaceX goes public. Two former OpenAI employees and a group of AI safety nonprofits are warning that Elon Musk's AI lab, xAI, could become a liability for prospective investors in SpaceX, which is preparing to file what's expected to be the largest initial public offering in Wall Street history. In a letter directed to investors published on Tuesday, the ex-staffers highlighted what they describe as "unpriced risks" related to xAI that could complicate SpaceX's reported plans to raise up to $75 billion as part of its IPO. The rocket company's private valuation shot up to over $1 trillion after it acquired xAI last year. Musk claimed his rocket company could launch data centers into space for his AI lab, but the letter's authors argue that xAI's poor record on safety issues could complicate how investors view the combined company as it gets ready to submit its IPO prospectus filing.


Zoe Kleinman: Why the AI industry is the real winner of the Musk-Altman trial

BBC News

It is not only OpenAI but the AI race itself that was vindicated in the California courtroom last night. Even though Elon Musk essentially lost on a technicality, there's a clear signal from the verdict that making lots of money from AI and competing fiercely with rivals is simply business. The industry sometimes tries to display a united front, especially when it comes to safety, research and inclusivity. But this case served as a powerful reminder that none of the AI giants are charities, and they don't have to be, even if they once said otherwise. Cracks in the façade of industry collaboration for the sake of humanity have been exposed before.


Musk vs Altman: What to know about the OpenAI verdict

Al Jazeera

On Monday morning, a jury in Oakland, California, announced its verdict in one of the most-watched tech feuds between billionaire Elon Musk and OpenAI CEO Sam Altman. The nine-member jury handed a decisive victory to Altman, saying Musk had waited too long to bring his claims against the artificial intelligence company and its top executives. Musk, who cofounded OpenAI as a nonprofit, had filed a $150bn lawsuit against the organisation, Altman and its president, Greg Brockman, accusing them of turning it into a for-profit entity for personal enrichment. Instead, the case became focused on a procedural issue. After deliberating for less than two hours, the jury unanimously found that the statute of limitations had expired before Musk filed the lawsuit in 2024, meaning jurors concluded he had waited too long to bring his claims under the applicable legal deadline.


Elon Musk loses case against Sam Altman over OpenAI's overhaul

The Japan Times

Elon Musk arrives at the Ronald V. Dellums Federal Building for court in Oakland, California, on April 30. A jury rejected Elon Musk's claims that OpenAI under Sam Altman's leadership betrayed its mission to benefit the public by morphing into a for-profit business, finding that he waited too long to sue the company. The verdict reached Monday in federal court in Oakland, California, follows a trial over the bitter feud between the entrepreneurs who worked together to launch the startup in 2015. OpenAI has since evolved into one of the world's most valuable and powerful artificial intelligence companies. "I think there is a substantial amount of evidence to support the jury's findings," U.S. District Judge Yvonne Gonzalez Rogers said when she accepted the nine-member jury's unanimous conclusion after about two hours of deliberations.


Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

arXiv.org Machine Learning

Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning (RL) to improve their reasoning capabilities. Three types of approaches have been widely adopted: The first relies on a deep neural network to estimate the value function of the learning policy in order to reduce the variance of the policy gradient. However, estimating and maintaining such a value network incurs substantial computational and memory overhead. The second avoids training a value network by approximating the value function using sample averages. However, it samples a large number of reasoning traces per prompt for accurate value function approximation, making it computationally expensive. The third samples only a single reasoning trajectory per prompt, which reduces computational cost but suffers from poor sample efficiency. This paper focuses on a practical, resource-constrained setting in which only a small number of reasoning traces can be sampled per prompt, while low-variance gradient estimation remains essential for high-quality policy learning. To address this challenge, we bring classical nonparametric statistical methods, which are both computationally and statistically efficient, to LLM reasoning. We employ kernel smoothing as a concrete example for value function estimation and the subsequent policy optimization. Numerical and theoretical results demonstrate that our proposal achieves accurate value and gradient estimation, leading to improved policy optimization.
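
The kernel-smoothing idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each prompt is summarized by a single scalar feature (a hypothetical difficulty score), uses a Gaussian kernel, and forms a leave-one-out Nadaraya-Watson baseline so that each trace's advantage is its reward minus a kernel-weighted average of the other rewards. All function and parameter names are illustrative.

```python
import math
from typing import List, Sequence


def gaussian_kernel(u: float) -> float:
    """Unnormalized Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def kernel_value(x: float, xs: Sequence[float], rewards: Sequence[float],
                 bandwidth: float) -> float:
    """Nadaraya-Watson estimate of the expected reward at feature value x."""
    weights = [gaussian_kernel((x - xi) / bandwidth) for xi in xs]
    total = sum(weights)
    # Assumes the bandwidth is large enough that the weights do not all underflow to 0.
    return sum(w * r for w, r in zip(weights, rewards)) / total


def kernel_advantages(xs: Sequence[float], rewards: Sequence[float],
                      bandwidth: float) -> List[float]:
    """Leave-one-out advantages: r_i minus the kernel baseline built from the others."""
    advantages = []
    for i, (xi, ri) in enumerate(zip(xs, rewards)):
        other_x = [x for j, x in enumerate(xs) if j != i]
        other_r = [r for j, r in enumerate(rewards) if j != i]
        advantages.append(ri - kernel_value(xi, other_x, other_r, bandwidth))
    return advantages
```

Borrowing strength from nearby prompts is what lets the baseline stay low-variance when only a few traces are sampled per prompt; the bandwidth controls how far that borrowing reaches.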


Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?

arXiv.org Machine Learning

Large language models (LLMs) are increasingly used as automated evaluators of AI systems, including in high-stakes applications. In this role, LLMs are used to generate judgments about the quality, appropriateness, or even safety of model outputs. This approach is motivated by practical constraints. Expert human ratings are costly and difficult to scale, whereas LLM ratings can be produced quickly at low cost. However, current approaches to deploying LLM evaluators are ad hoc, typically limited to reporting agreement metrics between human and LLM judges as a justification for substitution of human ratings, and lack a formal basis for study design. This paper (1) shifts the role of the LLM judge from substitutive to auxiliary, and (2) formulates the LLM-as-a-judge paradigm as one of augmenting human evaluation through a two-stage sampling design, where LLM evaluations are measured for all observations at the first stage and human ratings are partially observed for a subsample at the second stage. We propose to use a doubly robust estimator from the missing data literature, which takes advantage of the robustness property against the prediction model, since the missingness model is known by design. Using the asymptotic variance of this estimator, we propose how sample sizes of human and LLM ratings can be determined to achieve a targeted level of power. We also show that a study can be efficiently designed by allocating more human ratings for types of evaluations where the predictability of LLM ratings is not high. To the best of our knowledge, there is very little guidance on how much human oversight should be retained when validating benchmarks.
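
The two-stage design described above can be sketched with the standard augmented inverse-probability-weighted (doubly robust) form of the mean estimator. This is a minimal sketch under stated assumptions, not the paper's implementation: LLM scores exist for every item, human ratings are `None` where not sampled, the sampling probabilities are known by design, and the hypothetical `predict` function (identity by default) maps an LLM score to a predicted human rating.

```python
from typing import Callable, Optional, Sequence


def doubly_robust_mean(
    llm_scores: Sequence[float],
    human_ratings: Sequence[Optional[float]],
    sampling_probs: Sequence[float],
    predict: Callable[[float], float] = lambda s: s,
) -> float:
    """Estimate the mean human rating over all items.

    Each item contributes its predicted rating; items that also received a
    human rating add an inverse-probability-weighted residual correction.
    """
    if not (len(llm_scores) == len(human_ratings) == len(sampling_probs)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for score, rating, prob in zip(llm_scores, human_ratings, sampling_probs):
        prediction = predict(score)
        total += prediction
        if rating is not None:
            # prob is the known design probability that this item was human-rated.
            total += (rating - prediction) / prob
    return total / len(llm_scores)
```

Because the sampling probabilities are fixed by the study design, the correction term keeps the estimate consistent even when `predict` is a poor model of human ratings; a better predictor only reduces variance.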


Your SaaS Is an Insurance Product: A Modeling Framework

arXiv.org Machine Learning

Capped-usage SaaS products -- LLM subscriptions such as Claude Code and ChatGPT, cloud platforms such as Vercel and Cloudflare Workers, corporate benefit platforms, identity-verification services with liability transfer -- share a structural signature with insurance products: a fixed premium decoupled from realized consumption, stochastic per-user demand with heavy-tailed severity, a non-fungible cap that resets on a fixed schedule, and a portfolio-level exposure that requires reserve adequacy under tail risk. We argue that this is not an analogy. It is the same operational problem actuarial science has been tooled for decades to address, restated with new dependent variables (tokens, bandwidth bytes, function-invocations, gym check-ins) in place of medical claims. This paper proposes a modeling framework for capped-usage SaaS pricing built from frequency-severity decomposition, premium calculation principles, and Monte Carlo reserve adequacy. We map the framework to publicly observable subscription tiers in two domains (LLM services and cloud platforms), ground it in canonical health-insurance economics (Arrow 1963; Pauly 1968; Manning et al. 1987; Brot-Goldberg et al. 2017), and demonstrate divergence from traditional unit economics through a worked example. The contribution is operational rather than theoretical: not a new theorem, but vocabulary and tools currently absent from cs.LG/stat.ML practice.
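
The frequency-severity decomposition and Monte Carlo reserve check named in the abstract can be sketched as follows. The distributional choices are assumptions made for illustration, not taken from the paper: per-user event counts are Poisson, per-event cost is lognormal (heavy-tailed), each user's cost is truncated at the plan cap, and a shortfall occurs when portfolio cost exceeds premium income plus reserve. All names and parameters are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def user_cost(rng: random.Random, lam: float, mu: float, sigma: float,
              cap: float) -> float:
    """Frequency-severity draw for one user, truncated at the usage cap."""
    events = poisson(rng, lam)
    usage = sum(rng.lognormvariate(mu, sigma) for _ in range(events))
    return min(usage, cap)


def shortfall_probability(n_users: int, premium: float, reserve: float,
                          lam: float, mu: float, sigma: float, cap: float,
                          n_sims: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated periods in which cost exceeds premiums plus reserve."""
    rng = random.Random(seed)
    budget = n_users * premium + reserve
    shortfalls = 0
    for _ in range(n_sims):
        cost = sum(user_cost(rng, lam, mu, sigma, cap) for _ in range(n_users))
        if cost > budget:
            shortfalls += 1
    return shortfalls / n_sims
```

Calling `shortfall_probability` over a grid of reserve levels gives a rough picture of how much capital a given tier needs before the estimated shortfall probability drops below a chosen tolerance.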