Large Language Model
The Infinity Machine by Sebastian Mallaby review – the story of the man who changed the world
It was March 2016, and at the Four Seasons Hotel in Seoul, the world was gathered to watch the culmination of a battle 2,500 years in the making. On one side was the South Korean Lee Se-dol, the second-highest ranking Go player in the world. On the other was AlphaGo - a computer program developed by London-based artificial intelligence research company DeepMind. "Chess is the greatest game mankind has invented," game designer Alex Randolph once said. "Go is the greatest game mankind has discovered."
Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection
Basu, Abhinaba, Chakraborty, Pavan
Scientific discovery increasingly relies on AI systems to select candidates for expensive experimental validation, yet no principled, budget-aware evaluation framework exists for comparing selection strategies -- a gap intensified by large language models (LLMs), which generate plausible scientific proposals without reliable downstream evaluation. We introduce the Budget-Sensitive Discovery Score (BSDS), a formally verified metric -- 20 theorems machine-checked by the Lean 4 proof assistant -- that jointly penalizes false discoveries (lambda-weighted FDR) and excessive abstention (gamma-weighted coverage gap) at each budget level. Its budget-averaged form, the Discovery Quality Score (DQS), provides a single summary statistic that no proposer can inflate by performing well at a cherry-picked budget. As a case study, we apply BSDS/DQS to the question of whether LLMs add marginal value to an existing ML pipeline for drug discovery candidate selection. We evaluate 39 proposers -- 11 mechanistic variants, 14 zero-shot LLM configurations, and 14 few-shot LLM configurations -- using SMILES representations on MoleculeNet HIV (41,127 compounds, 3.5% active, 1,000 bootstrap replicates) under both random and scaffold splits. Three findings emerge. First, the simple RF-based Greedy-ML proposer achieves the best DQS (-0.046), outperforming all MLP variants and LLM configurations. Second, no LLM surpasses the Greedy-ML baseline under zero-shot or few-shot evaluation on HIV or Tox21, establishing that LLMs provide no marginal value over an existing trained classifier. Third, the proposer hierarchy generalizes across five MoleculeNet benchmarks spanning 0.18%-46.2% prevalence, a non-drug AV safety domain, and a 9x7 grid of penalty parameters (tau >= 0.636, mean tau = 0.863). The framework applies to any setting where candidates are selected under budget constraints and asymmetric error costs.
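To make the idea of a budget-sensitive score concrete, here is a minimal Python sketch. The abstract does not give the BSDS formula, so the functional form below (hit rate minus a lambda-weighted false discovery rate minus a gamma-weighted coverage gap) is an assumption chosen only to mirror the description; the names `bsds` and `dqs`, their parameters, and the default weights are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def bsds(selected_labels: Sequence[int], budget: int,
         lam: float = 1.0, gam: float = 0.5) -> float:
    """Illustrative budget-level score (ASSUMED form, not the paper's definition).

    selected_labels: 1 for a true active, 0 for an inactive, one entry per
                     candidate the proposer chose to submit.
    budget:          maximum number of candidates that may be validated.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    picks = list(selected_labels)[:budget]       # never spend more than the budget
    k = len(picks)
    hits = sum(picks)
    fdr = (k - hits) / k if k else 0.0           # share of submitted picks that are false
    coverage_gap = (budget - k) / budget         # share of the budget left unused
    return hits / budget - lam * fdr - gam * coverage_gap


def dqs(ranked_labels: Sequence[int], budgets: Sequence[int],
        lam: float = 1.0, gam: float = 0.5) -> float:
    """Average the budget-level score over several budgets (assumed uniform mean)."""
    if not budgets:
        raise ValueError("at least one budget is required")
    scores = [bsds(ranked_labels[:b], b, lam, gam) for b in budgets]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ranking = [1, 0, 1, 0]                       # toy labels in proposer's ranked order
    print(bsds(ranking, budget=4))               # full budget used, half the picks wrong
    print(bsds([1], budget=4))                   # one correct pick, most budget unused
    print(dqs(ranking, budgets=[2, 4]))
```

Averaging over budgets is what keeps a proposer from looking good at a single favorable budget: a strong score at one budget is diluted by weaker scores at the others. Under this assumed form the score can be negative whenever the penalties outweigh the hit rate.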
Self-Retrieval: End-to-End Information Retrieval with One Large Language Model
The rise of large language models (LLMs) has significantly transformed both the construction and application of information retrieval (IR) systems. However, current interactions between IR systems and LLMs remain limited, with LLMs merely serving as part of components within IR systems, and IR systems being constructed independently of LLMs. This separated architecture restricts knowledge sharing and deep collaboration between them. In this paper, we introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture.
OpenAI reportedly plans to add Sora video generation to ChatGPT
The company launched its Sora 2 model in September 2025 alongside a dedicated Sora app. OpenAI reportedly plans to add its Sora video generation model directly into ChatGPT. The standalone Sora app was seen as a smash hit when it launched alongside Sora 2 in September 2025, but interest in the video generation app has since fallen as users ran into limits on the number and kinds of videos they could create. Adding Sora to ChatGPT could give the model a second life, and ideally grow the ChatGPT app's weekly active users from the 900 million OpenAI reported in February to a billion or more. According to the report, the standalone Sora app will stick around after the model is integrated, even though the app has fallen out of the App Store's top 100 free apps and only a small number of users reportedly share their videos publicly in the app.
AIhub coffee corner: AI, kids, and the future – "generation AI"
This month we tackle the topic of young people and what AI tools mean for their future. Joining the conversation this time are: Sanmay Das (Virginia Tech), Tom Dietterich (Oregon State University), Sabine Hauert (University of Bristol), Michael Littman (Brown University), and Ella Scallan (AIhub). As AI tools have become ubiquitous, we've seen growing concern and increasing coverage about how the use of such tools from a formative age might affect children. What do you think the impact will be and what skills might young people need to navigate this AI world? I met up with a bunch of high school friends when I was last in Switzerland and they were all wondering what their kids should study. They were wondering if they should do social science, seeing as AI tools have become adept at many tasks, such as coding, writing, art, etc. I think that we need social sciences, but that we also need people who know the technology and who can continue developing it. I say they should continue doing whatever they're interested in and those jobs will evolve and they'll look different, but there will still be a whole wealth of different types of jobs.
China's OpenClaw Boom Is a Gold Rush for AI Companies
Hype around the open source agent is driving people to rent cloud servers and buy AI subscriptions just to try it, creating a windfall for tech companies. George Zhang thought OpenClaw could make him rich, even though he didn't really understand how the viral AI agent software worked. But he saw a video of a Chinese social media influencer demonstrating how it could be deployed to manage stock portfolios and make investment decisions autonomously. Zhang, who works in cross-border ecommerce in the Chinese city of Xiamen, was intrigued enough that he decided to try installing OpenClaw in late February. Zhang is one of the many people in China who got swept up in the craze over OpenClaw recently.