'History won't forgive us' if UK falls behind in quantum computing race, says Tony Blair
Tony Blair: 'As we have seen with AI, it is the countries that have the infrastructure and capital for scale that capture technology's economic and strategic benefits.' Tony Blair has said "history won't forgive us" if the UK falls behind in the race to harness quantum computing, a frontier technology predicted to trigger the next wave of breakthroughs in everything from drug design to climate modelling. The former British Labour prime minister, whose thinktank and consultancy, the Tony Blair Institute, is backed by tech industry leaders including the Oracle founder, Larry Ellison, warned: "The country risks failing to convert its leadership in quantum research." In a report calling for a national strategy for quantum computing, Blair and William Hague, a former Conservative party leader, compared the situation to the recent history of artificial intelligence, where the UK was responsible for important research breakthroughs but then ceded power to other countries, including the US, leading to a scramble to build "sovereign" AI capacity.
In Grok we don't trust: academics assess Elon Musk's AI-powered encyclopedia
Users have found that Grokipedia lifts large chunks from Wikipedia, contains numerous factual errors and promotes Musk's favoured rightwing talking points. The eminent British historian Sir Richard Evans produced three expert witness reports for the libel trial involving the Holocaust denier David Irving, studied for a doctorate under the supervision of Theodore Zeldin, succeeded David Cannadine as Regius professor of history at Cambridge (a post endowed by Henry VIII) and supervised theses on Bismarck's social policy. That was some of what you could learn from Grokipedia, the AI-powered encyclopedia launched last week by the world's richest person, Elon Musk. The problem was, as Prof Evans discovered when he logged on to check his own entry, all these facts were false.
China intimidated UK university to ditch human rights research, documents show
China waged a campaign of harassment and intimidation directed at a UK university to get it to shut down sensitive research into alleged human rights abuses, documents seen by the BBC show. Sheffield Hallam University staff in China were threatened by individuals described by them as being from China's National Security Service who demanded the research being done in Sheffield be halted. And access to the university's websites from China was blocked, impeding its ability to recruit Chinese students, in a campaign of threats and intimidation lasting more than two years. In an internal email from July 2024, university officials said attempting to retain the business in China and publication of the research are now untenable bedfellows. When the UK government learned of the case, the then Foreign Secretary David Lammy issued a warning to his Chinese counterpart that it would not tolerate attempts to suppress academic freedoms at UK universities, the BBC understands.
Ukrainian computer game-style drone attack system goes 'viral'
Drone teams competing for points under the 'Army of Drones Bonus System' killed or wounded 18,000 Russian soldiers in September. A computer game-style drone attack system has gone "viral" among Ukrainian military units and is being extended to reconnaissance, artillery and logistics operations, the nation's first deputy prime minister, Mykhailo Fedorov, has told the Guardian. Drone teams competing for points under the "Army of Drones Bonus System" killed or wounded 18,000 Russian soldiers in September, with 400 drone units now taking part in the competition, up from 95 in August, Ukrainian officials said. The system, which launched more than a year ago, rewards soldiers who achieve strikes with points that can be exchanged to buy more weapons in an "Amazon-for-war" online store called Brave1 filled with more than 100 different drones, autonomous vehicles and other drone war material.
I built this 'AI aunt' for women after family tragedy in South Africa
A gruesome killing in her own family inspired South African Leonora Tima to create a digital platform where people, mostly women, can talk about and track abuse. Leonora's relative was just 19 years old, and nine months pregnant, when she was killed, her body dumped on the side of a highway near Cape Town in 2020. "I work in the development sector, so I've seen violence," Leonora says. "But what stood out for me was that my family member's violent death was seen as so normal in South African society. Her death wasn't published by any news outlet because the sheer volume of these cases in our country is such that it doesn't qualify as news."
VeriFastScore: Speeding up long-form factuality evaluation
Rajendhran, Rishanth; Zadeh, Amir; Sarte, Matthew; Li, Chuan; Iyyer, Mohit
Metrics like FactScore and VeriScore that evaluate long-form factuality operate by decomposing an input response into atomic claims and then individually verifying each claim. While effective and interpretable, these methods incur numerous LLM calls and can take upwards of 100 seconds to evaluate a single response, limiting their practicality in large-scale evaluation and training scenarios. To address this, we propose VeriFastScore, which leverages synthetic data to fine-tune Llama3.1 8B for simultaneously extracting and verifying all verifiable claims within a given text based on evidence from Google Search. We show that this task cannot be solved via few-shot prompting with closed LLMs due to its complexity: the model receives ~4K tokens of evidence on average and needs to concurrently decompose claims, judge their verifiability, and verify them against noisy evidence. However, our fine-tuned VeriFastScore model demonstrates strong correlation with the original VeriScore pipeline at both the example level (r=0.80) and system level (r=0.94) while achieving an overall speedup of 6.6x (9.9x excluding evidence retrieval) over VeriScore. To facilitate future factuality research, we publicly release our VeriFastScore model and synthetic datasets.
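The abstract reports agreement with VeriScore at two granularities, example level (r=0.80) and system level (r=0.94). As a rough illustration of what separates the two, and not the authors' evaluation code, the sketch below assumes r is a Pearson correlation and computes it once over per-response scores and once over per-system mean scores. The record layout, the system names and all scores are hypothetical.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def example_and_system_r(records):
    """records: (system, reference_score, fast_score) tuples, one per response.

    Example level correlates the raw per-response scores; system level first
    averages each metric per system, then correlates those averages.
    """
    ref = [r for _, r, _ in records]
    fast = [f for _, _, f in records]
    example_r = pearson_r(ref, fast)

    systems = sorted({s for s, _, _ in records})
    ref_by_sys = [mean(r for s, r, _ in records if s == name) for name in systems]
    fast_by_sys = [mean(f for s, _, f in records if s == name) for name in systems]
    system_r = pearson_r(ref_by_sys, fast_by_sys)
    return example_r, system_r


# Hypothetical scores: (system, slow reference metric, fast approximation).
records = [
    ("sys_a", 0.90, 0.85), ("sys_a", 0.70, 0.75),
    ("sys_b", 0.50, 0.55), ("sys_b", 0.40, 0.30),
    ("sys_c", 0.20, 0.25), ("sys_c", 0.30, 0.20),
]
example_r, system_r = example_and_system_r(records)
print(f"example-level r = {example_r:.2f}, system-level r = {system_r:.2f}")
```

Averaging within each system cancels some per-response disagreement, which is one plausible reason a system-level correlation can come out higher than the example-level one.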
Red Teaming AI Red Teaming
Majumdar, Subhabrata; Pendleton, Brian; Gupta, Abhishek
Red teaming has evolved from its origins in military applications to become a widely adopted methodology in cybersecurity and AI. In this paper, we take a critical look at the practice of AI red teaming. We argue that despite its current popularity in AI governance, there exists a significant gap between red teaming's original intent as a critical thinking exercise and its narrow focus on discovering model-level flaws in the context of generative AI. Current AI red teaming efforts focus predominantly on individual model vulnerabilities while overlooking the broader sociotechnical systems and emergent behaviors that arise from complex interactions between models, users, and environments. To address this deficiency, we propose a comprehensive framework operationalizing red teaming in AI systems at two levels: macro-level system red teaming spanning the entire AI development lifecycle, and micro-level model red teaming. Drawing on cybersecurity experience and systems theory, we further propose a set of six recommendations. In these, we emphasize that effective AI red teaming requires multifunctional teams that examine emergent risks, systemic vulnerabilities, and the interplay between technical and social factors.
RADAR: Benchmarking Language Models on Imperfect Tabular Data
Gu, Ken; Zhang, Zhihan; Lin, Kate; Zhang, Yuwei; Paruchuri, Akshay; Yu, Hong; Kazemi, Mehran; Ayush, Kumar; Heydari, A. Ali; Xu, Maxwell A.; Narayanswamy, Girish; Liu, Yun; Poh, Ming-Zher; Yang, Yuzhe; Malhotra, Mark; Patel, Shwetak; Palangi, Hamid; Xu, Xuhai; McDuff, Daniel; Althoff, Tim; Liu, Xin
Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework to simulate data artifacts via programmatic perturbations to enable targeted evaluation of model behavior. RADAR comprises 2,980 table-query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how well reasoning performance holds up as tables grow larger. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.
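The abstract says RADAR simulates artifacts through programmatic perturbations of clean tables but does not describe its code, so the following is only a minimal sketch of the idea for the three artifact types named above (missing values, outliers and logical inconsistencies). The functions, column names and table are hypothetical; each function perturbs a copy, leaving the clean table available for computing a reference answer.

```python
import copy
import random


def inject_missing(rows, column, frac, rng):
    """Set a random fraction of one column's values to None (missing values)."""
    out = copy.deepcopy(rows)
    k = max(1, int(len(out) * frac))
    for i in rng.sample(range(len(out)), k):
        out[i][column] = None
    return out


def inject_outliers(rows, column, factor, k, rng):
    """Scale k randomly chosen numeric values by a large factor (outliers)."""
    out = copy.deepcopy(rows)
    for i in rng.sample(range(len(out)), k):
        out[i][column] *= factor
    return out


def inject_inconsistency(rows, start_col, end_col, k, rng):
    """Swap start/end in k rows so that end precedes start (a logical error)."""
    out = copy.deepcopy(rows)
    for i in rng.sample(range(len(out)), k):
        out[i][start_col], out[i][end_col] = out[i][end_col], out[i][start_col]
    return out


# Hypothetical clean table: one activity record per day; end_hour > start_hour.
clean = [
    {"day": d, "start_hour": 7, "end_hour": 9 + d % 3, "steps": 4000 + 500 * d}
    for d in range(10)
]
rng = random.Random(0)  # fixed seed so the perturbations are reproducible

missing = inject_missing(clean, "steps", 0.3, rng)
outliers = inject_outliers(clean, "steps", 100, 2, rng)
inconsistent = inject_inconsistency(clean, "start_hour", "end_hour", 2, rng)

# A data-aware answer to "mean daily steps" should skip or flag the None values.
observed = [r["steps"] for r in missing if r["steps"] is not None]
print(len(observed), sum(observed) / len(observed))
```

Because every variant is derived from the same clean table with a seeded generator, a setup like this can produce many controlled variants whose correct answer is still computable from the unperturbed data.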
Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models
Liang, Yuchen; Huang, Renxiang; Lai, Lifeng; Shroff, Ness; Liang, Yingbin
Discrete state-space diffusion models have shown significant advantages in applications involving discrete data, such as text and image generation. It has also been observed that their performance is highly sensitive to the choice of rate matrices, particularly between uniform and absorbing rate matrices. While empirical results suggest that absorbing rate matrices often yield better generation quality compared to uniform rate matrices, existing theoretical works have largely focused on the uniform rate matrices case. Notably, convergence guarantees and error analyses for absorbing diffusion models are still missing. In this work, we provide the first finite-time error bounds and convergence rate analysis for discrete diffusion models using absorbing rate matrices. We begin by deriving an upper bound on the KL divergence of the forward process, introducing a surrogate initialization distribution to address the challenge posed by the absorbing stationary distribution, which is a singleton and causes the KL divergence to be ill-defined. We then establish the first convergence guarantees for both the $\tau$-leaping and uniformization samplers under absorbing rate matrices, demonstrating improved rates over their counterparts using uniform rate matrices. Furthermore, under suitable assumptions, we provide convergence guarantees without early stopping. Our analysis introduces several new technical tools to address challenges unique to absorbing rate matrices. These include a Jensen-type argument for bounding forward process convergence, novel techniques for bounding absorbing score functions, and a non-divergent upper bound on the score near initialization that removes the need for early stopping.
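To see why a singleton stationary distribution makes the forward-process KL ill-defined, here is a worked example under one common convention (an assumption; the paper's normalization may differ): a single token over a vocabulary of size $S$ that includes a mask state $M$, rows indexing the current state, and unit jump rates. Both rate matrices satisfy $Q^2 = -Q$, so $e^{tQ} = I + (1 - e^{-t})\,Q$.

```latex
% Uniform vs absorbing rate matrices (illustrative convention, unit rates)
Q^{\mathrm{unif}} = \tfrac{1}{S}\,\mathbf{1}\mathbf{1}^{\top} - I,
\qquad
Q^{\mathrm{abs}} = \mathbf{1}e_M^{\top} - I,
\qquad
Q^2 = -Q \;\Longrightarrow\; e^{tQ} = I + (1 - e^{-t})\,Q .

% Resulting forward transition kernels
e^{tQ^{\mathrm{unif}}} = e^{-t} I + (1 - e^{-t})\,\tfrac{1}{S}\,\mathbf{1}\mathbf{1}^{\top},
\qquad
e^{tQ^{\mathrm{abs}}} = e^{-t} I + (1 - e^{-t})\,\mathbf{1}e_M^{\top} .

% Absorbing case, for x_0 \neq M: the token survives with probability e^{-t}
q_t(x_0 \mid x_0) = e^{-t}, \qquad q_t(M \mid x_0) = 1 - e^{-t},
\qquad \pi^{\mathrm{abs}} = \delta_M .

% For any finite T, q_T(x) > 0 at some x \neq M while \delta_M(x) = 0, hence
\mathrm{KL}\big(q_T \,\|\, \delta_M\big)
  = \sum_x q_T(x) \log \frac{q_T(x)}{\delta_M(x)} = +\infty .
```

The uniform chain instead converges to the full-support uniform distribution, so its KL stays finite and decays like $e^{-T}$. This is the gap the surrogate initialization distribution in the abstract is introduced to handle; the sketch does not reproduce the paper's construction.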