Large Language Model
Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity
Lochab, Anamika; Li, Bolian; Zhang, Ruqi
Reinforcement Learning with Verifiable Rewards (RLVR) has achieved substantial gains in single-attempt accuracy (Pass@1) on reasoning tasks, yet often suffers from reduced multi-sample coverage (Pass@K), indicating diversity collapse. We identify a structural cause for this degradation: common RLVR objectives, such as GRPO, are indifferent to how probability mass is distributed among correct solutions. Combined with stochastic training dynamics, this indifference induces a self-reinforcing collapse, in which probability mass concentrates on a narrow subset of correct outputs while alternative valid solutions are suppressed. We formalize this collapse mechanism and further characterize the optimal policy structure under two complementary criteria, robustness and entropy-regularized optimality, both of which identify the Uniform-Correct Policy as uniquely optimal. Motivated by this analysis, we propose Uniform-Correct Policy Optimization (UCPO), a modification to GRPO that adds a conditional uniformity penalty on the policy's distribution over correct solutions. The penalty redistributes gradient signal toward underrepresented correct responses, encouraging uniform allocation of probability mass within the correct set. Across three models (1.5B-7B parameters) and five mathematical reasoning benchmarks, UCPO improves Pass@K and diversity while maintaining competitive Pass@1, achieving up to +10% absolute improvement on AIME24 at Pass@64 and up to 45% higher equation-level diversity within the correct set. The code is available at https://github.com/AnamikaLochab/UCPO.
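The abstract's argument rests on two mechanics. The first is GRPO's group-relative advantage, which gives every correct response in a group the same signal no matter how much probability the policy already assigns it. The second is a penalty that pushes the policy's mass within the correct set toward uniform. The Python sketch below illustrates both, plus the commonly used unbiased Pass@K estimator for multi-sample coverage. The abstract does not give the exact form of UCPO's penalty, so `uniformity_penalty` is a hypothetical stand-in. It renormalizes sequence probabilities over the correct responses in a group and returns the KL divergence from the uniform distribution over that set. Its gradient with respect to a correct response's log-probability is q_i(log q_i - sum_j q_j log q_j), so minimizing it lowers over-represented correct responses and raises under-represented ones. All function names, rewards, and log-probabilities are illustrative assumptions. How such a penalty would be weighted against the GRPO loss is left unspecified.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def uniformity_penalty(seq_logprobs, rewards):
    """HYPOTHETICAL conditional uniformity penalty (not the paper's exact form):
    KL(q || uniform), where q is the policy's probability renormalized over the
    correct responses (reward > 0) in the group only."""
    correct = [lp for lp, r in zip(seq_logprobs, rewards) if r > 0]
    m = len(correct)
    if m < 2:
        return 0.0  # fewer than two correct responses: nothing to spread mass over
    top = max(correct)  # subtract the max for a numerically stable softmax
    weights = [math.exp(lp - top) for lp in correct]
    z = sum(weights)
    q = [w / z for w in weights]
    # KL(q || U_m) = sum_i q_i log q_i + log m, which is zero exactly when q is uniform
    return sum(qi * math.log(qi) for qi in q if qi > 0) + math.log(m)


def pass_at_k(n, c, k):
    """Unbiased Pass@K estimate from n samples of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


if __name__ == "__main__":
    rewards = [1, 1, 1, 0]                 # illustrative binary verifier rewards
    logprobs = [-2.0, -9.0, -15.0, -4.0]   # illustrative sequence log-probabilities
    print(grpo_advantages(rewards))        # correct responses share one advantage value
    print(uniformity_penalty(logprobs, rewards))  # positive: mass sits on one correct answer
    print(pass_at_k(64, 3, 1), pass_at_k(64, 3, 64))
```

With `rewards = [1, 1, 1, 0]`, the three correct responses receive identical advantages even though their log-probabilities differ widely. This is the indifference the abstract describes, and in this sketch the penalty is the only term that tells them apart.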
UK 'invention agency' grants £50m of public money to US tech and venture capital firms
OpenAI's Sam Altman, left, is a backer of Rain Neuromorphics, one of the companies that received funds from the UK's Aria, the brainchild of Dominic Cummings, right. Exclusive: Brainchild of Dominic Cummings, Aria is aimed at funding 'crazy' scientific projects to benefit the UK. Britain's "invention agency" has pledged £50m of UK taxpayer money to US tech companies and venture capital projects. Dreamed up by Dominic Cummings to fund "crazy" ideas, the Advanced Research and Invention Agency (Aria) is meant to "restore Britain's place as a scientific superpower". But a joint investigation by the Guardian and Democracy for Sale, an investigative website, has established that more than an eighth of the agency's £400m in research and development funding over the past two years has gone to 14 US tech companies and venture capital groups, in some cases with no clear return for the UK or Aria. One of these companies, Rain Neuromorphics, is also backed by the OpenAI chief executive, Sam Altman, and was reported to be near collapse last year, shortly after winning Aria money. It did not respond to a request for comment; two of its founders appear to have left the company.
OpenAI introduces AI-generated pets for its Codex app
Vibe coding just got a whole lot more adorable. OpenAI has introduced AI-generated pets to the Codex app, its agentic coding tool. These optional animated companions don't do any coding themselves; instead, they sit in a floating overlay that can tell you what Codex is working on and notify you when Codex completes a task or needs your input. The feature lets developers follow Codex's active thread without switching away from the app they currently have open. Users can type /pet into the Codex app to summon or dismiss the companion.
Musk v. Altman week 1: Elon Musk says he was duped, warns AI could kill us all, and admits that xAI distills OpenAI's models
Musk kept his cool, and OpenAI's lawyer bulldozed him with piercing questions about his motivations for suing the company. In the first week of the landmark trial between Elon Musk and OpenAI, Musk took the stand in a crisp black suit and tie and argued that OpenAI CEO Sam Altman and president Greg Brockman had deceived him into bankrolling the company. Along the way, he warned that AI could destroy us all and sat through revelations that he had poached OpenAI employees for his own companies. He even confessed, to some audible gasps in the courtroom, that his own AI company, xAI, which makes the chatbot Grok, uses OpenAI's models to train its own. The federal courthouse in Oakland, California, was packed with armies of lawyers carrying boxes of exhibits, journalists typing away at their laptops, and a handful of concerned OpenAI employees. Outside, protesters lined the streets, carrying signs urging people to quit ChatGPT, boycott Tesla, or both.
OpenAI Enables Marketing Cookies by Default for Free ChatGPT Users
ChatGPT's new privacy policy states how the company uses cookies for tracking, to turn free users into paying subscribers. OpenAI is ready to target free users of its services with advertisements around the web, based on what it knows about them. On Thursday, OpenAI sent an email to users laying out major changes to the AI company's privacy policy in the US. "We'll now use cookies to promote OpenAI products and services on other websites," reads the email sent on April 30. "This does not impact your conversations in ChatGPT. Your conversations with ChatGPT are private and are not shared with marketing partners."
Pentagon says US military to be an 'AI-first' fighting force
The US military plans to increase its use of artificial intelligence (AI) further after the Pentagon agreed to new and expanded contracts with some of the biggest names in technology. Under eight agreements with Google, OpenAI, Amazon, Microsoft, SpaceX, Oracle, Nvidia and the start-up Reflection, the Pentagon said AI technology would now be used for "any lawful operational use". "These agreements accelerate the transformation [of] the US military as an AI-first fighting force," the Pentagon said. Conspicuous by its absence is Anthropic, as the company has said it is concerned about how the Pentagon could use its tools in warfare and domestically. The firm is now suing the government over the alleged retaliation it faced after refusing to accept "any lawful use" language in its own contract.
A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat
Build American AI, a nonprofit linked to a super PAC bankrolled by executives at OpenAI and Andreessen Horowitz, is funding a campaign to spread pro-AI messaging and stoke fears about China. In an Instagram video posted on April 1, lifestyle influencer Melissa Strahle poses outdoors before an American flag as soft instrumental music plays. "AI lets me focus on what matters most," she tells her 1.4 million followers. "We need to invest in American-made AI to ensure America leads the way in innovation and job creation." Strahle labeled the post an advertisement, but she didn't disclose what organization had paid for it.
The $20 AI subscription era has become untenable
PCWorld reports that the current $20 flat-rate AI subscriptions from OpenAI, Anthropic, and others are becoming financially unsustainable for providers. GitHub Copilot has already switched to expensive usage-based pricing, while Anthropic is considering removing advanced features from its Claude Pro plans. Users should expect significant price increases, as the true cost of powerful AI agents far exceeds current subscription fees.
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption, via prompt learning, for numerous downstream tasks. Previous works have shown that test-time prompt tuning with entropy minimization can adapt text prompts to unseen domains. While effective, this overlooks the key cause of performance degradation on unseen domains: distribution shift. In this work, we explicitly handle this problem by aligning the out-of-distribution (OOD) test sample statistics to those of the source data using prompt tuning. We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift, bridging the gap in the test domain. Evaluated on the domain generalization benchmark, our method improves zero-shot top-1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe. In cross-dataset generalization with unseen categories across 10 datasets, our method improves consistently on all datasets compared to the existing state of the art.