Large Language Model
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
A primary impediment to scaling reinforcement learning (RL) for large language model (LLM) training is the substantial computational cost, predominantly arising from the necessity of multi-sampling for policy optimization and evaluation. This underscores the critical yet challenging nature of efficient training data selection. Drawing inspiration from the Zone of Proximal Development (ZPD) theory, which posits that learners acquire knowledge more effectively from tasks of intermediate difficulty, we hypothesize that LLMs exhibit optimal learning from data they have not yet mastered but demonstrate the potential to comprehend. Conventional methodologies for assessing data difficulty or informativeness typically rely on computationally intensive multi-sampling or iterative procedures. To address this limitation, we introduce UFO-RL (**U**ncertainty-**F**ocused **O**ptimization for **R**einforcement **L**earning), a novel framework that employs a computationally efficient single-pass uncertainty estimation technique to identify informative training instances. This method, requiring only a single forward pass and obviating the need for iterative next-token computation, achieves a significant acceleration (up to 185$\times$) in data evaluation compared to multi-sampling approaches. UFO-RL leverages this efficient metric to select data within the model's estimated ZPD for training. Extensive experimentation across diverse LLMs and mathematical benchmarks demonstrates that training with a mere 10\% of the data, carefully selected by UFO-RL, yields performance comparable to or even surpassing that of full-data training.
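As a rough illustration of selecting data of intermediate difficulty, the sketch below keeps the fraction of examples whose score lies closest to a midpoint. The abstract does not state the scoring function or the selection rule, so everything here is an assumption: `scores` stands in for a hypothetical per-example confidence in [0, 1] obtained from one forward pass, and `select_intermediate`, `fraction`, and `target` are illustrative names rather than UFO-RL's actual interface.

```python
def select_intermediate(scores, fraction=0.1, target=0.5):
    """Return indices of the examples whose score is closest to `target`.

    Assumption: `scores` holds one confidence value in [0, 1] per example,
    where values near 0 or 1 mean "hopeless" or "already mastered".
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one example whenever the pool is non-empty.
    k = max(1, round(len(scores) * fraction))
    # Rank examples by distance from the assumed "intermediate" midpoint.
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - target))
    return sorted(ranked[:k])


# Hypothetical confidence scores for ten training examples.
scores = [0.05, 0.48, 0.95, 0.60, 0.30, 0.99, 0.10, 0.52, 0.80, 0.20]
selected = select_intermediate(scores, fraction=0.2)
```

Here `selected` contains the two examples scored 0.48 and 0.52, the ones nearest the assumed midpoint.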
Post Hoc Regression Refinement via Pairwise Rankings
Accurate prediction of continuous properties is essential to many scientific and engineering tasks. Although deep-learning regressors excel with abundant labels, their accuracy deteriorates in data-scarce regimes. We introduce RankRefine, a model-agnostic, plug-and-play post-hoc refinement technique that injects expert knowledge through pairwise rankings. Given a query item and a small reference set with known properties, RankRefine combines the base regressor's output with a rank-based estimate via inverse-variance weighting, requiring no retraining. In a molecular property prediction task, RankRefine achieves up to a 10\% relative reduction in mean absolute error using only 20 pairwise comparisons obtained through a general-purpose large language model (LLM) with no finetuning. As rankings provided by human experts or general-purpose LLMs are sufficient for improving regression across diverse domains, RankRefine offers practicality and broad applicability, especially in low-data settings.
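The inverse-variance weighting step mentioned above can be sketched as follows. This is the textbook fusion rule for two independent estimates, not RankRefine's published code: how the rank-based estimate and its variance are derived from the pairwise comparisons is not described in the abstract, so both are taken as given inputs, and the function and argument names are hypothetical.

```python
def inverse_variance_combine(base_mean, base_var, rank_mean, rank_var):
    """Fuse two independent estimates, weighting each by 1 / variance.

    Assumption: both estimates are unbiased and their errors are independent.
    """
    if base_var <= 0 or rank_var <= 0:
        raise ValueError("variances must be positive")
    w_base = 1.0 / base_var
    w_rank = 1.0 / rank_var
    # The more certain estimate (smaller variance) gets the larger weight.
    fused_mean = (w_base * base_mean + w_rank * rank_mean) / (w_base + w_rank)
    # The fused variance is never larger than either input variance.
    fused_var = 1.0 / (w_base + w_rank)
    return fused_mean, fused_var


# Illustrative numbers: a regressor output of 2.0 and a rank-based estimate of 4.0,
# both with variance 1.0.
fused_mean, fused_var = inverse_variance_combine(2.0, 1.0, 4.0, 1.0)
```

With equal variances the fused value is the plain average, and the fused variance is half of either input variance.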
SALS: Sparse Attention in Latent Space for KV Cache Compression
Large Language Models (LLMs) capable of handling extended contexts are in high demand, yet their inference remains challenging due to substantial Key-Value (KV) cache size and high memory bandwidth requirements. Previous research has demonstrated that KV cache exhibits low-rank characteristics within the hidden dimension, suggesting the potential for effective compression. However, due to the widely adopted Rotary Position Embedding (RoPE) mechanism in modern LLMs, naive low-rank compression suffers severe accuracy degradation or creates a new speed bottleneck, as the low-rank cache must first be reconstructed in order to apply RoPE. In this paper, we introduce two key insights: first, the application of RoPE to the key vectors increases their variance, which in turn results in a higher rank; second, after the key vectors are transformed into the latent space, they largely maintain their representation across most layers. Based on these insights, we propose the Sparse Attention in Latent Space (SALS) framework.
Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving
When serving a single base LLM with several different LoRA adapters simultaneously, the adapters cannot simply be merged with the base model's weights, as adapter swapping would create overhead and requests using different adapters could not be batched. Rather, the LoRA computations have to be separated from the base LLM computations, and in a multi-device setup the LoRA adapters can be sharded in a way that is well aligned with the base model's tensor parallel execution, as proposed in S-LoRA. However, the S-LoRA sharding strategy incurs some communication overhead, which may be small in theory but can be large in practice. In this paper, we propose to constrain certain LoRA factors to be block-diagonal, which allows for an alternative way of sharding LoRA adapters that does not require any additional communication for the LoRA computations. We demonstrate in extensive experiments that our block-diagonal LoRA approach is as parameter-efficient as standard LoRA (i.e., for a similar number of parameters it achieves similar downstream performance) and that it leads to significant end-to-end speed-up over S-LoRA. For example, when serving on eight A100 GPUs, we observe up to 1.79x (1.23x) end-to-end speed-up with 0.87x (1.74x) the number of adapter parameters for Llama-3.1-70B, and up to 1.63x (1.3x) end-to-end speed-up with 0.86x (1.73x) the number of adapter parameters for Llama-3.1-8B.
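To see why a block-diagonal factor can remove LoRA-specific communication, consider a toy check in plain Python. The abstract does not say which factors are constrained or for which layers, so the setup is an assumption: a row-parallel layer whose input is already split across two devices, a block-diagonal down-projection `A`, and an up-projection `B` split by rows. All names and numbers are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def block_diag(blocks):
    """Assemble a dense block-diagonal matrix from a list of blocks."""
    rows = sum(len(b) for b in blocks)
    cols = sum(len(b[0]) for b in blocks)
    out = [[0.0] * cols for _ in range(rows)]
    r0 = c0 = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[r0 + i][c0 + j] = value
        r0 += len(b)
        c0 += len(b[0])
    return out


# Two hypothetical devices; d_in = 4, rank = 2, d_out = 3.
A_blocks = [[[1.0], [2.0]], [[3.0], [4.0]]]   # one (2 x 1) block per device
B = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]       # (2 x 3), split by rows
x = [[1.0, 2.0, 3.0, 4.0]]                    # (1 x 4), split by columns

# Single-device reference: x @ A @ B with the dense block-diagonal A.
reference = matmul(matmul(x, block_diag(A_blocks)), B)

# Sharded version: each device touches only its own slices of x, A and B.
sharded = [[0.0] * len(B[0]) for _ in x]
col = row = 0
for A_dev in A_blocks:
    n_in, n_rank = len(A_dev), len(A_dev[0])
    x_dev = [r[col:col + n_in] for r in x]
    B_dev = B[row:row + n_rank]
    partial = matmul(matmul(x_dev, A_dev), B_dev)
    # Accumulating here stands in for the sum the base layer already performs.
    sharded = [[s + p for s, p in zip(s_row, p_row)]
               for s_row, p_row in zip(sharded, partial)]
    col += n_in
    row += n_rank
```

Each device's partial result has the full output width, so under this assumed layout it could be added to the base layer's partial output before the reduction that tensor parallelism performs anyway, with no exchange of the rank-sized intermediate.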
An AI solution to an 80‑year‑old problem has shocked mathematicians
Last week, OpenAI shocked the mathematical community by revealing that one of its internal artificial intelligence (AI) models had found a counterexample to a famous conjecture made by legendary Hungarian mathematician Paul Erdős in 1946. The planar unit distance problem, or Erdős problem 90, has intrigued mathematicians for decades. The new result is no mere curiosity. Canadian mathematician Daniel Litt described it as "the first result produced autonomously by an AI that I find interesting in itself". The breakthrough, produced with a general-purpose AI model rather than one specialised for mathematics, also highlights how AI is changing mathematical research itself.
OpenAI makes move to go public one week after rival Anthropic
OpenAI, founded in San Francisco in 2015 as a nonprofit research lab, burst into the mainstream with the launch of ChatGPT in November 2022. It has since restructured as a for-profit corporation. SAN FRANCISCO, UNITED STATES - ChatGPT-maker OpenAI on Monday took the first step toward going public, one week after archrival Anthropic announced its own filing, as both companies look to raise the massive sums needed to expand. In a social media post, the Sam Altman-led company said it had confidentially submitted an S-1 registration statement to U.S. securities regulators but had "not decided on timing yet" for any potential debut. OpenAI's move follows a confidential filing by Anthropic, the maker of the Claude chatbot, which announced last Monday that it had taken the same step.
OpenAI files SEC paperwork to go public
We expect it to leak so we're just announcing it. Exactly a week after Anthropic announced its plan to go public, OpenAI has followed suit. The company said on Monday that it confidentially submitted an S-1 form with the Securities and Exchange Commission. No date or offer price has been set by OpenAI yet for the initial public offering.
Google cuts the price of its AI Plus plan and doubles the storage
The subscription now starts at $5 per month. Google is lowering the cost of its cheapest AI subscription to make Gemini models even easier to access. The Google AI Plus plan will now cost $5 per month, according to a post from Vikas Kansal, the company's Product Lead focused on Gemini AI subscriptions, down from its original $8 per month price. It now also comes with double the storage, 400GB instead of 200GB. The subscription plan became available in January 2026 as a cheaper way to access Google's Gemini 3 Pro model, Nano Banana Pro and Deep Research.
You don't need to worry about recursive-self-improving AI – yet
One of the world's leading artificial intelligence companies has implored the industry to pause development on AI, because the latest models could be reaching a tipping point where they become capable of redesigning themselves, growing ever more powerful and finally escaping our control. At least, that's what the headlines said. In truth, Anthropic's co-founder Jack Clark and the boss of spin-out think-tank The Anthropic Institute, Marina Favaro, have published a long blog post bigging up the capabilities of their Claude model, shortly before the company floats on the stock exchange in an initial public offering (IPO) for a rumoured $1 trillion. Let's, for a moment, ignore the vast financial elephant in the room and look at the technological claims. An AI that becomes capable of designing a more powerful version of itself, which is in turn able to pull off the same feat, is an obvious gamechanger, but it is also not a new idea.
The Download: how the World Cup ball will fly and OpenAI's "super app"
Plus: OpenAI plans to turn ChatGPT into a 'super app' before its IPO. Why this year's World Cup ball may not fly as far: Much is new about this month's FIFA World Cup tournament. It hosts more teams than ever before. It's the first to occur in three different host countries. And, like every World Cup for over half a century, it will employ a football with a brand-new design. Through wind-tunnel experiments, researchers found that long-distance kicks with Adidas's new Trionda ball might not travel as far as they did in the past.