Russian drone crashes into apartment building in Romania
A Russian drone hit an apartment building in Romania, the country's defence ministry said early on Friday, causing a fire and injuring two people. The drone crashed in the eastern city of Galati as Russia carried out attacks in Ukraine near the border, the ministry said in a statement. The Romanian General Inspectorate for Emergency Situations said the drone's entire explosive payload detonated, causing a fire on the 10th floor of the residential building. Russian drones have strayed across the border of the Nato member country a number of times during the four-year war with Ukraine, but this was the first time Romanian citizens had been hurt. Russia has yet to comment on the incident. "This incident represents a serious and irresponsible escalation on the part of the Russian Federation," Romania's foreign ministry said, adding that Bucharest had informed the Nato secretary general and requested measures to accelerate the transfer of anti-drone capabilities to Romania.
Anthropic soars to $965bn valuation, leapfrogging OpenAI
Anthropic has usurped OpenAI as the world's most valuable artificial intelligence startup, soaring to a $965bn valuation ahead of expected public listings by the rival firms. Anthropic, the maker of the Claude family of chatbots, said on Thursday that it had raised $65bn from private investors after a fundraising round led by Altimeter Capital, Greenoaks, Dragoneer and Sequoia Capital. "This funding will help us serve the historic demand we are experiencing, stay at the research frontier, and bring Claude to more of the places where work happens," Anthropic's Chief Financial Officer Krishna Rao said in a statement. Altimeter Capital CEO Brad Gerstner hailed the adoption of Claude among the "world's most demanding organisations" as evidence of Anthropic's command in the field. "This momentum positions Anthropic to lead the next phase of AI innovation and capture the enormous opportunity ahead," Gerstner said.
Taiyo Yuden sees 'scary' levels of AI parts demand risking supply chain
Taiyo Yuden is fielding "scary" levels of demand for its high-end artificial intelligence server components, stretching capacity and increasing the risk of supply chain hiccups. The Tokyo-based company, which makes multilayer ceramic capacitors, will likely need to accelerate spending to expand output capacity, Chief Executive Officer Katsuya Sase said in an interview. MLCCs, which are tiny components that regulate and stabilize power flow in electronic devices, are becoming a growing bottleneck in the construction of artificial intelligence data centers. Taiyo Yuden and Murata Manufacturing comprise the bulk of the world's supplies of high-end MLCCs. "The volumes we are seeing today -- it's scary," Sase said.
BYD debuts China's most advanced EV chip in smart-driving push
BYD, the world's largest electric vehicle maker, unveiled a series of technology advances, including what it calls China's first automotive-grade 4-nanometer chip for self-driving cars. The semiconductor breakthrough approaches the lead of Chinese tech giant Huawei Technologies, which currently makes chips with a geometry of 7 nm but has pledged to debut 1.4 nm chips by 2031. It's designed to allow BYD's computer-assisted driving to stand out from a crowded Chinese EV market that includes rivals such as Xpeng and Xiaomi. Facing eight months in a row of falling sales and intense competition for more advanced charging and intelligent driving technologies, BYD is looking to spark more demand for its vehicles.
Anthropic reaches near-trillion dollar valuation, topping OpenAI
Artificial intelligence company Anthropic said Thursday it had raised $65 billion in a new funding round that values the Claude maker at $965 billion, more than its archrival OpenAI, the maker of ChatGPT. The latest fundraising round confirms Anthropic's place as one of the most significant players in AI, with the startup led by Dario Amodei having drawn fans for its coding powers and state-of-the-art models. Anthropic's rise came by doubling down on delivering generative AI to enterprise clients rather than general users, the path initially chosen by OpenAI.
Conf-Gen: Conformal Uncertainty Quantification for Generative Models
Loaiza-Ganem, Gabriel, Zhang, Kevin, Cui, Wei, Law, Marc T., Leung, Kin Kwan
Conformal prediction (CP) and its extension, conformal risk control (CRC), are established frameworks for quantifying uncertainty in supervised machine learning through formal guarantees. However, recent breakthroughs in artificial intelligence (AI) have been driven by unsupervised generative models, such as large language models (LLMs) and image generators, which are not directly compatible with CP or CRC. In this work we introduce conformal generation (Conf-Gen), a general framework adapting CRC to generative tasks while relaxing its theoretical assumptions. Conf-Gen unifies and generalizes previous attempts to apply CP to LLMs, and extends conformal methodology to entirely new domains. We demonstrate the flexibility of Conf-Gen through some novel applications, including obtaining conformal guarantees on: image generators producing non-memorized images, conversational AI systems having asked enough clarifying questions, and the output of AI agents being correct.
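For readers unfamiliar with the supervised starting point, the following is a minimal sketch of standard split conformal prediction, not of Conf-Gen itself, whose construction the abstract does not spell out. The function names, the toy scores, and the choice of a nonconformity score where lower means more plausible are all assumptions made for illustration.

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Return the split-conformal threshold for miscoverage level alpha."""
    n = len(cal_scores)
    # Rank of the order statistic that yields the finite-sample guarantee.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points: only the trivial (full) set is valid.
        return math.inf
    return sorted(cal_scores)[k - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [label for label, s in candidate_scores.items() if s <= threshold]

# Toy calibration scores (illustrative values, not taken from the paper).
cal = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7]
threshold = conformal_quantile(cal, alpha=0.25)
kept = prediction_set({"a": 0.2, "b": 0.95}, threshold)
```

If calibration and test scores are exchangeable, a set built this way contains the true label with probability at least 1 - alpha. That is the kind of guarantee the abstract says does not transfer directly to generative models.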
Anytime-Valid Federated Conformal RAG for LLM Swarms
Dubey, Prasanjit, Huo, Xiaoming
Federated Conformal RAG (FC-RAG) provides distribution-free coverage for a bandwidth-limited swarm of weak language models, but only at a fixed horizon. We extend it to anytime-valid sequential coverage: validity at every stopping time, preserved under predictable adaptive control (recalibration, per-node bandwidth escalation, distilled-student refresh), at no extra cost in assumptions over fixed-horizon FC-RAG. Naive composition fails because FC-RAG's marginal coverage bound makes the betting e-process a non-supermartingale on adverse calibration draws, and Ville's inequality cannot be invoked. We give Anytime-FC-RAG, a sequential extension built on a summable per-step calibration-deviation budget that converts the marginal bound into a strict conditional bound on a calibration-good event, paired with a truncated betting e-process that is a nonnegative supermartingale on the entire probability space. From these two ingredients, we obtain four guarantees: time-uniform alarm validity $\mathbb{P}(\sup_t E_t \ge 1/\delta_e) \le \delta_e + \delta_{\mathrm{cal}}$, a Hoeffding-stitched cumulative-miscoverage envelope at the same total budget, safety under any predictable controller (recalibration, bandwidth escalation, student refresh), and training-side error propagation across an unbounded sequence of Federated Probe-Logit Distillation (FPLD) refreshes via a summable training budget. As a practical consequence, an adaptive controller that escalates retrieval bandwidth only when the e-process crosses a warning threshold matches the alarm rate of a fixed-high-bandwidth schedule at substantially lower communication cost. Experiments on a GPT-2-small + MiniLM swarm across MMLU, DBpedia, and AG News verify the predicted alarm rate, detection delay, envelope coverage, and $14$-$57\%$ bandwidth savings; the alarm fires when and only when coverage genuinely breaks.
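The alarm rule can be illustrated with a generic betting e-process over 0/1 miscoverage indicators. This is a textbook-style sketch rather than the paper's truncated construction, and it ignores the calibration budget $\delta_{\mathrm{cal}}$. The function name, the fixed bet size `lam`, and the example inputs are assumptions.

```python
def betting_e_process(miss_indicators, alpha, lam, delta):
    """Track a betting e-process over 0/1 miscoverage indicators.

    Returns (e_values, alarm_time); alarm_time is the first 1-based step
    with E_t >= 1/delta, or None if the threshold is never crossed.
    """
    if not 0 <= lam < 1 / alpha:
        raise ValueError("lam must lie in [0, 1/alpha) to keep E_t nonnegative")
    e, e_values, alarm_time = 1.0, [], None
    for t, miss in enumerate(miss_indicators, start=1):
        # If P(miss) <= alpha given the past, this factor has expectation <= 1,
        # so E_t is a nonnegative supermartingale under valid coverage.
        e *= 1.0 + lam * (miss - alpha)
        e_values.append(e)
        if alarm_time is None and e >= 1.0 / delta:
            alarm_time = t
    return e_values, alarm_time

# Every query is miscovered: the wealth grows and the alarm fires quickly.
_, alarm_broken = betting_e_process([1] * 5, alpha=0.1, lam=2.0, delta=0.05)
# No query is miscovered: the wealth shrinks and no alarm is raised.
values_ok, alarm_ok = betting_e_process([0] * 50, alpha=0.1, lam=2.0, delta=0.05)
```

When the supermartingale property holds, Ville's inequality bounds the probability of ever crossing $1/\delta$ by $\delta$, which is why the threshold can be checked at every step without a correction for repeated looks.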
Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning
Cesari, Tommaso, Colomboni, Roberto
We study stochastic decision-theoretic online learning with full information under event-level pure differential privacy. A COLT open problem of Hu and Mehta asks for the optimal gap-dependent regret rate in this setting. For $K$ actions, losses in $[0,1]$, and a unique best action separated from the second-best action by gap $\Delta_{\min}$, the known lower bound is of order $\frac{\log K}{\min\{\Delta_{\min},\varepsilon\}}$, or equivalently, up to universal constants, of order \[ \frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}. \] We give a horizon-free pure-DP algorithm and prove the explicit regret bound \[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}\right) \] for every horizon $T$. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at cost $\log K/\varepsilon$.
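The block structure described above can be sketched as follows. This is a hedged illustration, not the authors' algorithm: the uniform prefix distribution, the doubling schedule starting at size 1, the random first action, and the names `run_blocks` and `loss_rows` are assumptions. The sampling rule is the standard exponential mechanism for a score whose value changes by at most 1 when a single loss vector changes.

```python
import math
import random

def exponential_mechanism(cum_losses, epsilon, rng):
    """Sample an action index with probability proportional to exp(-epsilon * loss / 2)."""
    best = min(cum_losses)
    # Subtracting the minimum leaves the distribution unchanged and keeps weights in (0, 1].
    weights = [math.exp(-epsilon * (loss - best) / 2.0) for loss in cum_losses]
    return rng.choices(range(len(cum_losses)), weights=weights, k=1)[0]

def run_blocks(loss_rows, epsilon, seed=0):
    """Play one action per doubling block; return (block_start, block_len, action) triples."""
    rng = random.Random(seed)
    num_actions = len(loss_rows[0])
    action = rng.randrange(num_actions)  # first block: no data observed yet
    schedule, start, size = [], 0, 1
    while start < len(loss_rows):
        block = loss_rows[start:start + size]
        schedule.append((start, len(block), action))
        # The prefix length is drawn without looking at the losses.
        prefix_len = rng.randint(1, len(block))
        sums = [sum(row[a] for row in block[:prefix_len]) for a in range(num_actions)]
        action = exponential_mechanism(sums, epsilon, rng)
        start, size = start + size, size * 2
    return schedule

# Two actions; action 1 always has zero loss (toy data for illustration).
rows = [[1.0, 0.0]] * 7
schedule = run_blocks(rows, epsilon=1.0)
```

Because each loss vector belongs to exactly one block and feeds only the selection made at the end of that block, a single changed event affects one exponential-mechanism draw in this sketch.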
Do Deep Networks Forget Initialization? A Forgetting-Time View of Practical Inductive Bias
Das, Mohua, Beneventano, Pierfrancesco, Dey, Shibshankar, McKinley, Gareth H., Poggio, Tomaso
Randomly initialized neural networks induce a prior over functions, but the predictor used in practice is produced only after training. We ask how much of this initial bias survives the training pipeline. To make the question measurable, we introduce initialization memory: the dependence of the validation-selected predictor on the scale of the random initialization. We perform controlled CIFAR-10 experiments on ResNets where initialization memory already sharply separates training regimes. Low-learning-rate SGD can interpolate while still remembering its initialization: on ResNet-9 with batch size $b=128$, test accuracy varies by $26.5$ percentage points across initialization scales despite $\ge99.5\%$ training accuracy. This is not undertraining: extending the same low-learning-rate regime to $5{,}000$ epochs leaves the spread essentially unchanged. In contrast, Adam-family methods largely erase the dependence. SGD can also be made to forget when larger learning rates are paired with explicit $L_2$ norm control. We interpret these findings in terms of the time scale of forgetting: gradient-flow-like dynamics can preserve initialization memory, whereas stochastic finite-step effects, explicit norm decay, and adaptive preconditioning erase it on scales governed by the size of explicit or implicit regularization. The practical inductive bias of a trained network is therefore not the architectural prior alone, but the architectural prior after being filtered by the forgetting dynamics of the training pipeline; and the same regularizers that improve generalization are precisely those that erase memory of initialization.
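One simple way to turn initialization memory into a number, consistent with the spread in test accuracy that the abstract reports, is the range of accuracy across initialization scales. The sketch below assumes that reading; the function name and the accuracy values are made up for illustration and are not results from the paper.

```python
def initialization_memory(acc_by_scale):
    """Spread (max - min) of test accuracy, in percentage points, across init scales."""
    accs = list(acc_by_scale.values())
    return max(accs) - min(accs)

# Made-up accuracies keyed by initialization scale, for illustration only.
toy_remembering = {0.25: 70.0, 1.0: 90.0, 4.0: 80.0}
toy_forgetting = {0.25: 91.0, 1.0: 92.0, 4.0: 91.5}

spread_remembering = initialization_memory(toy_remembering)
spread_forgetting = initialization_memory(toy_forgetting)
```

A large spread means the selected predictor still depends on how the network was initialized; a spread near zero means the training pipeline has erased that dependence.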
The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction
Wan, Shu, Gorantla, Abhinav, Liu, Huan, Candan, K. Selçuk
Under standard graphical assumptions, the Markov boundary of a target variable is the smallest set of features that renders every other feature redundant. Once the boundary is observed, the target is conditionally independent of the rest of the table. This is a tempting object for tabular prediction, since it names exactly the columns a model should need. Yet modern regressors are still trained on the full feature set. We ask whether the Markov boundary is genuinely useful for prediction on SCM3K, a 3,450-task synthetic SCM benchmark with feature counts from 40 to 1000 and six SCM families, evaluated with six regressors. The answer is more nuanced than the theory suggests. Restricting a regressor to the oracle boundary often improves prediction substantially, and the improvement grows as the feature space becomes larger and sparser. But the natural pipeline of recovering the boundary with causal discovery and training on the recovered mask does not deliver. Existing estimators exhaust the compute budget before reaching the regime where the boundary helps most, and even where they run they rarely beat the full feature set. We trace this to three causes. Discovery optimizes structural recovery rather than prediction. False negatives and false positives carry sharply asymmetric predictive cost. The exact boundary is only one of many feature sets that beat all features. We then develop what these facts imply for prediction-aligned feature selection and for tabular models that learn to use causal structure.
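When the causal graph is known, as with an oracle boundary, the Markov boundary of a node in a DAG consists of its parents, its children, and the other parents of its children. The sketch below computes that set. The dictionary representation, the function name, and the toy graph are assumptions for illustration and are not drawn from SCM3K.

```python
def markov_boundary(parents, target):
    """Parents, children, and co-parents (spouses) of `target` in a DAG.

    `parents` maps each node to the set of its direct parents.
    """
    children = {node for node, pa in parents.items() if target in pa}
    spouses = {p for child in children for p in parents[child]}
    return (set(parents.get(target, ())) | children | spouses) - {target}

# Hypothetical graph: A -> Y, Y -> C, B -> C, D -> A, and E is isolated.
toy_dag = {
    "Y": {"A"},
    "C": {"Y", "B"},
    "A": {"D"},
    "B": set(),
    "D": set(),
    "E": set(),
}
boundary = markov_boundary(toy_dag, "Y")
```

Here `D` influences `Y` only through `A`, and `E` is unrelated, so both fall outside the boundary. `B` is included because conditioning on the shared child `C` makes it informative about `Y`.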