Genre
Deployment-complete benchmarking
Mansouri, El Mustapha, Arai, Keigo
Benchmarks increasingly guide deployment, procurement and scientific screening, yet a score supports only the response it records, not necessarily the deployment action. We introduce deployment-complete benchmarking, which tests whether benchmark evidence determines a deployment action. A benchmark is complete for a claim exactly when the action is constant on each evidence fiber; mixed fibers expose missing deployment information, and completion curves quantify the evidence required to resolve ambiguity. In controlled response spaces, benchmark-channel conformal coverage of 94.98% transferred poorly to an unmeasured deployment channel (10.07%), whereas response-rank intervals achieved 94.91% coverage; even zero benchmark error certified only 45.4% of candidates at the largest residual size. Public audits revealed incompleteness, including 97.9% mixed Tox21 fibers and zero median certifiable fraction in main Matbench and JARVIS audits. In held-out replays, certify-then-acquire reduced false decisions from 1.19% to 0.027% in Tox21 and from 20.3% to 0.128% in JARVIS, while changing model choice and identifying deployment-relevant probes. Deployment-ready benchmarks should report evidence, supported actions, ambiguity and completion cost rather than scores alone.
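The fiber criterion in the abstract above can be illustrated with a minimal sketch. It assumes each candidate is summarized by a hashable evidence value and a known deployment action. The function name `fiber_report` and the two reported fractions are hypothetical conveniences for illustration, not the authors' definitions or code.

```python
from collections import defaultdict

def fiber_report(records):
    """Group (evidence, action) pairs into evidence fibers.

    A fiber is the set of candidates sharing identical benchmark evidence.
    The benchmark is treated as complete when every fiber maps to exactly
    one action. The fractions below are illustrative assumptions.
    """
    actions_by_evidence = defaultdict(set)
    count_by_evidence = defaultdict(int)
    for evidence, action in records:
        actions_by_evidence[evidence].add(action)
        count_by_evidence[evidence] += 1

    # A mixed fiber carries more than one deployment action.
    mixed = {e for e, acts in actions_by_evidence.items() if len(acts) > 1}
    total = sum(count_by_evidence.values())
    certifiable = sum(n for e, n in count_by_evidence.items() if e not in mixed)
    n_fibers = len(actions_by_evidence)
    return {
        "complete": not mixed,
        "mixed_fiber_fraction": len(mixed) / n_fibers if n_fibers else 0.0,
        "certifiable_fraction": certifiable / total if total else 0.0,
    }

# Toy data: two candidates share evidence but need different actions.
records = [
    (("pass", 0.9), "deploy"),
    (("pass", 0.9), "reject"),
    (("fail", 0.2), "reject"),
    (("fail", 0.2), "reject"),
]
report = fiber_report(records)
```

In this toy case one of the two fibers is mixed, so the benchmark evidence alone cannot decide the action for half of the candidates.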
Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance
Blanchet, Jose, Glynn, Peter, Yang, Wenhao
Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting distributions depend on unknown nuisance parameters. In this paper, we develop an efficient, model-agnostic methodology for constructing confidence regions from SGD trajectories that applies in both finite- and infinite-variance regimes. The procedure is based on a joint weak convergence result for the Polyak-Ruppert averaged estimator and an empirical second-moment normalizer constructed from stochastic gradients along the SGD trajectory. This joint limit yields a self-normalized statistic in which the leading tail-dependent scaling terms cancel. We then use a subsampling calibration scheme to estimate the relevant critical values, avoiding explicit estimation of tail indices, slowly varying functions, or stable-law parameters. The resulting confidence regions are straightforward to implement and are asymptotically valid under both the finite- and infinite-second-moment regimes. Simulation studies show reliable coverage in various settings, supporting the proposed method as a practical tool for uncertainty quantification in stochastic optimization.
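A rough sketch of the self-normalization idea follows, assuming a one-dimensional quadratic objective, Student-t gradient noise with 1.5 degrees of freedom (finite mean, infinite variance) and a step size of k^(-0.75). All of these choices, and the function name `sgd_self_normalized`, are illustrative assumptions rather than the paper's setup. The subsampling calibration of critical values is not shown.

```python
import numpy as np

def sgd_self_normalized(theta_star=1.0, n=20000, df=1.5, seed=0):
    """Run SGD on f(theta) = 0.5 * (theta - theta_star)**2 with heavy-tailed noise.

    Returns the Polyak-Ruppert average and an empirical second-moment
    normalizer built from the stochastic gradients seen along the path.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_t(df, size=n)  # infinite variance when df <= 2
    theta = 0.0
    iterate_sum = 0.0
    grad_sq_sum = 0.0
    for k in range(1, n + 1):
        grad = (theta - theta_star) + noise[k - 1]  # noisy gradient
        theta -= grad / k**0.75                     # assumed step-size schedule
        iterate_sum += theta
        grad_sq_sum += grad * grad
    theta_bar = iterate_sum / n                # Polyak-Ruppert averaged estimator
    normalizer = np.sqrt(grad_sq_sum) / n      # sqrt of summed squared gradients, over n
    return theta_bar, normalizer

theta_bar, normalizer = sgd_self_normalized(n=2000)
# Self-normalized statistic: the ratio does not require the tail index.
statistic = (theta_bar - 1.0) / normalizer
```

The ratio is formed without estimating the tail index, which is the practical point the abstract makes. Turning it into a confidence region still requires critical values, which the paper obtains by subsampling.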
DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking
Wiemann, Matt L., Smith, Lindsay M., Melchior, Peter, Mishra-Sharma, Siddharth, Wilson, Andrew Gordon, Izmailov, Pavel, Cuesta-Lázaro, Carolina
Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent to discover the laws of motion of a simulated world whose physics deliberately deviates from our own. We construct 22 worlds governed by, among others, screened and fractional-power gravity, multi-species couplings, hidden dark-matter-like particles, non-coordinate-free physics, and time-varying interactions. Each world is generated on demand by an N-body simulator, for which the agent proposes several rounds of experiments, observes raw trajectory data, and ultimately submits both a natural-language explanation of the world's physics and a Python implementation of the inferred law. Because solving a world requires the agent to design informative experiments and revise its hypotheses, the benchmark probes long-horizon reasoning over an experimental history. We evaluate submissions along two complementary axes: trajectory MSE on held-out particles and an LLM-judged explanation score following an expert-written rubric assessing conceptual understanding of each world. Across eleven frontier models, we find that the strongest agents pass only half of the worlds and consistently fail on those where latent structure must be uncovered. Open-source models lag substantially behind commercial models, both in their ability to design informative experiments and in extracting conclusions from the data. We further find that good predictive accuracy does not guarantee high explanation quality and that conceptual understanding depends on hypothesis refinement through well-chosen experiments.
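To make the submission format concrete, here is a hedged sketch of what an inferred law and the trajectory metric might look like. It assumes "screened gravity" means a Yukawa-type potential V(r) = -G m1 m2 exp(-r/lam) / r, which the abstract does not specify. The names `screened_accelerations` and `trajectory_mse` and the parameters `G` and `lam` are hypothetical and not part of the benchmark's API.

```python
import numpy as np

def screened_accelerations(pos, mass, G=1.0, lam=2.0):
    """Pairwise accelerations under an assumed Yukawa-screened attraction.

    pos: array of shape (N, D); mass: array of shape (N,).
    Force magnitude between i and j: G m_i m_j exp(-r/lam) (1 + r/lam) / r**2.
    """
    diff = pos[None, :, :] - pos[:, None, :]  # diff[i, j] = pos[j] - pos[i]
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, 1.0)                  # avoid 0/0; diff is zero on the diagonal
    strength = G * np.exp(-r / lam) * (1.0 + r / lam) / r**3
    return (strength[:, :, None] * diff * mass[None, :, None]).sum(axis=1)

def trajectory_mse(predicted, observed):
    """Mean squared error between predicted and observed trajectories."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((predicted - observed) ** 2))

pos = np.array([[0.0, 0.0], [1.0, 0.0]])
mass = np.array([1.0, 2.0])
acc = screened_accelerations(pos, mass, lam=2.0)
acc_newton_like = screened_accelerations(pos, mass, lam=1e9)
```

For very large `lam` the screening factor tends to one and the law reduces to inverse-square attraction, which gives a simple sanity check on the implementation.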
Goal-driven Bayesian Optimal Experimental Design for Robust Decision-Making Under Model Uncertainty
Go, Jinwoo, Qian, Xiaoning, Yoon, Byung-Jun
Bayesian optimal experimental design (BOED) selects experiments to maximize information gain about model parameters. However, in decision-critical settings, reducing parameter uncertainty does not necessarily improve downstream decisions, as only specific parameter directions relevant to the objective truly matter. We propose GoBOED, a goal-driven BOED framework that directly optimizes experimental designs for a specified decision-making objective. GoBOED combines an amortized variational posterior surrogate with a differentiable convex decision layer, enabling gradient-based design optimization that is fully decision-focused. We theoretically show that GoBOED gradients are insensitive to parameter directions irrelevant to the decision objective, providing a formal justification for why goal-driven design achieves equivalent decision quality over a wider set of experimental designs than information-gain maximization. Empirically, across source localization, epidemic management, and pharmacokinetic control, GoBOED identifies designs that better align with downstream decision objectives and reveals that near-optimal design windows are substantially wider than those predicted by goal-agnostic BOED approaches.
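The gap between information gain and decision relevance can be seen in a toy linear-Gaussian model. This is only an illustration and is not GoBOED itself, which uses an amortized variational posterior and a differentiable convex decision layer. The sketch assumes a standard normal prior on the parameters, a scalar observation y = d·theta + noise, and a decision that depends only on c·theta. The function name and arguments are hypothetical.

```python
import numpy as np

def eig_and_goal_variance(d, c, sigma=1.0):
    """Closed-form quantities for a linear-Gaussian design d.

    Prior: theta ~ N(0, I). Observation: y = d @ theta + N(0, sigma**2).
    Returns the expected information gain about theta and the posterior
    variance of the decision-relevant quantity c @ theta.
    """
    d = np.asarray(d, dtype=float)
    c = np.asarray(c, dtype=float)
    eig = 0.5 * np.log1p(d @ d / sigma**2)
    # Sherman-Morrison form of the posterior covariance (I + d d^T / sigma^2)^-1.
    post_cov = np.eye(len(d)) - np.outer(d, d) / (sigma**2 + d @ d)
    return float(eig), float(c @ post_cov @ c)

c = [1.0, 0.0]                                          # decision depends on theta[0] only
eig_aligned, var_aligned = eig_and_goal_variance([1.0, 0.0], c)
eig_orthogonal, var_orthogonal = eig_and_goal_variance([0.0, 1.0], c)
```

Both designs have the same expected information gain, yet only the aligned design reduces uncertainty in the quantity the decision depends on, which is the motivation the abstract gives for a goal-driven objective.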
Bobcat that survived being hit by a car gets a custom-built kennel
A generous donation and a good neighbor will help the wildcat continue her recovery. A new kennel and generous donation are giving this Pennsylvania bobcat a new life. In March, we reported on a wild bobcat that had been hit and dragged by a car, who also got her head stuck in the car's grill.
Scientists discover a third eye hidden in the human body and the reason it's there
Scientists have found a third eye buried in the middle of the human head and say it still plays a key role after millions of years of evolution.
The AI Era Is Creating a Bug Hunting Arms Race
As attackers ramp up their AI exploit development, the search for software vulnerabilities is changing rapidly. A decade ago, programs to reward researchers for submitting software vulnerability findings were just starting to go mainstream. Vulnerability disclosure and "bug bounty" programs represented a paradigm shift years in the making, moving institutions from hostility and defensiveness about security research findings to acknowledgement that receiving input and releasing fixes was necessary. When Apple finally announced a bug bounty in 2016, the top reward was $200,000. It rose to $1 million in 2019 and $2 million last year.
Sakura Internet eyes more spending to meet AI data center demand
Countries including Japan see the ability to control chips, data centers and AI models as directly related to national resilience in a landscape dominated by U.S. and Chinese technology. Sakura Internet's chief said the company may need to hike its capital spending by nearly seven times its initial plan to keep up with artificial intelligence demand in Japan. The data center operator is eyeing an allocation of as much as ¥20 billion to ¥30 billion ($125 million to $190 million) this fiscal year, founder and CEO Kunihiro Tanaka said. That's above the ¥4.4 billion in the Osaka-based company's official capital expenditure plan announced last month. "AI server usage rates are 80% to 90%," Tanaka, 48, said in an interview.
Golf ball-sized octopus discovered near the Galápagos Islands
A tiny, bright blue octopus is small enough to fit inside the palm of your hand, but good luck trying to meet one. According to marine biologists, you'll likely have to settle for admiring it from afar for now unless you have access to a deep sea submersible and a ticket to the Galápagos Islands. While conducting a deep sea expedition aboard the research vessel E/V, biologists spotted the diminutive invertebrate as they piloted a remotely operated vehicle (ROV) along the ocean floor near Darwin Island.