HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class

Neural Information Processing Systems

Large language models (LLMs) have shown remarkable progress in mathematical problem-solving, but evaluation has largely focused on problems that have exact analytical solutions or involve formal proofs, often overlooking approximation-based problems ubiquitous in applied science and engineering. To fill this gap, we build on prior work and present $\textbf{HARDMath2}$, a dataset of 211 original problems covering the core topics in an introductory graduate applied math class, including boundary-layer analysis, WKB methods, asymptotic solutions of nonlinear partial differential equations, and the asymptotics of oscillatory integrals. This dataset was designed and verified by the students and instructors of a core graduate applied mathematics course at Harvard. We build the dataset through a novel collaborative environment that challenges students to write and refine difficult problems consistent with the class syllabus, peer-validate solutions, test different models, and automatically check LLM-generated solutions against their own answers and numerical ground truths. Evaluation results show that leading frontier models still struggle with many of the problems in the dataset, highlighting a gap in the mathematical reasoning skills of current LLMs. Importantly, students identified strategies to create increasingly difficult problems by interacting with the models and exploiting common failure modes. This back-and-forth with the models not only resulted in a richer and more challenging benchmark but also led to qualitative improvements in the students' understanding of the course material, which is increasingly important as we enter an age where state-of-the-art language models can solve many challenging problems across a wide domain of fields.


Get officially certified in Claude AI for just $19.99

PCWorld

When you purchase through links in our articles, we may earn a small commission. A Claude AI Professional E-Degree is on sale for $19.99 (reg. …). AI skills are no longer a nice-to-have. A verifiable credential in one of the most popular AI models on the market is a real resume differentiator, and right now, you can get an e-degree in Claude for just $19.99 (reg. …). While plenty of people have dabbled with Claude, there's a big difference between "I've used it a few times" and actually knowing how to make it work for you.


Multicalibration Boosting: Theory, Convergence, and Transferability

arXiv.org Machine Learning

Multicalibration extends classical calibration by requiring predictions to be unbiased over a rich collection of functions, encompassing both prediction slices and subpopulations. It has emerged as a powerful framework for fairness, robustness, and reliable prediction, yet the theoretical understanding of multicalibration boosting (MCBoost) remains fragmented and often relies on restrictive assumptions. In this work, we develop a unified and refined perspective on MCBoost that subsumes existing variants, including multiaccuracy, BatchGCP, and BatchMVP. We uncover several phenomena that provide new insights into its practical behavior: even highly accurate and flexible predictors can remain substantially miscalibrated; enforcing multicalibration introduces a calibration-risk trade-off; and early stopping plays a central role in controlling this trade-off. On the theoretical side, we establish a general framework for MCBoost under weaker and more realistic conditions. We show that the boosting iterates converge to a Bregman projection of the population-optimal predictor onto the cumulative span generated by the audit class, thereby explicitly characterizing the function space on which multicalibration is achieved. We further derive convergence rates under different smoothness assumptions, finite-sample guarantees, and principled stopping rules that ensure multicalibration at termination. Finally, we extend the theory of universal adaptability under covariate shift, providing more general transfer guarantees and clarifying when multicalibrated predictors generalize across domains. These results provide a more complete theoretical foundation and practical guidance for multicalibration boosting, positioning it as both a unifying framework and a reliable post-processing approach for modern predictive models.


LaGuardia Airport AI hologram answers traveler questions

FOX News

LaGuardia Airport's Terminal B now features an AI hologram concierge that answers traveler questions and provides step-by-step directions using real-time maps.


Concomitant DAG Learning: On the Roles of Noise Adaptivity, Sparsity, and Non-negativity

arXiv.org Machine Learning

Directed acyclic graphs (DAGs) constitute a central modeling tool to enable principled reasoning about cause-effect interactions in complex systems. Because the causal structure underlying a group of variables is often unknown, and interventions may be infeasible or ethically challenging to implement, DAGs must frequently be inferred from observational data. However, most classical structure identification approaches face two key obstacles: the combinatorial challenge of enforcing acyclicity, which severely limits scalability, and identifiability challenges arising from latent confounding or heterogeneous noise. This tutorial offers an overview of recent signal processing and optimization advances that address these issues by recasting DAG structure learning as a continuous, score-based estimation problem over adjacency matrices. We begin with a didactic introduction to structural equation models and the formulation of causal graph recovery, followed by a historical survey of score-based methods ranging from early combinatorial search schemes and greedy heuristics to modern continuous frameworks that leverage smooth characterizations of acyclicity. Building on this foundation, we describe concomitant DAG estimation methods that jointly infer sparse causal structure and exogenous noise levels, improving robustness under heteroscedasticity and distribution shifts by rendering the estimator noise adaptive. All in all, the tutorial introduces readers to challenges and opportunities for signal processing research at the crossroads of causal inference, high-dimensional statistics, and scalable graph learning, while outlining emerging directions including online, nonlinear, and neural causal discovery.
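
The abstract does not say which smooth characterization of acyclicity the tutorial uses. As a hedged illustration only, the sketch below implements one characterization known from the continuous DAG-learning literature, the polynomial score tr((I + (W∘W)/d)^d) - d, which is zero exactly when the weighted adjacency matrix W has no directed cycle. The function name `acyclicity_penalty` is hypothetical and is not taken from the tutorial.

```python
import numpy as np


def acyclicity_penalty(W):
    """Polynomial acyclicity score: tr((I + (W∘W)/d)^d) - d.

    Assumption: W is a square weighted adjacency matrix, where a nonzero
    W[i, j] denotes an edge between nodes i and j. The score is zero when
    the graph has no directed cycle and strictly positive otherwise.
    """
    W = np.asarray(W, dtype=float)
    d = W.shape[0]
    A = W * W  # elementwise square makes every edge weight non-negative
    M = np.eye(d) + A / d
    # The trace of M^d sums weighted closed walks of length up to d,
    # so any directed cycle contributes a positive amount.
    return float(np.trace(np.linalg.matrix_power(M, d)) - d)


if __name__ == "__main__":
    chain = np.array([[0.0, 1.0], [0.0, 0.0]])  # one edge, no cycle
    loop = np.array([[0.0, 1.0], [1.0, 0.0]])   # two-node cycle
    print(acyclicity_penalty(chain), acyclicity_penalty(loop))
```

Because the score is a smooth function of W, it can be used as a penalty or constraint in gradient-based optimization, which is what lets structure learning avoid a combinatorial search over graphs.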


Get the newest Microsoft dev tools plus 15 coding courses -- only $50

PCWorld

TL;DR: The Microsoft Visual Studio Professional 2026 bundle includes 15 coding courses and is on sale for $49.97 (regularly $1,999.99). This is the kind of tech purchase that tends to pay for itself pretty quickly. The Microsoft Visual Studio Professional 2026 bundle pairs one of the most widely used development environments in the industry with a full library of coding courses -- all for a single one-time payment. Visual Studio Professional has been a staple for professional developers for years, and the 2026 version pushes productivity even further.


The Zuckerbergs Are Hiring a Lifeguard but Calling It a 'Beach Water Person'

WIRED

The job, which is associated with the Zuckerberg family office, is located in Kauai, Hawaii, where the Meta CEO owns a massive compound. Meta CEO Mark Zuckerberg and his wife Priscilla Chan are hiring a seasonal, on-call "Beach Water Person" based in Kauai, Hawaii, where the family owns a sprawling compound, according to a new job listing on Greenhouse associated with West 10, the Zuckerberg family office. This is an interesting choice for a job title, because according to the job description, the primary duties of this "Beach Water Person" include serving as a "Beach Lifeguard" and "Pool Lifeguard." The job listing names a few additional duties related to water activities, such as instructing "stand-up paddleboarding (SUP), canoe paddling, snorkeling, and other ocean-based activities." These, however, come after the water safety duties in the job description.


An ICE Firearms Trainer Was Involved in At Least 4 Deadly Shootings

WIRED

David Norman, a former Phoenix police officer who's described himself as "a fucking savage," now runs a company that provided training to Homeland Security's Special Response Teams. The owner of a company that trained paramilitary Immigration and Customs Enforcement agents testified that he was involved in at least four lethal shootings, according to a 2021 deposition related to a lawsuit reviewed by WIRED. David S. Norman, the founder and proprietor of law enforcement training firm TruKinetics LLC, served as a Phoenix Police officer from the late 1990s until his retirement in 2020. Prior to founding TruKinetics the same year, according to records reviewed by WIRED, Norman was involved in six shootings while on duty that left four people dead and two more wounded. In every instance, the Phoenix Police Department said Norman fired on an armed suspect and exchanged volleys of gunfire in at least two of the shootings. Based in Gilbert, Arizona, TruKinetics offers training on small-team tactics, hostage rescues, close-quarters combat, building searches, night-vision firearms proficiency, pistol and rifle courses, "vehicle interdiction," breaching with explosives, and sniper tactics, according to the company's website.


OpenAI is offering ChatGPT Plus to citizens of Malta for a year

Engadget

OpenAI has signed deals with fintech startups, tech giants and even Disney, but it's breaking new ground by announcing a world-first partnership with the country of Malta. In a post on its website, OpenAI said that it would provide ChatGPT Plus for one year to every Maltese resident or citizen. "Malta is the first country to launch a partnership of this scale because we refuse to let our citizens stay behind in the digital age," Silvio Schembri, Malta's Minister for Economy, Enterprise and Strategic Projects, said in a statement. "We are putting our people at the very forefront of global change." The approximately 574,250 residents of Malta will have to complete a course developed by the University of Malta before activating the ChatGPT Plus subscription, which costs $20 a month in the US.


Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift

arXiv.org Machine Learning

Offline-to-online learning aims to improve online decision-making by leveraging offline logged data. A central challenge in this setting is the distribution shift between offline and online environments. While some existing works attempt to leverage shifted offline data, they largely rely on UCB-type algorithms. Thompson sampling (TS) represents another canonical class of bandit algorithms, well known for its strong empirical performance and naturally suited to offline-to-online learning through its Bayesian formulation. However, unlike UCB indices, posterior samples in TS are not guaranteed to be optimistic with respect to the true arm means. This makes indices constructed from purely online and hybrid data difficult to compare and complicates their use. To address this issue, we propose sample-mean anchored TS (Anchor-TS), which introduces a novel median-based anchoring rule that defines the arm index as the median of an online posterior sample, a hybrid posterior sample, and the online sample mean. The median anchoring systematically corrects bias induced by distribution shift by mitigating over-estimation for suboptimal arms and under-estimation for optimal arms, while exploiting offline information to obtain more accurate estimates when the shift is small. We establish theoretical guarantees showing that the proposed algorithm safely leverages offline data to accelerate online learning, and quantifying how the degree of distribution shift and the size of offline data affect the resulting regret reduction. Extensive experiments demonstrate consistent improvements of our algorithm over baselines.
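
The median-based anchoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract only states that the arm index is the median of an online posterior sample, a hybrid posterior sample, and the online sample mean. The Gaussian posteriors with a flat prior and known noise scale `sigma`, and the function names `anchor_index` and `anchor_ts_indices`, are assumptions made for the example.

```python
import numpy as np


def anchor_index(online_sample, hybrid_sample, online_mean):
    """Arm index: the median of the three quantities named in the abstract."""
    return float(np.median([online_sample, hybrid_sample, online_mean]))


def anchor_ts_indices(rng, on_sum, on_n, off_sum, off_n, sigma=1.0):
    """Per-arm anchored indices under an ASSUMED Gaussian posterior.

    on_sum / on_n   : reward sums and pull counts from online data (on_n >= 1)
    off_sum / off_n : reward sums and counts from offline logged data
    sigma           : assumed known reward noise scale (hypothetical parameter)
    """
    on_mean = on_sum / on_n                         # online sample mean
    hy_mean = (on_sum + off_sum) / (on_n + off_n)   # pooled (hybrid) mean
    # Posterior samples: N(mean, sigma^2 / n) for each data source.
    on_sample = rng.normal(on_mean, sigma / np.sqrt(on_n))
    hy_sample = rng.normal(hy_mean, sigma / np.sqrt(on_n + off_n))
    # Median of the three values, taken independently for each arm.
    return np.median(np.stack([on_sample, hy_sample, on_mean]), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    idx = anchor_ts_indices(
        rng,
        on_sum=np.array([5.0, 3.0]), on_n=np.array([10, 10]),
        off_sum=np.array([40.0, 90.0]), off_n=np.array([100, 100]),
    )
    print("pull arm", int(np.argmax(idx)))
```

Taking the median means the index never strays beyond the range spanned by the two online quantities and the hybrid sample, so a badly shifted offline dataset cannot pull the index arbitrarily far from what the online data supports.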