From Euler to AI: Unifying Formulas for Mathematical Constants

Neural Information Processing Systems

The constant π has fascinated scholars throughout the centuries, inspiring numerous formulas for its evaluation, such as infinite sums and continued fractions. Despite their individual significance, many of the underlying connections among formulas remain unknown, missing unifying theories that could unveil deeper understanding. The absence of a unifying theory reflects a broader challenge across math and science: knowledge is typically accumulated through isolated discoveries, while deeper connections often remain hidden. In this work, we present an automated framework for the unification of mathematical formulas. Our system combines large language models (LLMs) for systematic formula harvesting, an LLM-code feedback loop for validation, and a novel symbolic algorithm for clustering and eventual unification. We demonstrate this methodology on the hallmark case of π, an ideal testing ground for symbolic unification. Applying this approach to 455,050 arXiv papers, we validate 385 distinct formulas for π and prove relations between 360 (94%) of them, of which 166 (43%) can be derived from a single mathematical object--linking canonical formulas by Euler, Gauss, Brouncker, and newer ones from algorithmic discoveries by the Ramanujan Machine. Our method generalizes to other constants, including e, ζ(3), and Catalan's constant, demonstrating the potential of AI-assisted mathematics to uncover hidden structures and unify knowledge across domains.
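To make the idea of validating and relating formulas concrete, here is a minimal sketch. It is not the paper's pipeline; it only checks two classical formulas with exact rational arithmetic. The function names are hypothetical, and the two formulas (the Leibniz series for π/4 and Brouncker's continued fraction for 4/π) are chosen here for illustration rather than taken from the abstract.

```python
from fractions import Fraction
import math


def leibniz_partial_sum(n_terms: int) -> Fraction:
    """Exact partial sum of pi/4 = 1 - 1/3 + 1/5 - ... using n_terms terms."""
    return sum((Fraction((-1) ** k, 2 * k + 1) for k in range(n_terms)), Fraction(0))


def brouncker_convergent(depth: int) -> Fraction:
    """Exact convergent of 4/pi = 1 + 1^2/(2 + 3^2/(2 + 5^2/(2 + ...))).

    `depth` is the number of partial numerators (1^2, 3^2, ...) kept.
    The fraction is evaluated bottom-up from the innermost kept level.
    """
    tail = Fraction(0)
    for k in range(depth, 0, -1):
        tail = Fraction((2 * k - 1) ** 2) / (2 + tail)
    return 1 + tail


# Numerical validation: both truncations approach pi (slowly).
approx_from_series = 4 * float(leibniz_partial_sum(500))
approx_from_cf = 4 / float(brouncker_convergent(499))
print(approx_from_series, approx_from_cf, math.pi)

# Structural relation: each continued-fraction convergent is exactly the
# reciprocal of a Leibniz partial sum, so the two formulas are "the same".
for d in range(10):
    assert brouncker_convergent(d) * leibniz_partial_sum(d + 1) == 1
```

The final loop is the interesting part: agreement to a few digits only validates each formula separately, whereas the exact identity between truncations is a toy instance of the kind of relation the abstract describes proving between formulas.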


Evaluating LLM-Contaminated Crowdsourcing Data Without Ground Truth

Neural Information Processing Systems

The recent success of generative AI highlights the crucial role of high-quality human feedback in building trustworthy AI systems. However, the increasing use of large language models (LLMs) by crowdsourcing workers poses a significant challenge: datasets intended to reflect human input may be compromised by LLM-generated responses. Existing LLM detection approaches often rely on high-dimensional training data such as text, making them unsuitable for structured annotation tasks like multiple-choice labeling. In this work, we investigate the potential of peer prediction--a mechanism that evaluates the information within workers' responses--to mitigate LLM-assisted cheating in crowdsourcing with a focus on annotation tasks.


RGNMR: A Gauss-Newton method for robust matrix completion with theoretical guarantees

Neural Information Processing Systems

Recovering a low-rank matrix from a subset of its entries, some of which may be corrupted, is known as the robust matrix completion (RMC) problem. Existing RMC methods have several limitations: they require a relatively large number of observed entries; they may fail under overparameterization, when their assumed rank is higher than the correct one; and many of them fail to recover even mildly ill-conditioned matrices. In this paper we propose a novel RMC method, denoted RGNMR, which overcomes these limitations. RGNMR is a simple factorization-based iterative algorithm, which combines a Gauss-Newton linearization with removal of entries suspected to be outliers. On the theoretical front, we prove that under suitable assumptions, RGNMR is guaranteed to recover the underlying low-rank matrix exactly. Our theoretical results improve upon the best currently known for factorization-based methods. On the empirical front, we show via several simulations the advantages of RGNMR over existing RMC methods, and in particular its ability to handle a small number of observed entries, overparameterization of the rank, and ill-conditioned matrices. In addition, we propose a novel scheme for estimating the number of corrupted entries. This scheme may be used by other RMC methods that require as input the number of corrupted entries.


Matchings Under Biased and Correlated Evaluations

Neural Information Processing Systems

We study a two-institution stable matching model in which candidates from two distinct groups are evaluated using partially correlated signals that are group-biased. This extends prior work (which assumes institutions evaluate candidates in an identical manner) to a more realistic setting in which institutions rely on overlapping, but independently processed, criteria. These evaluations could consist of a variety of informative tools such as standardized tests, shared recommendation systems, or AI-based assessments with local noise. Two key parameters govern evaluations: the bias parameter β ∈ (0,1], which models systematic disadvantage faced by one group, and the correlation parameter γ ∈ [0,1], which captures the alignment between institutional rankings. We study the representation ratio R(β,γ), i.e., the ratio of disadvantaged to advantaged candidates selected by the matching process in this setting.


On the Existence and Complexity of Core-Stable Data Exchanges

Neural Information Processing Systems

The rapid growth of data-driven technologies and the emergence of various data-sharing paradigms have underscored the need for efficient and stable data exchange protocols. In any such exchange, agents must carefully balance the benefit of acquiring valuable data against the cost of sharing their own. Ensuring stability in these exchanges is essential to prevent agents--or groups of agents--from departing and conducting local (and potentially more favorable) exchanges among themselves. To address this, we study a model where n agents participate in a data exchange. Each agent has an associated payoff for the data acquired from other agents and a cost incurred when sharing its own data.


Pioneering UK Nerve Lab harnesses AI to map effect of children's screen time

The Guardian

Tim Smith: 'Today's short-form, fast-paced, highly captivating content may affect children's attention, comprehension and emotional response'. Parents are constantly being told to limit their children's screen time. A relatively slow-paced programme such as Bluey offers a very different viewing experience to a fast-moving action series such as PAW Patrol, yet both are broadly considered suitable for young children. This challenge is growing as the type of content children are exposed to evolves.


Understanding the Gain from Data Filtering in Multimodal Contrastive Learning

Neural Information Processing Systems

The success of modern multimodal representation learning relies on internet-scale datasets. Due to the low quality of a large fraction of raw web data, data curation has become a critical step in the training pipeline. Filtering using a trained model (i.e., teacher-based filtering) has emerged as a successful solution, leveraging a pre-trained model to compute quality scores. To explain the empirical success of teacher-based filtering, we characterize the performance of filtered contrastive learning under the standard bimodal data generation model. Denoting $\eta\in(0,1]$ as the fraction of data with correctly matched modalities among $n$ paired samples, we utilize a linear contrastive learning setup to show a provable benefit of data filtering: $(i)$ the error without filtering is upper and lower bounded by $\frac{1}{\eta \sqrt{n}}$, and $(ii)$ the error with teacher-based filtering is upper bounded by $\frac{1}{\sqrt{\eta n}}$ in the large $\eta$ regime, and by $\frac{1}{\sqrt{n}}$ in the small $\eta$ regime.
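As a rough reading aid, the stated rates can be compared directly. This is only a sketch that takes the rates in the abstract at face value and ignores constants, and it compares a two-sided bound (no filtering) with upper bounds (teacher-based filtering); the abstract does not say where the boundary between the large-$\eta$ and small-$\eta$ regimes lies.

$$
\frac{\dfrac{1}{\eta\sqrt{n}}}{\dfrac{1}{\sqrt{\eta n}}} = \frac{\sqrt{\eta n}}{\eta\sqrt{n}} = \frac{1}{\sqrt{\eta}} \;\ge\; 1,
\qquad
\frac{\dfrac{1}{\eta\sqrt{n}}}{\dfrac{1}{\sqrt{n}}} = \frac{1}{\eta} \;\ge\; 1,
\qquad \eta\in(0,1].
$$

Under these assumptions the unfiltered rate is worse by a factor of $1/\sqrt{\eta}$ relative to the large-$\eta$ filtered bound and by $1/\eta$ relative to the small-$\eta$ filtered bound. For instance, $\eta = 1/4$ gives $1/\sqrt{\eta} = 2$ and $1/\eta = 4$, and both factors equal 1 when $\eta = 1$ (no mismatched pairs).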


Could aliens ever visit Earth? An aerospace scientist unpacks the challenges of interstellar spaceflight.

Popular Science

The universe is vast and teeming with stars - but if intelligent life exists, it may not be able to visit Earth. On May 22, 2026, the Pentagon released a second batch of previously classified photos and videos showing what appear to be unexplained flying objects. These file dumps were the culmination of a process that was set in motion back in July 2023, when a group of government whistleblowers testified before Congress that the U.S. government was secretly in possession of extraterrestrial spacecraft and suspected alien body parts.


Sampling Data with Chains of Forward-Backward Diffusion Steps

arXiv.org Machine Learning

Sampling from learned high-dimensional distributions is a foundational computational problem. We introduce U-turn chains: Markov chains obtained by iterating short forward-backward steps of a diffusion model, in which each step proposes a move that remains on the learned data manifold and, paired with a Metropolis-Hastings correction, samples from energy-modified targets. For synthetic languages, we show that minimal U-turn dynamics undergoes an ergodicity-breaking phase transition driven by fragmentation of the data manifold; ergodicity is restored at larger U-turn magnitude. In the non-ergodic regime, low-level features relax faster than high-level ones, an ordering that inverts only at sufficiently large U-turn magnitude. We test these predictions on natural language and natural images. In both modalities, minimal U-turns relax slowly, especially for high-level features approximated by deep representations in CNNs or LLMs. The layer-ordering inversion appears only at large noise when mixing is efficient -- signatures consistent with strongly constrained, weakly mixing local dynamics. We discuss the implications of these results for sampling with diffusion models.