Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

arXiv.org Machine Learning

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and construct RePlaid by aligning the architecture of Plaid with modern discrete DLMs. In this unified setting, we establish the first scaling law for continuous DLMs that rivals discrete DLMs: RePlaid exhibits a compute gap of only $20\times$ compared to autoregressive models, outperforms Duo while using fewer parameters, and outperforms MDLM in the over-trained regime. We benchmark RePlaid against recent continuous DLMs: on OpenWebText, RePlaid achieves a new state-of-the-art PPL bound of $22.1$ among continuous DLMs and superior generation quality. These results suggest that continuous diffusion, when trained via likelihood, is a highly competitive and scalable alternative to discrete DLMs. Moreover, we offer theoretical insights to understand the advantage of likelihood-based training. We show that optimizing the noise schedule to minimize the ELBO's variance naturally yields linear cross-entropy (information loss) over time. This evenly distributes denoising difficulty without any case-specific time reparameterization. In addition, we find that optimizing embeddings via likelihood creates structured geometries and drives the most significant likelihood gain.


Probing for Representation Manifolds in Superposition

arXiv.org Machine Learning

This paper introduces the Manifold Probe, a supervised method for discovering representation manifolds in superposition. The method generalizes linear regression probes by learning the space of features of a concept that can be linearly predicted from the representations, and then learning the directions used to encode them. We demonstrate the probe on representations of time and space in Llama 2-7b, finding manifolds which linearly represent an interpretable set of features in each case. In the case of time, we show that by steering along the manifold, we can influence the model's completions about the years in which famous songs, movies and books were released, providing evidence that the Manifold Probe can discover manifolds which are causally involved in model behaviour.


Federated Martingale Posterior Sampling

arXiv.org Machine Learning

Federated Bayesian neural networks require fixing a prior on the model parameters together with a likelihood. Eliciting meaningful priors on the weight space of modern overparameterized models is notoriously difficult, and misspecification of either component can severely degrade accuracy and calibration. Motivated by the rapid progress of predictive models such as large language models, the martingale posterior, also known as predictive Bayes, replaces the prior-likelihood pair with a predictive distribution and recovers parameter uncertainty by repeatedly drawing predictive samples and refitting the model. A direct federated implementation, however, would require clients to share their local data sets. This letter proposes federated martingale posterior (FMP) sampling, a one-shot embarrassingly parallel protocol in which each client uploads a small set of trainable data embeddings and the server runs the predictive sampler centrally. Experiments on MNIST, CIFAR-10, and CIFAR-100 show that FMP closely matches the centralized counterpart and significantly improves calibration over consensus-style baselines.


Stable Causal Discovery via Directed Acyclic Graph Aggregation

arXiv.org Machine Learning

Directed Acyclic Graphs (DAGs) are central to uncovering causal structure in complex systems, yet learning a single DAG from data is often challenging: model uncertainty, finite samples, and a combinatorially large search space frequently yield unstable estimates. We propose DAGgr, a model averaging framework that aggregates multiple candidate DAGs into a single stable representation. Candidate graphs are weighted by their out-of-sample predictive likelihood across repeated data splits, and a thresholding rule on the resulting edge-importance scores guarantees that the aggregated graph is itself acyclic. We establish a finite-sample risk bound, prove that the procedure preserves acyclicity, and show that edge selection is consistent under mild conditions on the weights. Simulations across random, hub, and chain structures, together with an analysis of the Sachs et al. (2005) protein-signaling network, show that DAGgr matches or exceeds the best individual candidate while consistently outperforming bootstrap-aggregation baselines across structural recovery metrics.


Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning

arXiv.org Machine Learning

Federated Learning is a leading framework for training ML and AI models collaboratively across numerous user devices or databases. We study the trade-offs among estimation accuracy, privacy constraints, and communication cost for differentially private (DP) federated M-estimation. The two standard methods in the literature are FedAvg, which may suffer from high federation bias, and FedSGD, which can incur high communication cost. To improve accuracy at a reduced communication cost, we propose FedHybrid, which runs FedSGD from an improved initialization given by the FedAvg estimator. We also propose FedNewton, which averages local Newton iterations to reduce the bias in FedAvg, achieving an estimation accuracy comparable to FedSGD with far fewer communication rounds when the number of clients grows sufficiently slowly. We establish finite-sample upper bounds on the mean-squared error (MSE) rates of the DP versions of these estimators as functions of the number of clients, local sample sizes, privacy budget, and number of iterations. We further derive a minimax lower bound on the MSE of any iterative private federated procedure, which provides a benchmark to assess the optimality gap of these methods. We numerically evaluate our methods by training a logistic regression and a neural network on the computer vision datasets MNIST and CIFAR-10.


Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

arXiv.org Machine Learning

Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient normalization, have been introduced to ensure the convergence of first-order algorithms. However, adaptive gradient methods, a famous class of modern optimizers that includes popular $\mathtt{Adam}$ and $\mathtt{AdamW}$, often perform well even without any extra operations mentioned above. It is therefore natural to ask whether adaptive gradient methods can converge under heavy-tailed noise without any algorithmic changes. In this work, we take the first step toward answering this question by investigating a special case, $\mathtt{AdaGrad}$, the origin of adaptive gradient methods. We provide the first provable convergence rate for $\mathtt{AdaGrad}$ in non-convex optimization when the tail index $p$ satisfies $4/3


How Sam Altman's victory over Elon Musk clears way for OpenAI's trillion-dollar ambitions

The Guardian

OpenAI's plans now seem all but guaranteed, given that the world's richest man couldn't put a stop to them. On Monday morning, a jury in Oakland, California, handed a resounding victory to Sam Altman and OpenAI in their long, bitter courtroom battle with Elon Musk. The federal jury found Altman, OpenAI and its president, Greg Brockman, not liable for Elon Musk's claims that they unjustly enriched themselves and broke a founding contract made with Musk when founding the startup. The unanimous verdict, delivered after less than two hours of deliberation, is a stark rebuke of Musk and his lawyer's claims that Altman "stole a charity" through his leadership of OpenAI.


Pope Leo to address rise of AI in first major text

The Japan Times

VATICAN CITY - Pope Leo will address the rise of artificial intelligence in his first in-depth text outlining his concerns, the Vatican said on Monday, adding that it would be unveiled on May 25 by the pontiff himself. The document, known as an encyclical, is likely to decry the use of AI in warfare and address how the technology is challenging workers' rights, according to sources. It will be titled "Magnifica Humanitas" (Magnificent Humanity) and was formally signed by the pope on Friday ahead of publication, a Vatican statement said.


Satellites and AI used to track UK hedgehogs in bid to slow decline

BBC News

Researchers at the University of Cambridge are using satellite data and AI in an effort to slow the decline in Britain's hedgehog population. Using an AI tool called Tessera, which analyses detailed images of the UK gathered from space, experts can precisely determine locations of hedgehog habitats - and where these are disappearing. The resulting maps capture landscapes in minute detail, including down to individual hedgerows, while AI can accurately predict hedgehog-friendly places obscured by cloud cover. Those behind the project hope it will help to shed light not just on where hedgehogs live across the UK, but barriers preventing them from finding food and mates. The researchers say Tessera's outputs can be used to track the impact of new housing developments and other environmental changes on landscapes that could affect hedgehogs over time.


Third of university students in Great Britain think AI job losses will cause social unrest, poll finds

The Guardian

One in three university students think AI will wipe out jobs so rapidly it will trigger civil unrest, according to a survey by King's College London (KCL). Only 24% of the members of the public surveyed thought AI was a positive thing for humanity.