Probing for Representation Manifolds in Superposition
This paper introduces the Manifold Probe, a supervised method for discovering representation manifolds in superposition. The method generalizes linear regression probes by learning the space of features of a concept that can be linearly predicted from the representations, and then learning the directions used to encode them. We demonstrate the probe on representations of time and space in Llama 2-7b, finding manifolds which linearly represent an interpretable set of features in each case. In the case of time, we show that by steering along the manifold, we can influence the model's completions about the years in which famous songs, movies and books were released, providing evidence that the Manifold Probe can discover manifolds which are causally involved in model behaviour.
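For orientation, the baseline that the abstract says the Manifold Probe generalizes, a linear regression probe, can be sketched as follows. This is not the Manifold Probe itself. The synthetic representations, the ridge penalty, the function name `fit_linear_probe`, and the steering step size are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_probe(reps, targets, ridge=1e-3):
    """Fit a ridge-regularised linear probe: targets ~ reps @ w + b."""
    X = np.hstack([reps, np.ones((reps.shape[0], 1))])  # append a bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])            # closed-form ridge system
    coef = np.linalg.solve(A, X.T @ targets)
    return coef[:-1], coef[-1]

# Synthetic stand-in for model activations (hypothetical, not Llama 2 data):
# a scalar feature such as a year is encoded along one hidden direction.
rng = np.random.default_rng(0)
direction = rng.normal(size=16)
reps = rng.normal(size=(500, 16))
years = reps @ direction + 0.01 * rng.normal(size=500)

w, b = fit_linear_probe(reps, years)
cosine = w @ direction / (np.linalg.norm(w) * np.linalg.norm(direction))

# Steering along the probe direction shifts the probe's own prediction
# by alpha * ||w||; whether the model's behaviour follows is an empirical question.
alpha = 2.0
steered = reps + alpha * w / np.linalg.norm(w)
shift = (steered @ w + b) - (reps @ w + b)
```

A single direction like `w` captures one linearly decodable feature; per the abstract, the Manifold Probe instead learns a space of such features and the directions that encode them.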
Federated Martingale Posterior Sampling
Zhang, Boning, Zecchin, Matteo, Guo, Mingzhao, Liu, Dongzhu, Simeone, Osvaldo
Federated Bayesian neural networks require fixing a prior on the model parameters together with a likelihood. Eliciting meaningful priors on the weight space of modern overparameterized models is notoriously difficult, and misspecification of either component can severely degrade accuracy and calibration. Motivated by the rapid progress of predictive models such as large language models, the martingale posterior, also known as predictive Bayes, replaces the prior-likelihood pair with a predictive distribution and recovers parameter uncertainty by repeatedly drawing predictive samples and refitting the model. A direct federated implementation, however, would require clients to share their local data sets. This letter proposes federated martingale posterior (FMP) sampling, a one-shot, embarrassingly parallel protocol in which each client uploads a small set of trainable data embeddings and the server runs the predictive sampler centrally. Experiments on MNIST, CIFAR-10, and CIFAR-100 show that FMP closely matches the centralized counterpart and significantly improves calibration over consensus-style baselines.
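The "draw predictive samples, then refit" loop described above can be illustrated in a centralized toy setting. The sketch below assumes the simplest possible predictive, the empirical distribution of the data seen so far (a Pólya urn), and takes the parameter of interest to be a mean. It does not implement the federated protocol or the trainable embeddings, and the function name and default sizes are hypothetical.

```python
import numpy as np

def martingale_posterior_mean(data, n_future=500, n_draws=200, seed=0):
    """Predictive resampling for the mean, using the empirical predictive."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        pool = list(data)
        for _ in range(n_future):
            # Draw the next "future" observation from the current predictive,
            # then add it to the pool so the predictive is updated.
            pool.append(pool[rng.integers(len(pool))])
        # "Refit": recompute the parameter on observed plus imputed data.
        draws[b] = np.mean(pool)
    return draws

rng = np.random.default_rng(1)
observed = rng.normal(loc=2.0, scale=1.0, size=50)
posterior_draws = martingale_posterior_mean(observed)
```

The spread of `posterior_draws` plays the role of parameter uncertainty, with no prior or likelihood specified. In the setting of the letter, the refit step would retrain a neural network rather than recompute a mean.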
Stable Causal Discovery via Directed Acyclic Graph Aggregation
Wu, Yunan, Wang, Yue, Li, Chunlin, Ye, Chenglong
Directed Acyclic Graphs (DAGs) are central to uncovering causal structure in complex systems, yet learning a single DAG from data is often challenging: model uncertainty, finite samples, and a combinatorially large search space frequently yield unstable estimates. We propose DAGgr, a model averaging framework that aggregates multiple candidate DAGs into a single stable representation. Candidate graphs are weighted by their out-of-sample predictive likelihood across repeated data splits, and a thresholding rule on the resulting edge-importance scores guarantees that the aggregated graph is itself acyclic. We establish a finite-sample risk bound, prove that the procedure preserves acyclicity, and show that edge selection is consistent under mild conditions on the weights. Simulations across random, hub, and chain structures, together with an analysis of the Sachs et al. (2005) protein-signaling network, show that DAGgr matches or exceeds the best individual candidate while consistently outperforming bootstrap-aggregation baselines across structural recovery metrics.
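The aggregation step can be sketched under assumptions. Here candidate weights are a softmax of hypothetical out-of-sample log-likelihoods, and a fixed threshold of 0.5 is applied to the weighted edge-importance scores. Neither choice is claimed to be the weighting or the thresholding rule of DAGgr, so the sketch checks acyclicity explicitly instead of relying on a guarantee.

```python
import numpy as np

def aggregate_dags(adjacency_list, log_scores, threshold=0.5):
    """Weighted edge importance across candidate DAGs, then thresholding."""
    adj = np.asarray(adjacency_list, dtype=float)   # shape (K, p, p), adj[k, i, j] = 1 if i -> j
    s = np.asarray(log_scores, dtype=float)
    w = np.exp(s - s.max())                         # numerically stable softmax weights
    w /= w.sum()
    importance = np.tensordot(w, adj, axes=1)       # shape (p, p)
    return importance, (importance > threshold).astype(int)

def is_acyclic(adj):
    """Kahn's algorithm: a graph is a DAG iff every node can be peeled off."""
    adj = np.array(adj, dtype=int)
    indeg = adj.sum(axis=0)
    stack = [i for i in range(len(adj)) if indeg[i] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in np.flatnonzero(adj[u]):
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return seen == len(adj)

# Three hypothetical candidate DAGs on three nodes, with made-up scores.
g1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # 0 -> 1 -> 2
g2 = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]   # 0 -> 1, 0 -> 2
g3 = [[0, 0, 0], [0, 0, 1], [1, 0, 0]]   # 1 -> 2 -> 0
importance, agg = aggregate_dags([g1, g2, g3], log_scores=[-10.0, -11.0, -14.0])
```

With these scores the first candidate dominates, so only the edges 0 -> 1 and 1 -> 2 exceed the threshold.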
Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning
Auddy, Arnab, Peng, Xiangni, Paul, Subhadeep
Federated Learning is a leading framework for training ML and AI models collaboratively across numerous user devices or databases. We study the trade-offs among estimation accuracy, privacy constraints, and communication cost for differentially private (DP) federated M-estimation. The two standard methods in the literature are FedAvg, which may suffer from high federation bias, and FedSGD, which can incur high communication cost. To improve accuracy at a reduced communication cost, we propose FedHybrid, which runs FedSGD from an improved initialization given by the FedAvg estimator. We also propose FedNewton, which averages local Newton iterations to reduce the bias in FedAvg, achieving an estimation accuracy comparable to FedSGD with far fewer communication rounds when the number of clients grows sufficiently slowly. We establish finite-sample upper bounds on the mean-squared error (MSE) rates of the DP versions of these estimators as functions of the number of clients, local sample sizes, privacy budget, and number of iterations. We further derive a minimax lower bound on the MSE of any iterative private federated procedure, which provides a benchmark to assess the optimality gap of these methods. We numerically evaluate our methods for training a logistic regression and a neural network on the computer vision datasets MNIST and CIFAR-10.
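The relationship between FedAvg, FedSGD, and a FedSGD run initialized at the FedAvg estimate can be sketched without the privacy mechanism. The example below is a non-private toy version for logistic regression on synthetic data. It omits DP noise and clipping entirely, uses full local gradients, and every name, step size, and round count in it is an assumption, not the paper's algorithm.

```python
import numpy as np

def grad(theta, X, y):
    """Gradient of the average logistic loss."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / len(y)

def local_fit(X, y, steps=500, lr=0.5):
    """Each client solves its own M-estimation problem by gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * grad(theta, X, y)
    return theta

def fed_avg(clients):
    """One communication round: average the local estimators."""
    return np.mean([local_fit(X, y) for X, y in clients], axis=0)

def fed_sgd(clients, theta0, rounds, lr=0.5):
    """One communication round per step: average the local gradients."""
    theta = theta0.copy()
    for _ in range(rounds):
        theta -= lr * np.mean([grad(theta, X, y) for X, y in clients], axis=0)
    return theta

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -1.0, 0.5])
clients = []
for _ in range(10):
    X = rng.normal(size=(200, 3))
    y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)
    clients.append((X, y))

theta_avg = fed_avg(clients)
theta_hybrid = fed_sgd(clients, theta_avg, rounds=20)   # warm start from FedAvg

def global_grad_norm(theta):
    return np.linalg.norm(np.mean([grad(theta, X, y) for X, y in clients], axis=0))
```

Starting the gradient rounds at `theta_avg` instead of at zero is the warm-start idea; how many rounds are needed, and how added privacy noise changes the picture, is what the paper's bounds address.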
Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad
Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient normalization, have been introduced to ensure the convergence of first-order algorithms. However, adaptive gradient methods, a well-known class of modern optimizers that includes the popular $\mathtt{Adam}$ and $\mathtt{AdamW}$, often perform well even without any of the extra operations mentioned above. It is therefore natural to ask whether adaptive gradient methods can converge under heavy-tailed noise without any algorithmic changes. In this work, we take the first step toward answering this question by investigating a special case, $\mathtt{AdaGrad}$, the origin of adaptive gradient methods. We provide the first provable convergence rate for $\mathtt{AdaGrad}$ in non-convex optimization when the tail index $p$ satisfies $4/3
Satellites and AI used to track UK hedgehogs in bid to slow decline
Researchers at the University of Cambridge are using satellite data and AI in an effort to slow the decline in Britain's hedgehog population. Using an AI tool called Tessera, which analyses detailed images of the UK gathered from space, experts can precisely determine the locations of hedgehog habitats and where these are disappearing. The resulting maps capture landscapes in minute detail, down to individual hedgerows, while AI can accurately predict hedgehog-friendly places obscured by cloud cover. Those behind the project hope it will help to shed light not just on where hedgehogs live across the UK, but also on the barriers preventing them from finding food and mates. The researchers say Tessera's outputs can be used to track the impact of new housing developments and other environmental changes on landscapes that could affect hedgehogs over time.
NextEra, Dominion to create huge power biz as AI drives US energy demand
NextEra Energy is seeking to acquire Dominion Energy in an all-stock deal valued at about $67bn, creating a massive power company as the energy needs of artificial intelligence (AI) drive demand higher in the United States. It is one of the biggest proposed mergers so far this year and would create the world's largest regulated electric utility business by market capitalisation, the companies said on Monday. The region has a fast-growing population and the world's biggest data centre hub, which is in Virginia. The deal will enable a swifter build-out of power infrastructure to deliver electricity to data centres proposing to connect to NextEra and Dominion, which total about 130 gigawatts of electricity demand, the companies' executives said. One gigawatt can power about 750,000 homes. The merger builds on NextEra's efforts to tap into surging demand for supplying electricity to data centres developed by Big Tech, largely for training and rolling out AI technologies.
Tata-ASML deal: How significant is it for India's semiconductor push?
India's Tata Electronics has signed a deal with the Dutch technology giant ASML (Advanced Semiconductor Materials Lithography) to build India's first front-end semiconductor fabrication plant as New Delhi pushes to develop a domestic semiconductor manufacturing base. Front-end manufacturing refers to the building of microscopic circuits onto a blank silicon wafer using specialised lithographic machines. ASML is a pioneer of lithographic technology used in the mass production of microchips across the world. Semiconductor chips power modern technology and are critical for everything from smartphones and cars to artificial intelligence systems and defence technology.
Leg evolution made most humans right-handed
'Rightie' preference isn't seen in any of our primate relatives. Most of our primate relatives are a mix of right- and left-handed. It would make more sense if only a few related cultures exhibited it, but the trait is everywhere.