South America
Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers
Huang, Jingkai, Ma, Will, Zhou, Zhengyuan
A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to save on sampling costs, stopping once sufficient consistency is reached. Although the exact posterior is computationally intractable, we further introduce an efficient "L-aggregated" stopping policy that tracks only the L-1 most frequent answer counts. Theoretically, we prove that L=3 is all you need: this coarse approximation is sufficient to achieve asymptotic optimality, and strictly dominates prior-free baselines, while having a fast posterior computation. Empirically, this identifies the most consistent (i.e., mode) LLM answer using fewer samples, and can achieve similar answer accuracy while cutting the number of LLM calls (i.e., saving on LLM inference costs) by up to 50%.
Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks
Wang, Puyu, Zhou, Junyu, Liznerski, Philipp, Kloft, Marius
Kolmogorov--Arnold Networks (KANs) have recently emerged as a structured alternative to standard MLPs, yet a principled theory for their training dynamics, generalization, and privacy properties remains limited. In this paper, we analyze gradient descent (GD) for training two-layer KANs and derive general bounds that characterize their training dynamics, generalization, and utility under differential privacy (DP). As a concrete instantiation, we specialize our analysis to logistic loss under an NTK-separable assumption, where we show that polylogarithmic network width suffices for GD to achieve an optimization rate of order $1/T$ and a generalization rate of order $1/n$, with $T$ denoting the number of GD iterations and $n$ the sample size. In the private setting, we characterize the noise required for $(ฮต,ฮด)$-DP and obtain a utility bound of order $\sqrt{d}/(nฮต)$ (with $d$ the input dimension), matching the classical lower bound for general convex Lipschitz problems. Our results imply that polylogarithmic width is not only sufficient but also necessary under differential privacy, revealing a qualitative gap between non-private (sufficiency only) and private (necessity also emerges) training regimes. Experiments further illustrate how these theoretical insights can guide practical choices, including network width selection and early stopping.
Marine biologists discover 28 new deep sea species--and an old VHS tape
ROV pilots filmed this glass squid while exploring the Colorado-Rawson submarine canyon off the coast of Argentina. Breakthroughs, discoveries, and DIY tips sent six days a week. The marine biologists of the Schmidt Ocean Institute are a busy bunch. Over the last few years, scientists aboard the research vessel have spotted rare Antarctic squid, discovered multiple octopus near Costa Rica, and even cataloged over 100 potential new species off the coast of Chile. To kick off 2026, the Institute released a trove of new images and videos highlighting some of their latest observations from the south Atlantic Ocean.
Stationarity and Spectral Characterization of Random Signals on Simplicial Complexes
Navarro, Madeline, Buciulea, Andrei, Segarra, Santiago, Marques, Antonio
It is increasingly common for data to possess intricate structure, necessitating new models and analytical tools. Graphs, a prominent type of structure, can encode the relationships between any two entities (nodes). However, graphs neither allow connections that are not dyadic nor permit relationships between sets of nodes. We thus turn to simplicial complexes for connecting more than two nodes as well as modeling relationships between simplices, such as edges and triangles. Our data then consist of signals lying on topological spaces, represented by simplicial complexes. Much recent work explores these topological signals, albeit primarily through deterministic formulations. We propose a probabilistic framework for random signals defined on simplicial complexes. Specifically, we generalize the classical notion of stationarity. By spectral dualities of Hodge and Dirac theory, we define stationary topological signals as the outputs of topological filters given white noise. This definition naturally extends desirable properties of stationarity that hold for both time-series and graph signals. Crucially, we properly define topological power spectral density (PSD) through a clear spectral characterization. We then discuss the advantages of topological stationarity due to spectral properties via the PSD. In addition, we empirically demonstrate the practicality of these benefits through multiple synthetic and real-world simulations.
Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCA
Aguiรฉ, Pierre, Even, Mathieu, Massouliรฉ, Laurent
We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for instance in decentralized PCA. While previous works have established that acceleration can improve convergence rates compared to the standard Noisy Power Method, these guarantees require overly restrictive upper bounds on the magnitude of the perturbations, limiting their practical applicability. We provide an improved analysis of this algorithm, which preserves the accelerated convergence rate under much milder conditions on the perturbations. We show that our new analysis is worst-case optimal, in the sense that the convergence rate cannot be improved, and that the noise conditions we derive cannot be relaxed without sacrificing convergence guarantees. We demonstrate the practical relevance of our results by deriving an accelerated algorithm for decentralized PCA, which has similar communication costs to non-accelerated methods. To our knowledge, this is the first decentralized algorithm for PCA with provably accelerated convergence.
'She Has a Presence': The 'Melania' Superfans Who Turned Up for Opening Weekend
'She Has a Presence': The Superfans Who Turned Up for Opening Weekend WIRED attended two documentary screening parties--one on each coast--for the First Lady's film. For decades now, people have been wondering: Who is Melania Trump? The First Lady opens her 2024 memoir with a story about leaving her family in Slovenia to immigrate to America as a 26-year-old model. Ten years later, she became an American citizen. "It was not an easy process," she writes. "And my personal experience dealing with the trials of the immigration process opened my eyes to the difficulties faced by all who wish to become US citizens." OK, but what does that mean, exactly? Her husband, in both his terms as president, put harshly enforcing immigration policy at the center of his domestic agenda. This is all to say that I was authentically excited to see, the documentary that Amazon paid $40 million to acquire and $35 million to market. The director, Brett Ratner, previously accused of sexual misconduct by six different women, is currently in the news thanks to his appearance in a photo included in the most recent dump of Epstein files. What is Melania like behind closed doors?
Multivariate Time Series Data Imputation via Distributionally Robust Regularization
Liao, Che-Yi, Dong, Zheng, Garcia, Gian-Gabriel, Paynabar, Kamran
Multivariate time series (MTS) imputation is often compromised by mismatch between observed and true data distributions -- a bias exacerbated by non-stationarity and systematic missingness. Standard methods that minimize reconstruction error or encourage distributional alignment risk overfitting these biased observations. We propose the Distributionally Robust Regularized Imputer Objective (DRIO), which jointly minimizes reconstruction error and the divergence between the imputer and a worst-case distribution within a Wasserstein ambiguity set. We derive a tractable dual formulation that reduces infinite-dimensional optimization over measures to adversarial search over sample trajectories, and propose an adversarial learning algorithm compatible with flexible deep learning backbones. Comprehensive experiments on diverse real-world datasets show DRIO consistently improves imputation under both missing-completely-at-random and missing-not-at-random settings, reaching Pareto-optimal trade-offs between reconstruction accuracy and distributional alignment.
Safety-Efficacy Trade Off: Robustness against Data-Poisoning
Backdoor and data poisoning attacks can achieve high attack success while evading existing spectral and optimisation based defences. We show that this behaviour is not incidental, but arises from a fundamental geometric mechanism in input space. Using kernel ridge regression as an exact model of wide neural networks, we prove that clustered dirty label poisons induce a rank one spike in the input Hessian whose magnitude scales quadratically with attack efficacy. Crucially, for nonlinear kernels we identify a near clone regime in which poison efficacy remains order one while the induced input curvature vanishes, making the attack provably spectrally undetectable. We further show that input gradient regularisation contracts poison aligned Fisher and Hessian eigenmodes under gradient flow, yielding an explicit and unavoidable safety efficacy trade off by reducing data fitting capacity. For exponential kernels, this defence admits a precise interpretation as an anisotropic high pass filter that increases the effective length scale and suppresses near clone poisons. Extensive experiments on linear models and deep convolutional networks across MNIST and CIFAR 10 and CIFAR 100 validate the theory, demonstrating consistent lags between attack success and spectral visibility, and showing that regularisation and data augmentation jointly suppress poisoning. Our results establish when backdoors are inherently invisible, and provide the first end to end characterisation of poisoning, detectability, and defence through input space curvature.
Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey
In the last decade, Reinforcement Learning (RL) has achieved remarkable success in the control and decision-making of complex dynamical systems. However, most RL algorithms rely on the Markov Decision Process assumption, which is violated in practical cyber-physical systems affected by sensing delays, actuation latencies, and communication constraints. Such time delays introduce memory effects that can significantly degrade performance and compromise stability, particularly in networked and multi-agent environments. This paper presents a comprehensive survey of RL methods designed to address time delays in control systems. We first formalize the main classes of delays and analyze their impact on the Markov property. We then systematically categorize existing approaches into five major families: state augmentation and history-based representations, recurrent policies with learned memory, predictor-based and model-aware methods, robust and domain-randomized training strategies, and safe RL frameworks with explicit constraint handling. For each family, we discuss underlying principles, practical advantages, and inherent limitations. A comparative analysis highlights key trade-offs among these approaches and provides practical guidelines for selecting suitable methods under different delay characteristics and safety requirements. Finally, we identify open challenges and promising research directions, including stability certification, large-delay learning, multi-agent communication co-design, and standardized benchmarking. This survey aims to serve as a unified reference for researchers and practitioners developing reliable RL-based controllers in delay-affected cyber-physical systems.
Universal Redundancies in Time Series Foundation Models
Bao, Anthony, Vattikuti, Venkata Hasith, Lai, Jeffrey, Gilpin, William
Time Series Foundation Models (TSFMs) leverage extensive pretraining to accurately predict unseen time series during inference, without the need for task-specific fine-tuning. Through large-scale evaluations on standard benchmarks, we find that leading transformer-based TSFMs exhibit redundant components in their intermediate layers. We introduce a set of tools for mechanistic interpretability of TSFMs, including ablations of specific components and direct logit attribution on the residual stream. Our findings are consistent across several leading TSFMs with diverse architectures, and across a diverse set of real-world and synthetic time-series datasets. We discover that all models in our study are robust to ablations of entire layers. Furthermore, we develop a theoretical framework framing transformers as kernel regressors, motivating a purely intrinsic strategy for ablating heads based on the stable rank of the per-head projection matrices. Using this approach, we uncover the specific heads responsible for degenerate phenomena widely observed in TSFMs, such as parroting of motifs from the context and seasonality bias. Our study sheds light on the universal properties of this emerging class of architectures for continuous-time sequence modeling.