expansion
Counterfactual Evolution of Multimodal Datasets via Visual Programming
The rapid development of Multimodal Large Language Models (MLLMs) poses increasing demands on the diversity and complexity of multimodal datasets. Yet manual annotation pipelines can no longer keep pace. Existing augmentation methods often follow fixed rules and lack verifiable control over sample diversity and reasoning complexity. To address this, we introduce Scalable COunterfactual Program Evolution (SCOPE), a framework that uses symbolic Visual Programming to guide program evolution via counterfactual reasoning. SCOPE performs the three steps of counterfactual inference: (1) Abduction, by generating verifiable programs to model reasoning associations; (2) Action, by intervening on program structure along three axes--reasoning path, visual context, and cross-instance composition; and (3) Prediction, by categorizing evolved instances by difficulty, structure, and input multiplicity. Based on this process, we build SCOPE-Train and SCOPE-Test, evolving benchmarks with expert validation. To support training, we propose MAP, a curriculum learning strategy that aligns model capacity with sample difficulty. Experiments show that SCOPE improves reasoning performance, exposes model blind spots, and enhances visual dialog capabilities.
Reproducing Kernel Banach Space Models for Neural Networks with Application to Rademacher Complexity Analysis
This paper explores the use of Hermite-transform-based reproducing kernel Banach space methods to construct exact (un-approximated) models of feedforward neural networks of arbitrary width, depth and topology, including ResNet and Transformer networks, assuming only a feedforward topology, finite-energy activations and finite (spectral-)norm weights and biases. Using this model, two straightforward but surprisingly tight bounds on Rademacher complexity are derived, namely (1) a general bound that is width-independent and scales exponentially with depth; and (2) a width- and depth-independent bound for networks with appropriately constrained (below-threshold) weights and biases.
Scalable Signature Kernel Computations via Local Neumann Series Expansions
The signature kernel [10] is a recent state-of-the-art tool for analyzing high-dimensional sequential data, valued for its theoretical guarantees and strong empirical performance. In this paper, we present a novel method for efficiently computing the signature kernel of long, high-dimensional time series via adaptively truncated recursive local power series expansions. Building on the characterization of the signature kernel as the solution of a Goursat PDE [17], our approach employs tilewise Neumann-series expansions to derive rapidly converging power series approximations of the signature kernel that are locally defined on subdomains and propagated iteratively across the entire domain of the Goursat solution by exploiting the geometry of the time series. Algorithmically, this involves solving a system of interdependent Goursat PDEs via adaptively truncated local power series expansions and recursive propagation of boundary conditions along a directed graph in a topological ordering.
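As a rough illustration of a local power series on one tile, the sketch below treats the simplest case only. It assumes the Goursat PDE takes the form u_st = c * u on the tile, with a constant coefficient c (as would arise for piecewise-linear paths) and boundary values equal to 1 on both lower edges, in which case the solution is the series sum_n (c*s*t)^n / (n!)^2. The function name, the stopping rule and the unit boundary data are assumptions made for this example; the general boundary conditions and the tile-to-tile propagation described in the abstract are not covered.

```python
def goursat_tile_series(c, s, t, tol=1e-12, max_terms=200):
    """Adaptively truncated series for u_st = c * u on a single tile.

    Assumes u(s, 0) = u(0, t) = 1, so that
        u(s, t) = sum_{n >= 0} (c * s * t)**n / (n!)**2.
    Terms are added until the latest one falls below a relative tolerance.
    """
    z = c * s * t
    term = 1.0   # n = 0 term
    total = 1.0
    for n in range(1, max_terms):
        term *= z / (n * n)          # ratio of consecutive terms is z / n^2
        total += term
        if abs(term) < tol * max(1.0, abs(total)):
            break
    return total


# Hypothetical tile: coefficient c = 1 on the unit square.
value = goursat_tile_series(1.0, 1.0, 1.0)
```

Because consecutive terms shrink by a factor of z / n^2, only a handful of terms are needed when the tile is small, which is the kind of rapid local convergence the abstract refers to.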
Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms
Solving systems of polynomial equations, particularly those with finitely many solutions, is a crucial challenge across many scientific fields. Traditional methods like Gröbner and Border bases are fundamental but suffer from high computational costs, which have motivated recent Deep Learning approaches to improve efficiency, albeit at the expense of output correctness.
Welcome to the Waymo World Cup
It might not feel all that different from older World Cups--for better or worse. Waymo, the Alphabet subsidiary offering robotaxi rides in 11 US metros right now, says it's ready for the FIFA World Cup. Match attendees can catch driverless rides to six of the 16 North American venues: stadiums in Atlanta, Houston, Los Angeles, Miami, and the San Francisco Bay Area. The sprawling football event, expected to attract some 6.5 million visitors to the continent over more than a month, could prove an exciting close-up for Waymo. The company says it's serving half a million paid rides a week--paltry stuff compared to the likes of ride-hail giants Uber and Lyft, but more impressive once you remember that the things don't have drivers.
Convolution Goes Higher-Order: A Biologically Inspired Mechanism Empowers Image Classification
We propose a novel enhancement to Convolutional Neural Networks (CNNs) by incorporating learnable higher-order convolutions inspired by nonlinear biological visual processing. Our model extends the classical convolution operator using a Volterra-like expansion to capture multiplicative interactions observed in biological vision. Through extensive evaluation on standard benchmarks and synthetic datasets, we demonstrate that our architecture consistently outperforms traditional CNN baselines, achieving optimal performance with 3rd/4th order expansions. Systematic perturbation analysis and Representational Similarity Analysis reveal that different orders of convolution process distinct aspects of visual information, aligning with the statistical properties of natural images. This biologically-inspired approach offers both improved performance and deeper insights into visual information processing.
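To make the idea of a Volterra-like expansion concrete, here is a minimal 1D sketch of a second-order filter: each output adds a quadratic term, built from pairwise products of the inputs in the window, to the usual linear term. The function name, the 1D setting and the restriction to second order are assumptions for illustration; the paper works with image convolutions and reports its best results at 3rd/4th order.

```python
import numpy as np


def volterra_conv1d(x, w1, w2):
    """Second-order Volterra-style filter in 'valid' mode (correlation form).

    out[i] = sum_j w1[j] * x[i+j] + sum_{j,k} w2[j,k] * x[i+j] * x[i+k]
    """
    x = np.asarray(x, dtype=float)
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    k = len(w1)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        patch = x[i:i + k]
        linear = w1 @ patch              # classical convolution term
        quadratic = patch @ w2 @ patch   # multiplicative interactions
        out[i] = linear + quadratic
    return out


# Hypothetical kernel that multiplies neighbouring samples.
pairwise = volterra_conv1d([1.0, 2.0, 3.0], [0.0, 0.0], [[0.0, 1.0], [0.0, 0.0]])
```

Setting `w2` to zero recovers an ordinary linear filter, so the higher-order kernel can be read as an additive extension of the classical operator.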
Continual Learning in Modern Hopfield Networks with an Application to Diffusion Models
Takeda, Ken, Oizumi, Masafumi, Karakida, Ryo
Generative models, including diffusion models, are increasingly used as foundation models and adapted through sequential fine-tuning, making continual learning an essential problem setting. However, continual learning in such generative models remains poorly understood: after a task change, what aspects of the learned distribution are most easily lost, and what replay samples should be prioritized? We address these questions through the modern Hopfield energy. Recent links between modern Hopfield networks (MHNs) and diffusion models allow analyses in MHNs to be transferred to diffusion models. We introduce intrinsic forgetting as an increase in Hopfield energy after the task change. In tractable settings in an MHN, we prove that high-energy, outlier-like samples undergo a larger energy increase than cluster-like samples, implying that samples located in sharp, isolated basins are more forgettable. We further analyze memory replay and show that replay is particularly effective for high-energy samples, enabling an energy-based selection of replay samples. We validate these predictions in experiments on MHNs and two diffusion models under continual-learning settings: Stable Diffusion and a pixel-space DDPM. In these diffusion models, Hopfield energy tracks reconstruction-based forgetting, and replay experiments reveal energy-dependent mitigation of forgetting that is consistent with the MHN analysis.
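The link between energy and outlier-like samples can be illustrated with a small numerical sketch. It assumes the log-sum-exp energy commonly used for modern Hopfield networks (the form introduced by Ramsauer et al., with additive constants dropped); the abstract does not spell out its energy, so this choice, the value of `beta`, and the toy patterns are all assumptions.

```python
import numpy as np


def hopfield_energy(query, patterns, beta=8.0):
    """Assumed modern Hopfield energy, constants omitted:
    E(q) = -(1/beta) * log(sum_i exp(beta * <x_i, q>)) + 0.5 * <q, q>.
    """
    q = np.asarray(query, dtype=float)
    X = np.asarray(patterns, dtype=float)
    scores = beta * (X @ q)
    m = scores.max()                                  # stabilise the log-sum-exp
    lse = (m + np.log(np.exp(scores - m).sum())) / beta
    return -lse + 0.5 * (q @ q)


def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


# Toy memory: three near-duplicate patterns plus one isolated pattern.
cluster = [unit([1, 0.05, 0]), unit([1, -0.05, 0]), unit([1, 0, 0.05])]
outlier = unit([0, 1, 0])
patterns = np.array(cluster + [outlier])

e_cluster = hopfield_energy(cluster[0], patterns)
e_outlier = hopfield_energy(outlier, patterns)
```

In this toy memory the isolated pattern sits at a higher energy than a clustered one, because neighbouring patterns add to the log-sum-exp term and deepen the shared basin. Under the stated assumptions, that is the sense in which an energy score could be used to rank candidate replay samples.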
On Stability and Decomposition of Sample Quantiles under Heavy-Tailed Distributions
We study sample quantiles of distributions indexed by estimated parameters, with a focus on Value-at-Risk related to linear projections of financial returns whose underlying probability law is heavy-tailed. In this setting, the projection direction and the empirical quantile threshold are estimated from the data, so the standard Bahadur representation under a fixed distribution does not separate the distinct sources of instability. A canonical starting point is Bahadur's representation, which expresses the sample quantile through the empirical distribution function plus a remainder term \cite{bahadur1966}. Empirical-process theory provides a usable scaffolding through the mechanics of half-spaces, symmetric differences, and Glivenko--Cantelli uniform convergence. These tools yield stability bounds, but absorb changes in projection direction and changes in quantile threshold into a single symmetric-difference measure. Interestingly, a global uniform-convergence requirement is imposed on what is intrinsically a local quantile-stability problem. This paper introduces a Q-Q orthogonality formulation for separating projection-direction and quantile-threshold effects. The object of interest is the difference between the empirical quantile computed using the estimated projection direction and the population quantile computed at the reference projection direction. We decompose this difference into three terms, $\hat q_\alpha(\hat w)-q_\alpha(w_0)=D_1+D_2+D_3$. Here, $D_1$ measures the population quantile movement induced by perturbing the projection direction, $D_2$ measures the empirical quantile fluctuation with the projection direction held fixed, and $D_3$ is the Bahadur-type remainder.
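One way a three-term split of this kind can arise is by telescoping through the population quantile at the estimated direction and then applying a Bahadur-type expansion at a fixed direction. The display below is only a sketch under that assumption: the notation $F_{n,w}$ (empirical distribution function of the projected sample), $f_w$ (density of the projection) and $R_n(w)$ (remainder) is introduced here for illustration, and the paper's exact definitions of $D_1$, $D_2$, $D_3$ may differ.

```latex
\begin{align*}
\hat q_\alpha(\hat w) - q_\alpha(w_0)
  &= \underbrace{q_\alpha(\hat w) - q_\alpha(w_0)}_{\text{movement of the population quantile with the direction}}
   + \underbrace{\hat q_\alpha(\hat w) - q_\alpha(\hat w)}_{\text{sampling error at a fixed direction}}, \\[4pt]
\hat q_\alpha(w) - q_\alpha(w)
  &= \frac{\alpha - F_{n,w}\bigl(q_\alpha(w)\bigr)}{f_w\bigl(q_\alpha(w)\bigr)} + R_n(w)
  \qquad \text{for a fixed direction } w.
\end{align*}
```

Read this way, the first bracket plays the role of a direction term, the leading part of the Bahadur expansion plays the role of a fluctuation term, and $R_n(w)$ plays the role of a remainder.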
Don't pay $30 for Tomb Raider I–III Remastered. It's free right now
PCWorld highlights that Tomb Raider I-III Remastered Starring Lara Croft is currently available for free on Epic Games Store instead of its usual $30 price. The collection includes all three classic Tomb Raider games plus expansions, featuring updated graphics, improved controls, and new challenge modes while maintaining original gameplay. This remaster scored 75 on Metacritic and offers both classic and enhanced visuals, making it appealing for new players and nostalgic fans alike. Few game series come close to matching the cultural impact of Tomb Raider--and now is the perfect opportunity to experience Lara Croft's early days for yourself, except with updated graphics and controls that have been brought up to modern standards.
Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data
Genans, Ferdinand, Scornet, Erwan
Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to \emph{deliberately add missingness}: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic gradients to cancel the leading bias term. We prove that one Richardson step reduces the gradient bias from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios. Our proposed method is computationally efficient, model-agnostic and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when missing indicators are independent, the population gradient bias is a multilinear polynomial in $p$ and depends only on population gradient errors induced by declaring a single coordinate missing. In this case, our method generalizes to a multi-step Richardson procedure which recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines positively with widely used imputation procedures such as MICE. These results suggest that, somewhat counter-intuitively, adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.
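The Richardson step can be sketched in a simplified scalar setting. The code assumes a single missingness rate p shared by all coordinates, independent missingness, and a gradient bias of the form c1*p + c2*p^2; the helper names and the synthetic bias coefficients are hypothetical, and the paper's vector-valued p and multi-step variant are not covered.

```python
import numpy as np


def thin_mask(observed, extra_drop, rng):
    """Deliberately add missingness: drop each observed entry with prob extra_drop."""
    observed = np.asarray(observed, dtype=bool)
    keep = rng.random(observed.shape) >= extra_drop
    return observed & keep


def thinned_level(p, extra_drop):
    """Missingness rate after thinning, assuming independent entries."""
    return p + (1.0 - p) * extra_drop


def richardson_combine(g_p, g_p2, p, p2):
    """Combine gradients at levels p < p2 so that the O(p) bias term cancels."""
    return (p2 * g_p - p * g_p2) / (p2 - p)


# Synthetic check with an assumed bias model: bias(p) = c1 * p + c2 * p**2.
true_grad, c1, c2 = 1.0, 0.8, -0.3
p = 0.1
p2 = thinned_level(p, extra_drop=0.2)
g_p = true_grad + c1 * p + c2 * p**2
g_p2 = true_grad + c1 * p2 + c2 * p2**2
debiased = richardson_combine(g_p, g_p2, p, p2)

# Thinning an example observation mask (True = observed).
rng = np.random.default_rng(0)
mask = np.array([True, True, False, True, True, False])
thinner = thin_mask(mask, extra_drop=0.2, rng=rng)
```

Under this bias model the combined gradient has residual bias -c2 * p * p2, which is second order in the missingness rate, whereas either gradient alone carries a first-order term.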