section 5
e433e40575f677fb3f7eb7b6b2fb3dd2-Paper-Conference.pdf
We analyze task orderings in continual learning for linear regression, assuming joint realizability of training data. We focus on orderings that greedily maximize dissimilarity between consecutive tasks, a concept briefly explored in prior work but still surrounded by open questions. Using tools from the Kaczmarz method literature, we formalize such orderings and develop geometric and algebraic intuitions around them. Empirically, we demonstrate that greedy orderings converge faster than random ones in terms of the average loss across tasks, both for linear regression with random data and for linear probing on CIFAR-100 classification tasks. Analytically, in a high-rank regression setting, we prove a loss bound for greedy orderings analogous to that of random ones. However, under general rank, we establish a repetition-dependent separation. Specifically, while prior work showed that for random orderings, with or without replacement, the average loss after $k$ iterations is bounded by $O(1/\sqrt{k})$, we prove that single-pass greedy orderings may fail catastrophically, whereas those allowing repetition converge at rate $O(1/\sqrt[3]{k})$. Overall, we reveal nuances within and between greedy and random orderings.
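The comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' experimental code: each task is fit by projecting the current weights onto that task's solution set (a block Kaczmarz step), and "greedy" is taken to mean the maximum-distance rule from the Kaczmarz literature, with repetition allowed. The helper names (`fit_task`, `run_ordering`, `make_tasks`) and the problem sizes are hypothetical.

```python
import numpy as np

def fit_task(w, X, y):
    """Project w onto the solution set {v : X v = y} (minimum-norm update)."""
    return w + np.linalg.pinv(X) @ (y - X @ w)

def average_loss(w, tasks):
    """Mean squared residual, averaged over all tasks."""
    return float(np.mean([np.mean((X @ w - y) ** 2) for X, y in tasks]))

def run_ordering(tasks, steps, greedy, rng):
    """Learn tasks sequentially; return final weights and the loss curve."""
    w = np.zeros(tasks[0][0].shape[1])
    losses = []
    for _ in range(steps):
        if greedy:
            # Assumed greedy rule: pick the task whose projection moves w the most.
            dists = [np.linalg.norm(np.linalg.pinv(X) @ (y - X @ w)) for X, y in tasks]
            m = int(np.argmax(dists))
        else:
            # Random ordering with replacement.
            m = int(rng.integers(len(tasks)))
        w = fit_task(w, *tasks[m])
        losses.append(average_loss(w, tasks))
    return w, losses

def make_tasks(num_tasks=20, samples=3, dim=30, seed=0):
    """Jointly realizable tasks: every label comes from one shared w_star."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(dim)
    tasks = []
    for _ in range(num_tasks):
        X = rng.standard_normal((samples, dim))
        tasks.append((X, X @ w_star))
    return tasks, w_star

if __name__ == "__main__":
    tasks, _ = make_tasks()
    rng = np.random.default_rng(1)
    _, greedy_losses = run_ordering(tasks, 100, greedy=True, rng=rng)
    _, random_losses = run_ordering(tasks, 100, greedy=False, rng=rng)
    print("greedy:", greedy_losses[-1], "random:", random_losses[-1])
```

Because every task shares the same `w_star`, each projection can only move the iterate closer to it, which is why forgetting here shows up as slow convergence of the average loss rather than divergence. A single-pass variant would additionally remove each task from the candidate set once it has been used.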
PROSPERO: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods
Designing protein sequences of both high fitness and novelty is a challenging task in data-efficient protein engineering. Exploration beyond wild-type neighborhoods often leads to biologically implausible sequences or relies on surrogate models that lose fidelity in novel regions. Here, we propose PROSPERO, an active learning framework in which a frozen pre-trained generative model is guided by a surrogate updated from oracle feedback. By integrating fitness-relevant residue selection with biologically-constrained Sequential Monte Carlo sampling, our approach enables exploration beyond wild-type neighborhoods while preserving biological plausibility. We show that our framework remains effective even when the surrogate is misspecified. PROSPERO consistently outperforms or matches existing methods across diverse protein engineering tasks, retrieving sequences of both high fitness and novelty.
FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge. Many post-hoc concept-based approaches have been introduced to understand their workings, yet they are not always faithful to the model. Further, they make restrictive assumptions on the concepts a model learns, such as class-specificity, small spatial extent, or alignment to human expectations. In this work, we put emphasis on the faithfulness of such concept-based explanations and propose a new model with model-inherent mechanistic concept-explanations. Our concepts are shared across classes and, from any layer, their contribution to the logit and their input-visualization can be faithfully traced. We also leverage foundation models to propose a new concept-consistency metric, C2-score, that can be used to evaluate concept-based methods. Compared to prior work, we show that our concepts are quantitatively more consistent and that users find them to be more interpretable, while retaining competitive ImageNet performance.
Sheetpedia: A 300K-Spreadsheet Corpus for Spreadsheet Intelligence and LLM Fine-Tuning
Spreadsheets are widely used for data analysis and reporting, yet their complex structure and formula logic pose significant challenges for AI systems. We introduce Sheetpedia, a large-scale corpus of over 290,000 diverse spreadsheets (from 324,000+ workbooks) compiled from enterprise email archives and online forums. We detail a rigorous collection and preprocessing pipeline (integrating the Enron email spreadsheet archive and the Fuse web corpus, plus a new crawl of Excel forums) to standardize formats, filter languages, and remove duplicates. Sheetpedia provides extensive coverage of real formulas and annotations - addressing a gap left by prior table datasets (e.g.
Accelerating Model-Free Optimization via Averaging of Cost Samples
Model-free optimization methods typically rely on cost samples gathered by perturbing the current solution estimate along a finite and fixed set of directions. However, at each iteration, only the current cost samples are used, while potentially informative, previously collected samples are discarded. In this work, we challenge this conventional approach by introducing a simple yet effective memory mechanism that maintains an auxiliary vector of iteratively updated cost samples. By leveraging this stored information, our method estimates descent directions through an averaging of all perturbing directions weighted by the auxiliary vector components.
HYPRL: Reinforcement Learning of Control Policies for Hyperproperties
Reward shaping in multi-agent reinforcement learning (MARL) for complex tasks remains a significant challenge. Existing approaches often fail to find optimal solutions or cannot efficiently handle such tasks. We propose HYPRL, a specification-guided reinforcement learning framework that learns control policies w.r.t.
Uncertainty-Guided Exploration for Efficient AlphaZero Training
AlphaZero has achieved remarkable success in complex decision-making problems through self-play and neural network training. However, its self-play process remains inefficient due to limited exploration of high-uncertainty positions, the overlooked runner-up decisions in Monte Carlo Tree Search (MCTS), and high variance in value labels. To address these challenges, we propose and evaluate uncertainty-guided exploration by branching from high-uncertainty positions using our proposed Label Change Rate (LCR) metric, which is further refined by a Bayesian inference framework. Our proposed approach leverages runner-up MCTS decisions to create multiple variations, and ensembles value labels across these variations to reduce variance. We investigate three key design parameters for our branching strategy: where to branch, how many variations to branch, and which move to play in the new branch. Our empirical findings indicate that branching with 10 variations per game provides the best performance-exploration balance. Overall, our end-to-end results show an improved sample efficiency over the baseline by 58.5% on 9x9 Go in the early stage of training and by 47.3% on 19x19 Go in the late stage of training.
Flexible MOF Generation with Torsion-Aware Flow Matching
Designing metal-organic frameworks (MOFs) with novel chemistries is a longstanding challenge due to their large combinatorial space and complex 3D arrangements of the building blocks. While recent deep generative models have enabled scalable MOF generation, they assume (1) a fixed set of building blocks and (2) known local 3D coordinates of building blocks. However, this limits their ability to (1) design novel MOFs and (2) generate the structure using novel building blocks. We propose a two-stage MOF generation framework that overcomes these limitations by modeling both chemical and geometric degrees of freedom. First, we train a SMILES-based autoregressive model to generate metal and organic building blocks, paired with a cheminformatics toolkit for 3D structure initialization. Second, we introduce a flow matching model that predicts translations, rotations, and torsional angles to assemble the blocks into valid 3D frameworks. Our experiments demonstrate improved reconstruction accuracy, the generation of valid, novel, and unique MOFs, and the ability to create novel building blocks.
Kernel-based potential mean-field games with unbiased random Fourier $U$-statistics
We study the subclass of potential mean-field games in which the running interaction cost and the terminal target cost are both expressed through reproducing-kernel maximum mean discrepancy (MMD) penalties, and develop a computational framework that exploits this kernel structure. Both costs are estimated from finite-sample empirical distributions using a random Fourier U-statistic representation that is unbiased and has linear cost in the batch size. The drift of the controlled diffusion is parametrized by a neural network and trained via stochastic gradient descent. For this subclass we prove a sample-level almost-sure convergence theorem and an explicit almost-sure rate of convergence, under coupled rate conditions on the penalty parameter, the random-feature count, the sample size, and the optimization tolerance. The framework includes the kernel-MMD-penalty Schrödinger bridge problem as the special case of a vanishing interaction cost. Numerical experiments illustrate the method on the Schrödinger bridge problem in dimensions up to one hundred, and on an electric vehicle charging coordination problem with per-vehicle physical heterogeneity, where an aggregate-demand congestion cost represents price-feedback competition at the population level and the terminal MMD penalty shapes the state-of-charge distribution at the deadline.
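To make the "unbiased, linear cost in the batch size" claim concrete, here is a minimal sketch of one way a random Fourier U-statistic for squared MMD can be computed. It assumes a Gaussian kernel with a hypothetical `bandwidth` parameter and the standard cosine random features; the abstract does not specify the kernel or the exact estimator, so the function names and choices below are illustrative assumptions only.

```python
import numpy as np

def sample_frequencies(dim, num_features, bandwidth, rng):
    """Draw random Fourier frequencies and phases for a Gaussian kernel (assumed)."""
    omegas = rng.standard_normal((dim, num_features)) / bandwidth
    phases = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return omegas, phases

def rff_features(X, omegas, phases):
    """Map samples to features z(x) with E[z(x) . z(y)] = k(x, y)."""
    num_features = omegas.shape[1]
    return np.sqrt(2.0 / num_features) * np.cos(X @ omegas + phases)

def mmd2_u_from_features(Zx, Zy):
    """U-statistic estimate of squared MMD in O((n + m) * D) time.

    Uses sum_{i != j} z_i . z_j = ||sum_i z_i||^2 - sum_i ||z_i||^2,
    so no n-by-n Gram matrix is ever formed.
    """
    n, m = len(Zx), len(Zy)
    sx, sy = Zx.sum(axis=0), Zy.sum(axis=0)
    xx = (sx @ sx - np.sum(Zx * Zx)) / (n * (n - 1))
    yy = (sy @ sy - np.sum(Zy * Zy)) / (m * (m - 1))
    xy = (sx @ sy) / (n * m)
    return float(xx + yy - 2.0 * xy)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 2))
    Y = rng.standard_normal((500, 2)) + 2.0  # shifted target distribution
    omegas, phases = sample_frequencies(2, 256, 1.0, rng)
    est = mmd2_u_from_features(rff_features(X, omegas, phases),
                               rff_features(Y, omegas, phases))
    print("estimated MMD^2:", est)
```

Dropping the diagonal terms is what removes the bias of the plug-in (V-statistic) estimator, and the sum-of-features identity is what keeps the cost linear in the batch size. Because every operation is differentiable, the same expression could in principle serve as a training penalty for a neural drift, as the abstract describes.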
Beyond Differences: Doubly Robust Meta-Learners for Ratio-Based Treatment Effects
Fuchs, Michael, Kreiss, Dominik
When treatment effects are naturally expressed as ratios -- as in medicine, pricing, and marketing -- the ratio-based CATE $τ(x) = E[Y|W=1,X=x] / E[Y|W=0,X=x]$ is the appropriate estimand. Yet existing estimators either impose a log-linear parametric structure or apply generic regression without robustness guarantees for this functional. We introduce the Q-Learner, which decomposes $τ(x)$ into a product of two odds ratios, reducing ratio-CATE estimation for binary outcomes to two propensity classification tasks. We further derive doubly robust augmentations for both S/T- and Q-style ratio learners and characterize their distinct robustness properties. In benchmarks on seven RCT datasets, the Q-Learner is the most consistently competitive method in low-conversion regimes, where its propensity-only construction sidesteps the imbalanced regression that hurts outcome-based estimators. On four observational datasets, where propensity must be estimated and confounding cannot be ruled out, the DR learners introduced here decisively come out on top, making them practitioners' natural default for confounded observational data.
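The reduction to classification can be illustrated with a short Bayes-rule derivation for binary outcomes. This is a sketch of one identity consistent with the description above; the abstract does not give the authors' exact formulation, and the symbols $e(x)$ and $e_1(x)$ are notation introduced here for illustration.

```latex
% Assumed notation (not from the source):
%   e(x)   = P(W=1 | X=x)        propensity among all units
%   e_1(x) = P(W=1 | Y=1, X=x)   propensity among units with Y=1
%
% For binary Y, E[Y | W=w, X=x] = P(Y=1 | W=w, X=x), and by Bayes' rule
%   P(Y=1 | W=w, X=x) = P(W=w | Y=1, X=x) P(Y=1 | X=x) / P(W=w | X=x).
% The factor P(Y=1 | X=x) cancels in the ratio:
\begin{align*}
\tau(x)
  &= \frac{P(Y=1 \mid W=1, X=x)}{P(Y=1 \mid W=0, X=x)} \\
  &= \frac{P(W=1 \mid Y=1, X=x)}{P(W=0 \mid Y=1, X=x)}
     \cdot
     \frac{P(W=0 \mid X=x)}{P(W=1 \mid X=x)} \\
  &= \frac{e_1(x)}{1 - e_1(x)} \cdot \frac{1 - e(x)}{e(x)} .
\end{align*}
% Worked example: e_1(x) = 0.6 and e(x) = 0.5 give
%   tau(x) = (0.6 / 0.4) * (0.5 / 0.5) = 1.5.
```

Under this identity, both factors are odds of treatment assignment, so each can be estimated by a classifier that predicts $W$ from $X$: one trained on units with $Y=1$ and one trained on all units. No outcome regression is required, which matches the abstract's remark that a propensity-only construction avoids imbalanced regression when conversions are rare.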