runtime
Nearly-Linear Time and Massively Parallel Algorithms for k-Anonymity
Previous algorithms with provable guarantees either (1) achieve the same O(k)approximation ratio but require at least O(n2k) runtime, or (2) provide a better O(logk) approximation ratio at the cost of an impractical O(n2k) worst-case runtime for general d and k. Our algorithm extends to the Massively Parallel Computation (MPC) model, where it gives an MPC algorithm requiring eO(log1+ฮต n) rounds and total space O(n1+ฮณ(d+k)). Empirically, we also demonstrate that our algorithmic ideas can be adapted to existing heuristic methods, leading to significant speed-ups while preserving comparable performance. On the hardness side, we study the related single-point k-anonymity problem, where the goal is to select k 1 additional records to make a given record indistinguishable. Assuming the dense vs random conjecture in complexity theory, we show that for n = kc, no algorithm can achieve a k1 O(1/c) approximation in poly(n) time, providing evidence for the inherent hardness of the k-anonymity problem.
Why Popular MOEAs Are Popular: Proven Advantages in Approximating the Pareto Front
Recent breakthroughs in the analysis of multi-objective evolutionary algorithms (MOEAs) are mathematical runtime analyses of those algorithms which are intensively used in practice. So far, most of these results show the same performance as previously known for simpler algorithms like the GSEMO. The few results indicating advantages of the popular MOEAs share the same shortages: They only consider the problem of computing the full Pareto front, sometimes of algorithms enriched with newly invented mechanisms, and this on newly designed benchmarks. In this work, we overcome these shortcomings by analyzing how existing popular MOEAs approximate the Pareto front of the established LARGEFRONT benchmark. We prove that several popular MOEAs, including NSGA-II (with current crowding distance), NSGA-III, SMS-EMOA, and SPEA2, only need an expected time of O(n2 logn) fitness evaluations to compute an additive ฯ-approximation of the Pareto front of the LARGEFRONT benchmark. This contrasts with the already proven exponential runtime (with high probability) of the GSEMO on the same task. Our result is the first mathematical runtime analysis showing and explaining the superiority of popular MOEAs over simple ones like the GSEMO for the central task of computing good approximations to the Pareto front.
SVRPBench: ARealistic Benchmark for Stochastic Vehicle Routing Problem
Robust routing under uncertainty is central to real-world logistics, yet most benchmarks assume static, idealized settings. We present SVRPBench, the first open benchmark to capture high-fidelity stochastic dynamics in vehicle routing at urban scale. Spanning more than 500 instances with up to 1000 customers, it simulates realistic delivery conditions: time-dependent congestion, log-normal delays, probabilistic accidents, and empirically grounded time windows for residential and commercial clients. Our pipeline generates diverse, constraint-rich scenarios, including multi-depot and multi-vehicle setups. Benchmarking reveals that state-of-the-art RL solvers like POMO and AM degrade by over 20% under distributional shift, while classical and metaheuristic methods remain robust. To enable reproducible research, we release the dataset (Hugging Face) and evaluation suite (GitHub). SVRPBenchchallenges the community to design solvers that generalize beyond synthetic assumptions and adapt to real-world uncertainty.
STree: Speculative Tree Decoding for Hybrid State-Space Models
Speculative decoding is a technique to leverage hardware concurrency in order to enable multiple steps of token generation in a single forward pass, thus improving the efficiency of large-scale autoregressive (AR) Transformer models. State-space models (SSMs) are already more efficient than ARTransformers, since their state summarizes all past data with no need to cache or re-process tokens in the sliding window context. However, their state can also comprise thousands of tokens; so, speculative decoding has recently been extended to SSMs. Existing approaches, however, do not leverage the tree-based verification methods, since current SSMs lack the means to compute a token tree efficiently. We propose the first scalable algorithm to perform tree-based speculative decoding in state-space models (SSMs) and hybrid architectures of SSMs and Transformer layers. We exploit the structure of accumulated state transition matrices to facilitate tree-based speculative decoding with minimal overhead relative to current SSM implementations. Along with the algorithm, we describe a hardware-aware implementation that improves naive application of ARTransformer tree-based speculative decoding methods to SSMs. Furthermore, we outperform vanilla speculative decoding with SSMs even with a baseline drafting model and tree structure on three different benchmarks, opening up opportunities for further speed up with SSM and hybrid model inference. Code can be find at: https://github.com/wyc1997/stree.
9a648e8e1014c2427156dcb5465cd488-Supplemental-Datasets_and_Benchmarks_Track.pdf
AResults883 In Table 4, we show a summary of the results of AlgoTuner for each of the four frontier models tests.884 Speedup percentage is calculated as the percentage of tasks for which AlgoTuner gets at least a 1.1 speedup. Speedup is calculated as the ratio between the reference solve function's time and the LM-generated solve function's time. The LM receives an initial message, consisting of general instructions on how to888 use the system (see C.1), Numba (Lam et al., 2015), Dask (Rocklin, 2015), and Cython (Behnel et al.,889 2011) (for a full list see Appendix E). Additionally, the LM is given the task's description, which890 includes input and output descriptions and examples, as well as the task's solveand is_solution891 functions.
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
The growing adoption of machine learning models for biological sequences has intensified the need for interpretable predictions, with Shapley values emerging as a theoretically grounded standard for model explanation. While effective for local explanations of individual input sequences, scaling Shapley-based interpretability to extract global biological insights requires evaluating thousands of sequences--incurring exponential computational cost per query. We introduce SHAP zero, a novel algorithm that amortizes the cost of Shapley value computation across large-scale biological datasets. After a one-time model sketching step, SHAP zero enables near-zero marginal cost for future queries by uncovering an underexplored connection between Shapley values, high-order feature interactions, and the sparse Fourier transform of the model. Applied to models of guide RNA efficacy, DNA repair outcomes, and protein fitness, SHAP zero explains predictions orders of magnitude faster than existing methods, recovering rich combinatorial interactions previously inaccessible at scale. This work opens the door to principled, efficient, and scalable interpretability for black-box sequence models in biology.
Why Playing Against Diverse and Challenging Opponents Speeds Up Coevolution: ATheoretical Analysis on Combinatorial Games
Competitive coevolutionary algorithms (CoEAs) have a natural application to problems that are adversarial or feature strategic interaction. However, there is currently limited theoretical insight into how to avoid pathological behaviour associated with CoEAs. In this paper we use impartial combinatorial games as a challenging domain for CoEAs and provide a corresponding runtime analysis. By analysing how individuals capitalise on the mistakes of their opponents, we prove that the Univariate Marginal Distribution Algorithm finds (with high probability) an optimal strategy for a game called Reciprocal LeadingOnes within O(n2 log3 n)game evaluations, a significant improvement over the best known bound of O(n5 log2 n). Critical to the analysis is the introduction of a novel stabilising operator, the impact of which we study both theoretically and empirically.
SORTeDRashomon Sets of Sparse Decision Trees: Anytime Enumeration
Sparse decision tree learning provides accurate and interpretable predictive models that are ideal for high-stakes applications by finding the single most accurate tree within a (soft) size limit. Rather than relying on a single "best" tree, Rashomon sets--trees with similar performance but varying structures--can be used to enhance variable importance analysis, enrich explanations, and enable users to choose simpler trees or those that satisfy stakeholder preferences (e.g., fairness) without hard-coding such criteria into the objective function. However, because finding the optimal tree is NP-hard, enumerating the Rashomon set is inherently challenging. Therefore, we introduce SORTD, a novel framework that improves scalability and enumerates trees in the Rashomon set in order of the objective value, thus offering anytime behavior. Our experiments show that SORTD reduces runtime by up to two orders of magnitude compared with the state of the art. Moreover, SORTD can compute Rashomon sets for any separable and totally ordered objective and supports post-evaluating the set using other separable (and partially ordered) objectives. Together, these advances make exploring Rashomon sets more practical in real-world applications.