Technology
Marginal-Nonuniform PAC Learnability
We revisit the classical model of nonuniform PAC learning, introduced by Benedek and Itai [1994], where generalization guarantees may depend on the target concept (but not on the marginal distribution). In this work, we study a complementary variant, which we call *marginal-nonuniform learning*. In this setting, guarantees may depend on the marginal distribution over the domain, but must hold uniformly over all concepts. This captures the intuition that some data distributions are inherently easier to learn from than others, allowing for a flexible, distribution-sensitive view of learnability. Our main result is a complete characterization of the achievable learning rates in this model, revealing a trichotomy: exponential rates of the form $e^{-n}$ arise precisely when the hypothesis class is finite; linear rates of the form $d/n$ are achievable when a recently introduced combinatorial parameter, the VC-eluder dimension $d$, is finite; and arbitrarily slow rates may occur when $d = \infty$. Additionally, in the original (concept-)nonuniform model, we show that for all learnable classes linear rates are achievable. We conclude by situating marginal-nonuniform learning within the landscape of universal learning, and by discussing its relationship to other distribution-dependent learning paradigms.
AlphaBeta is not as good as you think: a simple class of synthetic games for a better analysis of deterministic game-solving algorithms
Deterministic game-solving algorithms are conventionally analyzed in the light of their average-case complexity against a distribution of random game-trees, where leaf values are independently sampled from a fixed distribution. This simplified model enables uncluttered mathematical analysis, revealing two key properties: root value distributions asymptotically collapse to a single fixed value for finite-valued trees, and all reasonable algorithms achieve global optimality. However, these findings are artifacts of the model's design: its long criticized independence assumption strips games of structural complexity, producing trivial instances where no algorithm faces meaningful challenges. To address this limitation, we introduce a class of synthetic games generated by a probabilistic model that incrementally constructs game-trees using a fixed level-wise conditional distribution. By enforcing ancestor dependencies, a critical structural feature of real-world games, our framework generates problems with adjustable difficulty while retaining some form of analytical tractability. For several algorithms, including AlphaBeta and Scout, we derive recursive formulas characterizing their average-case complexities under this model. These allow us to rigorously compare algorithms on deep game-trees, where Monte-Carlo simulations are no longer feasible. While asymptotically, all algorithms seem to converge to identical branching factor (a result analogous to that of independence-based models), deep finite trees reveal stark differences: AlphaBeta incurs a significantly larger constant multiplicative factor compared to algorithms like Scout, leading to a substantial practical slowdown. Our framework sheds new light on classical game-solving algorithms, offering rigorous evidence and analytical tools to advance the understanding of these methods under a richer, more challenging, and yet tractable model.
Shape-Informed Clustering of Multi-Dimensional Functional Data via Deep Functional Autoencoders
We introduce FAEclust, a novel functional autoencoder framework for cluster analysis of multi-dimensional functional data, data that are random realizations of vector-valued random functions. Our framework features a universal-approximator encoder that captures complex nonlinear interdependencies among component functions, and a universal-approximator decoder capable of accurately reconstructing both Euclidean and manifold-valued functional data. Stability and robustness are enhanced through innovative regularization strategies applied to functional weights and biases. Additionally, we incorporate a clustering loss into the network's training objective, promoting the learning of latent representations that are conducive to effective clustering. A key innovation is our shape-informed clustering objective, ensuring that the clustering results are resistant to phase variations in the functions. We establish the universal approximation property of our non-linear decoder and validate the effectiveness of our model through extensive experiments.
The Rich and the Simple: On the Implicit Bias of Adam and SGD
Adam is the de facto optimization algorithm for several deep learning applications, but an understanding of its implicit bias and how it differs from other algorithms, particularly standard first-order methods such as (stochastic) gradient descent (GD), remains limited. In practice, neural networks (NNs) trained with SGD are known to exhibit simplicity bias --- a tendency to find simple solutions. In contrast, we show that Adam is more resistant to such simplicity bias. First, we investigate the differences in the implicit biases of Adam and GD when training two-layer ReLU NNs on a binary classification task with Gaussian data. We find that GD exhibits a simplicity bias, resulting in a linear decision boundary with a suboptimal margin, whereas Adam leads to much richer and more diverse features, producing a nonlinear boundary that is closer to the Bayes' optimal predictor. This richer decision boundary also allows Adam to achieve higher test accuracy both in-distribution and under certain distribution shifts. We theoretically prove these results by analyzing the population gradients. Next, to corroborate our theoretical findings, we present extensive empirical results showing that this property of Adam leads to superior generalization across various datasets with spurious correlations where NNs trained with SGD are known to show simplicity bias and do not generalize well under certain distributional shifts.
On the Convergence of Stochastic Smoothed Multi-Level Compositional Gradient Descent Ascent
Multi-level compositional optimization is a fundamental framework in machine learning with broad applications. While recent advances have addressed compositional minimization problems, the stochastic multi-level compositional minimax problem introduces significant new challenges--most notably, the biased nature of stochastic gradients for both the primal and dual variables. In this work, we address this gap by proposing a novel stochastic multi-level compositional gradient descent-ascent algorithm, incorporating a smoothing technique under the nonconvex-PL condition. We establish a convergence rate to an $(\epsilon, \epsilon/\sqrt{\kappa})$-stationary point with improved dependence on the condition number at $O(\kappa^{3/2})$, where $\epsilon$ denotes the solution accuracy and $\kappa$ represents the condition number. Moreover, we design a novel stage-wise algorithm with variance reduction to address the biased gradient issue under the two-sided PL condition. This algorithm successfully enables a translation from and $(\epsilon, \epsilon/\sqrt{\kappa})$-stationary point to an $\epsilon$-stationary point.
Root Cause Analysis of Outliers with Missing Structural Knowledge
The goal of Root Cause Analysis (RCA) is to explain why an anomaly occurred by identifying where the fault originated. Several recent works model the anomalous event as resulting from a change in the causal mechanism at the root cause, i.e., as a soft intervention. RCA is then the task of identifying which causal mechanism changed. In real-world applications, one often has either few or only a single sample from the post-intervention distribution: a severe limitation for most methods, which assume one knows or can estimate the distribution. However, even those that do not are statistically ill-posed due to the need to probe regression models in regions of low probability density. In this paper, we propose simple, efficient methods to overcome both difficulties in the case where there is a single root cause and the causal graph is a polytree. When one knows the causal graph, we give guarantees for a traversal algorithm that requires only marginal anomaly scores and does not depend on specifying an arbitrary anomaly score cut-off. When one does not know the causal graph, we show that the heuristic of identifying root causes as the variables with the highest marginal anomaly scores is causally justified. To this end, we prove that anomalies with small scores are unlikely to cause those with larger scores in polytrees and give upper bounds for the likelihood of causal pathways with non-monotonic anomaly scores.
DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response
Large vision-language models (VLMs) have made great achievements in Earth vision. However, complex disaster scenes with diverse disaster types, geographic regions, and satellite sensors have posed new challenges for VLM applications. To fill this gap, we curate the first remote sensing vision-language dataset (DisasterM3) for global-scale disaster assessment and response. DisasterM3 includes 26,988 bi-temporal satellite images and 123k instruction pairs across 5 continents, with three characteristics: **1) Multi-hazard**: DisasterM3 involves 36 historical disaster events with significant impacts, which are categorized into 10 common natural and man-made disasters.
ARMesh: Autoregressive Mesh Generation via Next-Level-of-Detail Prediction
Directly generating 3D meshes, the default representation for 3D shapes in the graphics industry, using auto-regressive (AR) models has become popular these days, thanks to their sharpness, compactness in the generated results, and ability to represent various types of surfaces. However, AR mesh generative models typically construct meshes face by face in lexicographic order, which does not effectively capture the underlying geometry in a manner consistent with human perception. Inspired by 2D models that progressively refine images, such as the prevailing next-scale prediction AR models, we propose generating meshes auto-regressively in a progressive coarse-to-fine manner. Specifically, we view mesh simplification algorithms, which gradually merge mesh faces to build simpler meshes, as a natural fine-to-coarse process. Therefore, we generalize meshes to simplicial complexes and develop a transformer-based AR model to approximate the reverse process of simplification in the order of level of detail, constructing meshes initially from a single point and gradually adding geometric details through local remeshing, where the topology is not predefined and is alterable. Our experiments show that this novel progressive mesh generation approach not only provides intuitive control over generation quality and time consumption by early stopping the auto-regressive process but also enables applications such as mesh refinement and editing.
Learning Differential Pyramid Representation for Tone Mapping
Existing tone mapping methods operate on downsampled inputs and rely on handcrafted pyramids to recover high-frequency details. Existing tone mapping methods operate on downsampled inputs and rely on handcrafted pyramids to recover high-frequency details. These designs typically fail to preserve fine textures and structural fidelity in complex HDR scenes. Furthermore, most methods lack an effective mechanism to jointly model global tone consistency and local contrast enhancement, leading to globally flat or locally inconsistent outputs such as halo artifacts. We present the Differential Pyramid Representation Network (DPRNet), an end-to-end framework for high-fidelity tone mapping. At its core is a learnable differential pyramid that generalizes traditional Laplacian and Difference-of-Gaussian pyramids through content-aware differencing operations across scales. This allows DPRNet to adaptively capture high-frequency variations under diverse luminance and contrast conditions. To enforce perceptual consistency, DPRNet incorporates global tone perception and local tone tuning modules operating on downsampled inputs, enabling efficient yet expressive tone adaptation.