Neuron Block Dynamics for XOR Classification with Zero-Margin
Braun, Guillaume, Imaizumi, Masaaki
The ability of neural networks to learn useful features through stochastic gradient descent (SGD) is a cornerstone of their success. Most theoretical analyses focus on regression or on classification tasks with a positive margin, where worst-case gradient bounds suffice. In contrast, we study zero-margin nonlinear classification by analyzing the Gaussian XOR problem, where inputs are Gaussian and the XOR decision boundary determines the labels. In this setting, a non-negligible fraction of the data lies arbitrarily close to the boundary, which breaks standard margin-based arguments. Building on Glasgow's (2024) analysis, we extend the study of training dynamics from discrete to Gaussian inputs and develop a framework for the dynamics of neuron blocks. We show that neurons cluster into four directions and that block-level signals evolve coherently. This phenomenon is essential in the Gaussian setting, where individual neuron signals vary significantly. Leveraging this block perspective, we analyze generalization without relying on margin assumptions, adopting an average-case view that distinguishes regions of reliable prediction from regions of persistent error. Numerical experiments confirm the predicted two-phase block dynamics and demonstrate their robustness beyond the Gaussian setting.
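To make the zero-margin point concrete, here is a minimal sketch in Python (standard library only). It assumes the common convention that the label is the sign of x_1 x_2, so the XOR boundary is the union of the hyperplanes x_1 = 0 and x_2 = 0. The function names are hypothetical and not taken from the paper. For x ~ N(0, I_d), the probability of lying within distance ε of this boundary is 1 − (1 − erf(ε/√2))², which is roughly 2ε√(2/π) for small ε. The mass near the boundary therefore shrinks only linearly in ε, and no positive margin exists.

```python
import math
import random


def xor_label(x):
    # Assumed convention: label = sign(x_1 * x_2); all other coordinates are noise
    return 1 if x[0] * x[1] > 0 else -1


def sample_gaussian_xor(n, d, seed=0):
    """Draw n points x ~ N(0, I_d) with their XOR labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        data.append((x, xor_label(x)))
    return data


def distance_to_boundary(x):
    # The boundary {x_1 = 0} U {x_2 = 0}; Euclidean distance to it is min(|x_1|, |x_2|)
    return min(abs(x[0]), abs(x[1]))


def near_boundary_fraction_exact(eps):
    # P(min(|Z1|, |Z2|) < eps) = 1 - P(|Z| >= eps)^2, where P(|Z| < eps) = erf(eps / sqrt(2))
    p_far = 1.0 - math.erf(eps / math.sqrt(2.0))
    return 1.0 - p_far ** 2


def near_boundary_fraction_empirical(data, eps):
    # Monte Carlo estimate of the same probability
    return sum(distance_to_boundary(x) < eps for x, _ in data) / len(data)


if __name__ == "__main__":
    data = sample_gaussian_xor(20000, d=10)
    for eps in (0.01, 0.05, 0.1):
        print(eps, near_boundary_fraction_exact(eps), near_boundary_fraction_empirical(data, eps))
```

The empirical fraction should match the closed form up to Monte Carlo error. Neither quantity depends on the ambient dimension d, because only x_1 and x_2 determine the label.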
Random Controlled Differential Equations
Piatti, Francesco, Cass, Thomas, Turner, William F.
We introduce a training-efficient framework for time-series learning that combines random features with controlled differential equations (CDEs). In this approach, large randomly parameterized CDEs act as continuous-time reservoirs that map input paths to rich representations. Only a linear readout layer is trained, which yields fast, scalable models with a strong inductive bias. Building on this foundation, we propose two variants: (i) Random Fourier CDEs (RF-CDEs), which lift the input signal using random Fourier features prior to the dynamics, providing a kernel-free approximation of RBF-enhanced sequence models; and (ii) Random Rough DEs (R-RDEs), which operate directly on rough-path inputs via a log-ODE discretization, using log-signatures to capture higher-order temporal interactions while remaining stable and efficient. We prove that, in the infinite-width limit, these models induce the RBF-lifted signature kernel and the rough signature kernel, respectively, offering a unified perspective on random-feature reservoirs, continuous-time deep architectures, and path-signature theory. We evaluate both models across a range of time-series benchmarks, demonstrating competitive or state-of-the-art performance. These methods provide a practical alternative to explicit signature computations, retaining their inductive bias while benefiting from the efficiency of random features.
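A rough sketch of the random-feature reservoir idea in Python (standard library only). Every concrete choice below is an assumption for illustration, not the authors' construction. The vector fields are taken to be tanh(A_i h + c_i), with Gaussian A_i, c_i and a Gaussian initial state h_0. The CDE dh = Σ_i tanh(A_i h + c_i) dX^i is integrated with an explicit Euler scheme on the discretized path. The RF-CDE lift uses the standard random Fourier feature map z(x) = √(2/D) cos(Wx + b) for an RBF kernel with lengthscale ℓ. The trained linear readout is a ridge regression on the terminal state. The names random_fourier_lift, RandomCDEReservoir and ridge_readout are hypothetical, and the log-ODE scheme of the R-RDE variant is not shown.

```python
import math
import random


def random_fourier_lift(dim_in, num_features, lengthscale=1.0, seed=0):
    """Random Fourier features z(x) = sqrt(2/D) cos(Wx + b); z(x).z(y) ~ exp(-|x-y|^2 / (2 l^2))."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim_in)] for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        return [scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + bi) for w, bi in zip(W, b)]

    return z


class RandomCDEReservoir:
    """Hypothetical random CDE dh = sum_i tanh(A_i h + c_i) dX^i with fixed random A_i, c_i, h_0."""

    def __init__(self, path_dim, hidden_dim, scale=1.0, seed=0):
        rng = random.Random(seed)
        std = scale / math.sqrt(hidden_dim)
        self.A = [[[rng.gauss(0.0, std) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
                  for _ in range(path_dim)]
        self.c = [[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)] for _ in range(path_dim)]
        self.h0 = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]

    def _field(self, i, h):
        # i-th random vector field evaluated at state h: tanh(A_i h + c_i)
        return [math.tanh(sum(a * hj for a, hj in zip(row, h)) + ci)
                for row, ci in zip(self.A[i], self.c[i])]

    def __call__(self, path):
        """Explicit Euler scheme over a discretized path [X_0, ..., X_N]; returns the terminal state."""
        h = list(self.h0)
        for prev, curr in zip(path, path[1:]):
            step = [0.0] * len(h)
            for i, (p, q) in enumerate(zip(prev, curr)):
                f = self._field(i, h)  # every field uses the state at the start of the step
                step = [s + fk * (q - p) for s, fk in zip(step, f)]
            h = [hk + sk for hk, sk in zip(h, step)]
        return h


def ridge_readout(features, targets, lam=1e-3):
    """The only trained part: w = argmin sum (w.phi - y)^2 + lam |w|^2, via Gauss-Jordan elimination."""
    d = len(features[0])
    # Augmented system (Phi^T Phi + lam I | Phi^T y)
    M = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0) for j in range(d)]
         + [sum(f[i] * y for f, y in zip(features, targets))] for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        for r in range(d):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]


if __name__ == "__main__":
    lift = random_fourier_lift(dim_in=2, num_features=8, seed=1)  # RF-CDE style lift
    reservoir = RandomCDEReservoir(path_dim=8, hidden_dim=12, seed=2)
    # Toy task: regress the total angle swept by circular paths traversed at different speeds
    paths, targets = [], []
    for k in range(20):
        speed = 0.02 + 0.01 * k
        paths.append([[math.cos(speed * t), math.sin(speed * t)] for t in range(30)])
        targets.append(speed * 29)
    feats = [reservoir([lift(x) for x in p]) for p in paths]
    w = ridge_readout(feats, targets)
    preds = [sum(wi * fi for wi, fi in zip(w, f)) for f in feats]
    print("train MSE:", sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets))
```

Only ridge_readout involves fitting. The lift and the reservoir are drawn once from fixed seeds and never updated, which is what keeps training cheap in this kind of model.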
Fundamental Novel Consistency Theory: $H$-Consistency Bounds
In machine learning, the loss functions optimized during training often differ from the target loss that defines task performance, due to computational intractability or lack of differentiability. We present an in-depth study of the target loss estimation error relative to the surrogate loss estimation error. Our analysis leads to $H$-consistency bounds, which are guarantees that account for the hypothesis set $H$. These bounds offer stronger guarantees than Bayes-consistency or $H$-calibration and are more informative than excess error bounds. We begin with binary classification, establishing tight distribution-dependent and distribution-independent bounds. We provide explicit bounds for convex surrogates over hypothesis sets such as linear models and neural networks, and we analyze the adversarial setting for surrogates such as the $ρ$-margin and sigmoid losses. Extending to multi-class classification, we present the first $H$-consistency bounds for max, sum, and constrained losses, covering both non-adversarial and adversarial scenarios. We further show that, in some cases, non-trivial $H$-consistency bounds are unattainable. We also investigate comp-sum losses (e.g., cross-entropy, MAE), deriving their first $H$-consistency bounds and introducing smooth adversarial variants that yield robust learning algorithms. We develop a comprehensive framework for deriving these bounds across various surrogates, introducing new characterizations for constrained and comp-sum losses. Finally, we examine the growth rates of $H$-consistency bounds, establishing a universal square-root growth rate for smooth surrogates in binary and multi-class tasks, and we analyze minimizability gaps to guide surrogate selection.
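The square-root growth rate can be illustrated with classical calibration functions from binary classification. This Python sketch assumes the complete hypothesis set (all measurable functions) and zero minimizability gaps. In that case a bound of the form excess zero-one error ≤ Ψ⁻¹(excess surrogate error) holds, with the well-known functions Ψ(t) = t (hinge), t² (squared), 1 − √(1 − t²) (exponential) and ½[(1 + t)ln(1 + t) + (1 − t)ln(1 − t)] (logistic, natural log). The code inverts Ψ numerically and compares Ψ⁻¹(ε) with √ε. It is not the $H$-dependent bound for restricted hypothesis sets, and the names PSI, psi_inverse and zero_one_excess_bound are hypothetical.

```python
import math


def _xlogx(x):
    # x * ln(x), extended by continuity so that 0 * ln(0) = 0
    return 0.0 if x == 0.0 else x * math.log(x)


# Classical calibration functions Psi(t) on [0, 1] for the complete hypothesis set
PSI = {
    "hinge": lambda t: t,
    "squared": lambda t: t * t,
    "exponential": lambda t: 1.0 - math.sqrt(1.0 - t * t),
    "logistic": lambda t: 0.5 * (_xlogx(1.0 + t) + _xlogx(1.0 - t)),
}


def psi_inverse(psi, eps, tol=1e-13):
    """Approximately the smallest t in [0, 1] with psi(t) >= eps (bisection; psi is increasing)."""
    lo, hi = 0.0, 1.0
    if eps >= psi(hi):
        return 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) < eps:
            lo = mid
        else:
            hi = mid
    return hi


def zero_one_excess_bound(loss, surrogate_excess):
    """Upper bound on the excess zero-one error given the excess surrogate error."""
    return psi_inverse(PSI[loss], surrogate_excess)


if __name__ == "__main__":
    for loss in PSI:
        for eps in (1e-2, 1e-4, 1e-6):
            bound = zero_one_excess_bound(loss, eps)
            print(f"{loss:12s} eps={eps:.0e} bound={bound:.3e} bound/sqrt(eps)={bound / math.sqrt(eps):.3f}")
```

For small ε, the ratio Ψ⁻¹(ε)/√ε approaches √2 for the exponential and logistic losses and 1 for the squared loss, since their Ψ behaves like t²/2 or t² near zero. For the non-smooth hinge loss, Ψ⁻¹(ε) = ε, so the growth is linear rather than square-root.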